Project | Paderborn University

Overview

The speech signal is a rich source of information: it conveys not only linguistic content but also extra- and paralinguistic information, such as the speaker's identity, gender, emotional state, age, or social status. However, these traits are hidden in complex, non-transparent variations of the speech signal and remain largely obscure to speech research. Given the recent progress in speech synthesis and voice conversion brought about by deep learning, we argue that synthesized speech can become a valuable tool for research in phonetics. The overarching goal of this project is thus to explore the potential of deep generative modeling of speech as a tool to support basic research in phonetics. To constrain the task, we will not consider the synthesis of stimuli from text, but will concentrate on the targeted manipulation of speech to generate new speech signals with desired properties. The goal is to develop generative models that represent the speech signal by latent variables. This representation should be compact and informative about the observed speech signal, capture different sources of variation in different dimensions, allow the targeted manipulation of a phonetic cue along phonetically plausible dimensions, and be amenable to human interpretation.

Key Facts

Project duration: 04/2021 - 12/2024

Funded by: DFG

Websites: DFG database GEPRIS
Tiefe generative Modelle für die Phonetikforschung

More Information

Principal Investigators

Prof. Dr. Reinhold Häb-Umbach

Communications Engineering / Heinz Nixdorf Institute

Petra Wagner

Universität Bielefeld
