Technically Enabled Explaining of Speaker Traits


The speech signal is a rich source of information. Beyond linguistic content, it conveys what is termed para- or extralinguistic content, revealing a speaker’s identity, gender, emotional or cognitive state, age, and health. These traits have been the subject of many investigations in phonetics, but because the underlying dimensions are highly complex, such studies are often confined to highly controlled datasets whose findings do not generalize. Practical knowledge about the phonetics of speaker characteristics is also indispensable for voice practitioners such as speech therapists, actors, or public speakers. Whereas speech technology is able to classify and even disentangle the complex signals underlying speech characteristics, the discipline has so far not provided interpretable models that help phonetic experts transfer knowledge to non-expert voice practitioners. Our project will therefore examine whether technical solutions can be developed as tools to support the generation of explanations within speech science. Specifically, we argue that the phonetic realization of a dimension of phonetic variation can be pinpointed much better if two speech samples are generated that contain the same linguistic content and differ only in the manifestation of a single trait. These explanations should ultimately enable voice practitioners to either identify or mimic the paralinguistic dimensions of interest.

Key Facts

Project duration:
01/2021 - 12/2025
Funded by:

More Information

Principal Investigators

Prof. Dr. Reinhold Häb-Umbach

Communications Engineering / Heinz Nixdorf Institute
