DTT/HELAN2 - Bogdan Ludusan: "A computational approach to prosodic characterization of under-resourced languages"
14h00-15h30
ISH-Ennat Léger
More than 7000 languages are attested in the world today, but only a small fraction of them have been properly documented, and only a select few have also been described in terms of their prosodic characteristics. Furthermore, as our current state of knowledge is limited by the few languages that have been properly characterized, new insights into the nature of this diversity might lead to a reassessment of current linguistic theories. I propose a project that aims to discover the prosodic characteristics of under-resourced languages directly from the speech signal, using a bottom-up approach. Several prosodic characteristics will be investigated, including rhythm, prosodic boundaries and prominence. The proposed methods will build upon recent developments in unsupervised speech processing, combining state-of-the-art speech representations with the latest advances in unsupervised machine learning and artificial intelligence. The methods will first be validated on well-studied languages and then tested on a massive parallel corpus of under-resourced languages, containing data from over 700 languages.