Scientific framework and objectives

Automatic Language Identification (ALI) emerged as a discipline about thirty years ago, but intensive research in the area dates back only to the beginning of the nineties. Most of the systems developed to date have, quite logically, borrowed methods from automatic speech or speaker recognition. Although these systems are fairly efficient and already provide solutions in some fields (particularly multilingual human-computer interfaces), they fail to address most of the linguistic and cognitive problems linked to the notion of inter-language distance (automatic typology, differences between languages and dialects, modelling of the cognitive processing of linguistic distance, etc.).


Our current research explores approaches based on the modelling of suprasegmental cues (rhythm and intonation), with a view to integrating them into modular automatic identification systems. The efficiency of these models depends on extracting relevant parameters (in accordance with existing rhythm typologies, or as revealed by perceptual experiments), building adequate models, and validating them on read and spontaneous speech corpora.

