Research Project

Machine Learning Techniques for Modeling of Language Varieties

Project type: Research Project
Programme: Information and Communication Technology
Call: IKT Call 2010
Duration: 3 years
Grant awarded: €529,000
Keywords: language technology, machine learning, language variety, machine translation

Harald Trost

Austrian Research Institute for Artificial Intelligence (ÖFAI)

Project partners: Sylvia Moosmüller, Austrian Academy of Sciences
Philipp Koehn, University of Edinburgh

Language varieties are gaining importance in human-machine interaction. Using them in speech-based communication enables computer systems to reflect the socio-cultural identity of their users. Current language technology cannot yet deliver on this. A few synthetic voices with localized pronunciation exist, but language varieties are multi-faceted, involving deviations on various levels.
We will develop algorithms capable of capturing and reproducing all major idiosyncrasies displayed by a language variety, be they syntactic, lexical, or phonological. The task can be viewed as machine translation with some unique properties: the difficulty posed by the scarcity of available data is counterbalanced by the relative proximity between the varieties and the standard language. Our approach will therefore rely on optimal selection of data and smart use of linguistic knowledge. Standard German and Viennese varieties serve as a test bed for implementing and exploring our techniques.
