The decline in minority languages and MetaAI's NLLB model for multi-lingual translation
Tobi Olatunji MD
NLP/ASR/AI for Global Health @ Intron & BioRAMP Lab | Ex-AWS | Ex-Enlitic | 3x patents
No-Language-Left-Behind (NLLB), MetaAI’s mega-project to automatically translate between 200 languages with a single model: Why should we care? Part 1
I’ve recently been quite concerned that many languages, including my mother tongue, Yoruba, are losing global relevance, and that in a few years our children will have zero knowledge of the depth and richness of expression in our native languages.
As discussed in the super long NLLB paper, 63.7% of online content is in English. The cultural dominance of the West applies intense but inadvertent pressure onto more localized media production. As low-resource language speakers gravitate towards books, movies, and social media content tailored to high-resource language audiences, interest in content produced in their native tongue is crowded out. Without sustained audiences, cultural products in low-resource languages risk displacement.
Additionally, for many who come from developing nations, a high-resource language like English is seen as a vehicle for both global competitiveness and upward socioeconomic mobility. Prioritizing the lingua franca of the global economy means directing more resources at English education and tethering local communities to the needs of the knowledge economy. The preponderance of knowledge and opportunities communicated in English might spell an increasing peripheralization of native languages in public life. Under such pressures, the status of many low-resource languages risks continued relegation.
High-quality machine translation is a step in the right direction that could begin to help stem this tide. Automated translation of sites like Wikipedia into my native language, Yoruba, for example, suddenly unlocks a massive amount of knowledge for native speakers, empowering them while enhancing and preserving the value of the language to those who speak it.
Interesting modeling techniques behind this breakthrough model include bitext mining using LASER3 for multilingual sentence representation, curriculum learning (training on high-resource languages first), and a sparsely gated mixture-of-experts model.
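To make the last of these ideas a bit more concrete, here is a minimal sketch of sparse top-k gating in plain Python. It is a toy illustration of the general mixture-of-experts idea, not NLLB's actual implementation: the experts are hypothetical scalar functions, and the gate scores are hard-coded, whereas in a real model they would come from a learned layer applied to each token. The function names (`top_k_gate`, `moe_forward`) are made up for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(gate_logits, k=2):
    """Keep only the k highest-scoring experts and renormalize their weights.

    Returns a dict mapping expert index to routing weight; every other
    expert is skipped entirely, which is what makes the layer sparse.
    """
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    top = ranked[:k]
    weights = softmax([gate_logits[i] for i in top])
    return dict(zip(top, weights))


def moe_forward(x, experts, gate_logits, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    routing = top_k_gate(gate_logits, k)
    return sum(w * experts[i](x) for i, w in routing.items())


# Hypothetical experts: simple scalar functions standing in for the
# feed-forward sub-networks a real model would use.
experts = [
    lambda x: x + 1.0,
    lambda x: 2.0 * x,
    lambda x: -x,
    lambda x: x * x,
]

# Assumed gate scores for one input (normally produced by a learned gate).
gate_logits = [0.1, 2.0, -1.0, 2.0]

routing = top_k_gate(gate_logits, k=2)
output = moe_forward(3.0, experts, gate_logits, k=2)
print(routing, output)
```

With these assumed scores, only two of the four experts run for the input. That is the appeal of the approach: total model capacity can grow with the number of experts while the compute spent per input stays roughly constant.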
More on this, and on healthcare applications, in Part 2.
Lyman T Johnson Fellow || GA- UK Step-Up || Ph.D. Student - Curriculum and Instruction || German Language Instructor & Exams Coach || The Sent Dimension Projects
2 years ago
This is a good direction in addressing this decline. Ife Fenimi Adebara also talked about the absence of a standard Yoruba electronic dictionary, which she was working on last year. I hope that with all of these individual and collective measures, and the acknowledgement that speaking one's native/local language does not imply lower intelligence, we will be able to arrest this decline. Great job Tobi.