LLM: A Dying Language Saviour?
In the excitement around LLMs there’s been a focused and fascinating discussion about Low Resource Languages (LRLs).
The interest in LRLs is primarily driven by the monetization opportunity: LLM vendors can expand chatbot and image generation capabilities into large markets where many people speak the language, but little of it is written on the Internet.
Dying Languages vs Low Resource Languages
There’s a difference between an LRL and a dying or endangered language. Both share the characteristic that “not much is written on the Internet,” and written text is what’s needed to create the models. The striking difference is that for a critically endangered language the number of fluent speakers can sometimes be counted on one hand.
For example, Thai is considered an LRL while having about 20 million native speakers. Compare that to Munsee, a critically endangered language that had only 2 elderly speakers as of 2018.
So from a business perspective the opportunity is clear. If an enterprise LLM vendor can extend its model to languages that are widely spoken but underrepresented on the internet compared to English, such as Vietnamese, Swahili, Hindi, Thai, Urdu and Bengali, it can increase the scope of opportunities for monetization.
But where does this leave us with endangered or dying languages?
AI for Good – Protecting Language
This is an opportunity for the leading LLM vendors to invest for the good of humankind. We have the technology and ability to save for posterity these endangered indigenous languages, keeping a part of our human history alive for future generations.
There is real investment required. We need to fine-tune a model, for example BLOOM (the BigScience model hosted on Hugging Face), on a set of texts that includes stories, songs and other cultural artifacts, so that the resulting model can generate new content in the language.
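The idea of adapting BLOOM to a set of cultural texts can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a recipe from any vendor: the checkpoint `bigscience/bloom-560m` is chosen only because it is small, the folder of `.txt` files is hypothetical, and the helper names (`build_corpus`, `load_texts`, `fine_tune`) and training settings are invented for this example. With only a handful of texts, a run like this is unlikely to yield a fluent model on its own.

```python
from pathlib import Path

# Assumption: a small BLOOM checkpoint, picked only so the sketch fits modest hardware.
MODEL_NAME = "bigscience/bloom-560m"


def build_corpus(texts, min_chars=20):
    """Normalize whitespace, drop very short fragments and exact duplicates (order kept)."""
    seen, corpus = set(), []
    for text in texts:
        cleaned = " ".join(text.split())
        if len(cleaned) >= min_chars and cleaned not in seen:
            seen.add(cleaned)
            corpus.append(cleaned)
    return corpus


def load_texts(folder):
    """Read every .txt file in a (hypothetical) folder of stories, songs and other texts."""
    return [p.read_text(encoding="utf-8") for p in sorted(Path(folder).glob("*.txt"))]


def fine_tune(corpus, output_dir="bloom-finetuned"):
    """Continue causal language model training of BLOOM on the prepared corpus."""
    # Heavy dependencies are imported here so the helpers above work without them.
    from datasets import Dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

    def tokenize(batch):
        # Truncate long texts; 512 tokens is an illustrative limit, not a requirement.
        return tokenizer(batch["text"], truncation=True, max_length=512)

    dataset = Dataset.from_dict({"text": corpus}).map(
        tokenize, batched=True, remove_columns=["text"]
    )
    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir=output_dir,
            num_train_epochs=3,              # illustrative value
            per_device_train_batch_size=2,   # illustrative value
        ),
        train_dataset=dataset,
        # mlm=False selects next-token (causal) training rather than masked-word training.
        data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
    )
    trainer.train()
    trainer.save_model(output_dir)
```

A run might look like `fine_tune(build_corpus(load_texts("texts/")))`, where `texts/` stands in for material the language community has agreed to share. The corpus cleaning is kept separate from training so it can be checked on its own before any compute is spent.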
By leveraging transformer models we can help revitalize endangered and dying languages. For example, by training these models on recordings of speakers it’s possible to create text-to-speech and speech-to-text applications that can not only power chatbots, but also help learners practice their language skills.
Taken together, we can not only save and protect endangered and dying languages; we in the AI industry can also support indigenous communities by creating applications that improve cross-lingual communication.
#LRL #NLP #ML #AI #IndigenousCanada #IndigenousPeoples #IndigenousKnowledge #IndigenousLanguages #NativeAmerican
Enabling Collaborative Intelligence in the enterprise at Cinchy. Advocating responsible Artificial Intelligence at Ask AI, a nonprofit raising awareness since 2017.
5 months ago: Great post, Paul O'Hagan
Owner at NDN Gutter LLC Experience Quality & Integrity
9 months ago: Thank you for this article! I am a student at the College of the Muscogee Nation in Okmulgee, OK, USA. My course of study is Native American Studies: Mvskoke Language Teaching. The more I study, the more I realize LLMs may be extremely helpful in my field. As it's been a few months since you wrote this, I wanted to know if you had identified any models that would be more helpful in this than others. I am on the school Esports team and have more access to the IT and engineering faculty than most students. If you have any advice, I would greatly appreciate it.