LLM: A Dying Language Saviour?
Image generated by DALL-E

Amid the excitement around LLMs, there has been a focused and fascinating discussion about Low Resource Languages (LRLs).

The interest in LRLs is primarily driven by the monetization opportunity: LLM vendors can expand chatbot and image generation capabilities into large markets where many people speak the language but little of it is written on the Internet.

Dying Languages vs Low Resource Languages

There’s a difference between an LRL and a dying or endangered language. They share the characteristic that “not much is written on the Internet,” and written text is what’s needed to create the models. The striking difference is that the fluent speakers of an endangered language may be few enough to count on one hand.

For example, Thai is considered an LRL despite having about 20 million native speakers. Compare that to Munsee, a critically endangered language that had only two elderly speakers as of 2018.

So from a business perspective the opportunity is clear. If an enterprise LLM vendor can extend its model to languages that are widely spoken but sparsely written on the Internet (compared to English), such as Vietnamese, Swahili, Hindi, Thai, Urdu and Bengali, it can increase the scope of opportunities for monetization.

But where does this leave us with endangered or dying languages?

AI for Good – Protecting Language

This is an opportunity for the leading LLM vendors to invest for the good of humankind. We have the technology and ability to preserve these endangered indigenous languages for posterity, keeping a part of our human history alive for future generations.

Real investment is required. We would need, for example, to fine-tune a model such as BLOOM (available through Hugging Face) on a set of texts that includes stories, songs, and other cultural artifacts, in order to create a model that can generate new content in the language.
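Before any fine-tuning can happen, the collected stories and songs have to be turned into a clean training set. The following is only a minimal sketch of that preparation step, not a description of how any vendor does it. The function name `prepare_corpus` and its parameters are hypothetical, and it assumes the texts have already been transcribed into plain strings. It normalizes the Unicode form (important for orthographies that use accented letters), drops duplicates, and holds back a small validation set.

```python
import random
import unicodedata


def prepare_corpus(texts, validation_fraction=0.1, seed=0):
    """Normalize, deduplicate, and split raw texts into (train, validation)."""
    seen = set()
    cleaned = []
    for text in texts:
        # NFC stores each accented letter in one consistent form, so the
        # same word typed two different ways is not counted as two words.
        norm = unicodedata.normalize("NFC", text)
        # Collapse runs of whitespace and trim the ends.
        norm = " ".join(norm.split())
        if norm and norm not in seen:
            seen.add(norm)
            cleaned.append(norm)

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(cleaned)

    # Hold back at least one text for validation when there is more than one.
    n_val = max(1, int(len(cleaned) * validation_fraction)) if len(cleaned) > 1 else 0
    return cleaned[n_val:], cleaned[:n_val]


if __name__ == "__main__":
    # Placeholder strings standing in for transcribed stories and songs.
    sample = ["story  one", "story one", "", "song two", "story three"]
    train, validation = prepare_corpus(sample)
    print(len(train), "training texts,", len(validation), "validation texts")
```

With so little text available in an endangered language, every duplicate removed and every spelling variant unified matters more than it would for a large corpus.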

By leveraging transformer models we can help revitalize endangered and dying languages. For example, training these models on recordings of speakers makes it possible to create text-to-speech and speech-to-text applications that can not only power chatbots but also help learners practice their language skills.

Taken together, we can not only save and protect endangered and dying languages; we in the AI industry can also support indigenous communities by creating applications that improve cross-lingual communication.

#LRL #NLP #ML #AI #IndigenousCanada #IndigenousPeoples #IndigenousKnowledge #IndigenousLanguages #NativeAmerican

Chris McLellan

Enabling Collaborative Intelligence in the enterprise at Cinchy. Advocating responsible Artificial Intelligence at Ask AI, a nonprofit raising awareness since 2017.

5 months ago

Great post Paul O'Hagan

Steven Thomas

Owner at NDN Gutter LLC Experience Quality & Integrity

9 months ago

Thank you for this article! I am a student at the College of the Muscogee Nation in Okmulgee, OK, USA. My course of study is Native American Studies: Mvskoke Language Teaching. The more I study, the more I realize LLMs may be extremely helpful in my field. As it's been a few months since you wrote this, I wanted to know if you had identified any models that would be more helpful in this than others. I am on the school Esports team and have more access to the IT and engineering faculty than most students. If you have any advice, I would greatly appreciate it.
