Tackling Medical Misinformation using Machine Translation: NLLB-200, Why should we care, Part 2
Tobi Olatunji MD
ASR/MT/AI for Global Health @ Intron | Ex-AWS | Ex-Enlitic | 3x patents
In an era of viral vitriol, superstition, and conspiracy theories about vaccines and epidemics, “freedom of speech” on digital platforms has amplified the influence of misinformation, sometimes with fatal real-life consequences for vulnerable populations with little access to credible health information.
In Part 1, I discussed the looming decline of minority languages, highlighting how high-quality machine translation can reverse this trend. Part 2 focuses on healthcare applications, showing how a model like Meta’s recently open-sourced NLLB-200, a single multilingual model that translates between 200 languages, 55 of them African (including Yoruba, my mother tongue), can serve as a powerful public health tool.
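If you’d like to try it yourself, here is a minimal sketch that translates an English health message into Yoruba using the distilled 600M-parameter NLLB-200 checkpoint Meta published on Hugging Face. The example sentence is mine, and the distilled variant trades some quality for size compared with the full model.

```python
# Minimal sketch: English -> Yoruba with NLLB-200 via Hugging Face transformers.
# Uses the distilled 600M checkpoint; NLLB language codes follow the
# FLORES-200 convention ("eng_Latn", "yor_Latn", ...).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL = "facebook/nllb-200-distilled-600M"

tokenizer = AutoTokenizer.from_pretrained(MODEL, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

text = "Vaccines are safe and protect children from measles."
inputs = tokenizer(text, return_tensors="pt")

# Force the decoder to start generating in the target language (Yoruba).
outputs = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("yor_Latn"),
    max_length=64,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```

Swapping the `src_lang` and the forced target code is all it takes to translate between any of the 200 supported languages.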
As highlighted in the NLLB paper, during the COVID-19 outbreak, seniors in communities where science-backed information was sparse, owing to a lack of trustworthy formal institutions, depended on their more tech-savvy friends and family members for timely, translated health information from international organizations. Democratizing high-quality machine translation could close this gap, reducing the dependence on intermediaries and opening up quicker access to credible healthcare information.
While machine translation primarily helps those from more advantaged backgrounds learn new languages or travel more effectively, its presence in low-resource language communities could be instrumental for social mobility, economic survival, and even longevity. Access to credible healthcare information represents another line of defence against egregious superstitions about the origins, etiology, and pathogenesis of well-understood diseases. Vulnerable populations can independently verify diagnostic, therapeutic, and preventative claims that directly impact their quality of life and health outcomes.
The chances are pretty low that scientific breakthroughs, clinical trials, drug discoveries, and systematic reviews will be published in minority languages in a timely manner, if at all. This creates a massive information vacuum in which nearly half the world is cut off from information that could impact longevity. Thankfully, broadcasters like the BBC create and publish global and local content in multiple African languages. High-quality, open-source machine translation could extend that reach to the rest of the information on the web. Powerful!!
Why Does NLLB Work: The Gory Technical Details
Meta’s NLLB project is one of the best examples I’ve seen of data-centric and value-based design applied to machine learning. The paper shows the team was deliberate about the quality of translations AS PERCEIVED BY native speakers; this was not just another massive PR stunt. Over the course of more than a year, the team painstakingly optimized for data quality, spending over 50% of its effort gathering the right data (37 petabytes!) to solve the problem. Here are a few highlights showing WHY their approach worked:
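One such highlight: much of the training bitext was mined and filtered using multilingual sentence embeddings (LASER3), so that only sentence pairs that actually mean the same thing survive into training. The toy sketch below illustrates that filtering idea; the generic sentence-transformers encoder and the 0.75 threshold are my stand-in assumptions, not the paper’s actual LASER3 setup or thresholds.

```python
# Toy sketch of similarity-based bitext filtering. The NLLB pipeline mines
# bitext with LASER3 embeddings; here a generic multilingual encoder from
# sentence-transformers stands in for LASER3, and the 0.75 threshold is an
# arbitrary assumption rather than the paper's setting.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

candidate_pairs = [
    # A genuine translation pair (English / French)...
    ("Wash your hands regularly.", "Lavez-vous les mains régulièrement."),
    # ...and a mismatched pair of the kind alignment noise might produce.
    ("Vaccines save lives.", "Le match commence à huit heures."),
]

def keep_pair(src: str, tgt: str, threshold: float = 0.75) -> bool:
    """Keep a (source, target) pair only if the two sentences sit close
    together in the shared multilingual embedding space."""
    emb = encoder.encode([src, tgt], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item() >= threshold

clean = [pair for pair in candidate_pairs if keep_pair(*pair)]
print(clean)  # the mismatched pair should be filtered out
```

Scoring pairs in a shared embedding space is what lets a mining pipeline discard plausible-looking but mismatched translations at web scale, instead of trusting noisy document alignment alone.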
Let me know in the comments if this was helpful or if I missed any important details.
The End!