New Benchmark Evaluates LLMs On 100 Languages
Margaretta Colangelo
Leading AI Analyst | Speaker | Writer | AI Newsletter 57,000+ subscribers
One major limitation of current LLMs is their limited ability to understand and respond accurately across cultural and linguistic diversity. Although current LLMs perform well with widely spoken languages, they struggle with many others. To help improve next-generation LLMs, scientists have developed the All Languages Matter Benchmark (ALM-bench). ALM-bench tests the ability of LLMs to understand culturally diverse images paired with text. The authors describe it as the largest and most comprehensive effort to date for evaluating LLMs across 100 languages. The project was a collaboration between scientists at Mohamed bin Zayed University of AI (MBZUAI), University of Central Florida, Aalto University, Australian National University, Linköping University, and Amazon. The preprint is available on arXiv.
1) Data Annotation and Curation
2) 16 LLMs
The authors evaluated the performance of the following 16 LLMs:
3) 100 Languages
The 100 languages, with their associated countries, scripts, language families, subgroupings, and resource specification.
4) Types of Questions
5) Performance
6) Successes and Failures
Examples of successes and failures of GPT-4o. The success cases are in the first row and the failure cases are in the second row.
References
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
Author Affiliations: University of Central Florida, Mohamed bin Zayed University of AI, Amazon, Aalto University, Australian National University, Linköping University
Authors: Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana, Noor Ahsan, Nevasini Sasikumar, Omkar Thawakar, Henok Biadglign Ademtew, Yahya Hmaiti, Amandeep Kumar, Kartik Kuckreja, Mykola Maslych, Wafa Al Ghallabi, Mihail Mihaylov, Chao Qin, Abdelrahman Shaker, Mike Zhang, Mahardika Krisna Ihsani, Amiel Esplana, Monil Gokani, Shachar Mirkin, Harsh Singh, Ashay Srivastava, Endre Hamerlik, Fathinah Asma Izzati, Fadillah Adamsyah Maani, Sebastian Cavada, Jenny Chim, Rohit Gupta, Sanjay Manjunath, Kamila Zhumakhanova, Feno Heriniaina Rabevohitra, Azril Amirudin, Muhammad Ridzuan, Daniya Kareem, Ketan More, Kunyang Li, Pramesh Shakya, Muhammad Saad, Amirpouya Ghasemaghaei, Amirbek Djanibekov, Dilshod Azizov, Branislava Jankovic, Naman Bhatia, Alvaro Cabrera, Johan Obando-Ceron, Olympiah Otieno, Fabian Farestam, Muztoba Rabbani, Sanoojan Baliah, Santosh Sanjeev, Abduragim Shtanchaev, Maheen Fatima, Thao Nguyen, Amrin Kareem, Toluwani Aremu, Nathan Xavier, Amit Bhatkal, Hawau Toyin, Aman Chadha, Hisham Cholakkal, Rao Muhammad Anwer, Michael Felsberg, Jorma Laaksonen, Thamar Solorio, Monojit Choudhury, Ivan Laptev, Mubarak Shah, Salman Khan, Fahad Shahbaz Khan.
Copyright © 2024 Margaretta Colangelo. All Rights Reserved.
This article was written by Margaretta Colangelo. Margaretta is a leading AI analyst who tracks significant milestones in AI in healthcare. She consults with AI healthcare companies and writes about some of the companies she consults with. Margaretta serves on the advisory board of the AI Precision Health Institute at the University of Hawaiʻi Cancer Center. @realmargaretta