GPT-4o, Six Other AI Models Fail China's College Entrance Maths Exam

Yicai 第一财经

China, inside out. We are the English-language version of Shanghai-based business and financial media outlet Yicai.

Published: June 20, 2024

(Yicai) June 20 -- US artificial intelligence firm OpenAI's closed-source GPT-4o and six other large language models were asked to sit China's notoriously difficult college entrance examinations in three subjects: English, Chinese and mathematics. Although they performed relatively well in the language papers, none of them passed maths.

GPT-4o, as well as open-source models developed by e-commerce giant Alibaba Group Holding, 01.AI, Zhipu AI, the Shanghai Artificial Intelligence Laboratory and France's Mistral AI, was put to the test by OpenCompass, the Shanghai AI Lab's evaluation system.

China's tough college entrance exams are a good way of gauging LLMs' intelligence, the Shanghai AI Lab said. All the papers were marked manually, and the teachers who marked them were not told that the test-taker was a machine. The exams contained both objective and subjective questions, it added.

Alibaba's Qwen2-72B was the top-performing LLM, scoring 303 out of a possible 420 points across the three subjects, according to results released by OpenCompass yesterday. It was followed by San Francisco-based OpenAI's GPT-4o with 296 points and the Shanghai AI Lab's InternLM 2.0 with 295.5 points. Mistral AI's LLM came last with 185 points.

But all of them failed the maths exam. InternLM 2.0 achieved the highest score, just 75 out of 150 points, and GPT-4o came second with 73 points.

In the maths paper, examiners found that the AI models' answers to subjective questions were illogical and confused. Sometimes the reasoning was wrong but the final answer was correct. The LLMs memorize formulas well but struggle to explain how they solve problems.

This shows that AI models' maths abilities still have a lot of room for improvement, Lin Dahua, a scientist at the Shanghai AI Lab, told Yicai. Maths involves complex reasoning, a key skill if LLMs are to be used in finance and other important sectors.

The AI models performed well on modern Chinese but showed big gaps in their knowledge of classical Chinese.

Qwen was the highest scorer in Chinese, with 124 out of 150 points, and GPT-4o excelled in English, with 109 out of 120 points.

In English, most human test-takers lose points for not writing enough, whereas the AI models tended to have points deducted for exceeding the word limit.
