Will the AI "Mythical Man-Month" Bubble Burst? Is the Scaling Law Failing?
Even for open-source models, the performance curve is flattening


The only certainty is that nothing is certain. -- Pliny the Elder, Roman naturalist

In 2019, I found myself torn between an offer from Google Brain and offers from several autonomous driving companies. Google's condition was that I join the Google Translate team. On one hand, there was the seemingly "over-mature" Google Translate; on the other, the hot and promising field of autonomous driving. Believing that autonomous driving represented the cutting edge of AI, I chose the latter. Fate, however, had a different plan. The Google Translate team unexpectedly thrived in the era of large language models, while autonomous driving companies faced significant setbacks and struggled with real-world deployment. This experience taught me that technological development is fraught with uncertainty, often marked by cyclical rises and falls and unforeseen twists.

This uncertainty is particularly evident in today's AI landscape.

Same Chart, Different Perspective

Recently, a trend chart comparing the capabilities of closed-source and open-source models has sparked heated discussions in the AI community. People are amazed at the rise of open-source models. Interestingly, not long ago, both OpenAI CEO Sam Altman and former Chief Scientist Ilya Sutskever claimed that closed-source models would always outpace open-source ones.

The rise of open-source models is undoubtedly positive. However, I see this not as a sudden acceleration of open-source models but rather as a sign of stagnation in the capabilities of closed-source models. Does this indicate that the Scaling Law is failing?

It's too early to draw conclusions. We can wait to see how GPT-5 performs and whether the next generation of open-source models can surpass their closed-source counterparts.

To maintain the bubble, GPT-5 needs to achieve 99% on MMLU

I've annotated this trend chart to highlight my point. If the AI scaling law holds true, then GPT-5 (or whichever model comes next) would need to reach this specific position, essentially achieving over 99% accuracy on MMLU.
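As a back-of-the-envelope illustration of that kind of extrapolation, here is a minimal Python sketch that fits a straight line to the log of the error rate over time and reads off where the next model would have to land if the trend continued. The (year, score) pairs are hypothetical placeholders rather than values from the chart, so the printed number is only illustrative of the mechanics.

```python
import math

# Hypothetical (release year, MMLU score) pairs for successive frontier models;
# replace these placeholders with the values read off the trend chart above.
points = [(2020, 54.0), (2022, 70.0), (2023, 86.0)]

# Least-squares fit of a straight line to log(error rate) vs. year.
xs = [year for year, _ in points]
ys = [math.log(100.0 - score) for _, score in points]
n = len(points)
x_mean, y_mean = sum(xs) / n, sum(ys) / n
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
    (x - x_mean) ** 2 for x in xs
)
intercept = y_mean - slope * x_mean

# If the log-linear trend held, where would a 2025-era successor have to land?
projected_error = math.exp(slope * 2025 + intercept)
print(f"projected MMLU score: {100.0 - projected_error:.1f}")
```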

Other Observations

As scores get higher, further improvements become increasingly difficult. Therefore, we should look at the reduction in error rate rather than just the increase in accuracy. For example, improving from 90 to 95 cuts the error rate in half (from 10 points down to 5), which is just as significant as going from 50 to 75.
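To make that comparison concrete, here is a minimal Python sketch of the error-reduction calculation, using the two score pairs from the example above.

```python
def error_reduction(old_score: float, new_score: float) -> float:
    """Relative reduction in error rate when a benchmark score improves.

    Scores are accuracies on a 0-100 scale, so the error rate is 100 - score.
    """
    old_error = 100.0 - old_score
    new_error = 100.0 - new_score
    return (old_error - new_error) / old_error

# Both improvements cut the error rate in half, i.e. a 50% reduction.
print(error_reduction(90, 95))  # 0.5
print(error_reduction(50, 75))  # 0.5
```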

Below, I've conducted a simple analysis of the evaluation results for various generations of the LLaMA model to examine whether the efficiency of scaling is diminishing. The data sources are Meta's published papers on each generation of LLaMA:

  • LLaMA 1: LLaMA: Open and Efficient Foundation Language Models
  • LLaMA 2: LLaMA 2: Open Foundation and Fine-Tuned Chat Models
  • LLaMA 3: The LLaMA 3 Herd of Models | Research - AI at Meta
  • LLaMA 3.1: Introducing LLaMA 3.1: Our most capable models to date

Here, we use the language-understanding benchmark (MMLU, 5-shot) as an example; evaluations in other areas show similar patterns. In the chart below, the x-axis represents model size and the y-axis represents the MMLU score, with different colors representing different generations of LLaMA models.

We can see that the efficiency of scaling is diminishing with each generation, as the curve flattens out. One positive sign, however, is that the improvement from LLaMA 2 to LLaMA 3 is greater than that from LLaMA 1 to LLaMA 2. This makes me hopeful for LLaMA 4, which is already in training. From an execution perspective, Meta/Facebook has never disappointed me.

We can quantify this trend using scaling efficiency, defined as the reduction in error rate divided by the increase in model size. By this measure, scaling efficiency has been decreasing, from a high of 0.21 down to 0.04.

Efficiency ratio indicates how effectively increasing the model size boosts performance
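For readers who want to run this kind of analysis themselves, below is a small Python sketch that takes a literal reading of the definition above: the drop in error rate (in percentage points) divided by the increase in parameter count (in billions). The scores in the example are hypothetical placeholders rather than the exact figures from the LLaMA papers, and the table above may normalize the ratio differently, so treat the output as illustrative.

```python
# Illustrative sketch: how efficiently do extra parameters buy MMLU accuracy?
# Replace the placeholder scores with the 5-shot MMLU numbers reported in the
# LLaMA papers listed above.

def scaling_efficiency(size_small_b: float, score_small: float,
                       size_large_b: float, score_large: float) -> float:
    """Reduction in error rate (percentage points) per additional billion parameters."""
    error_drop = (100.0 - score_small) - (100.0 - score_large)
    size_increase = size_large_b - size_small_b
    return error_drop / size_increase

# Hypothetical example: a 7B model scoring 45.0 vs. a 70B model scoring 69.0.
print(round(scaling_efficiency(7, 45.0, 70, 69.0), 3))  # ~0.381
```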

Exam Performance vs. Genius-Level Mastery

In the AI field, achieving a perfect score on a specific benchmark does not mean a model has achieved perfection in that area. Much like in academics, a genius student can score 100 because that's the maximum possible, whereas an excellent student might score 99 due to limitations in their ability. That one-point difference, however, is vast.

Even if we only look at the accuracy of test sets, we haven't seen a model that excels across all metrics. For instance, early models achieved near-perfect performance on the MNIST dataset, and some models have reached over 92% accuracy on the ImageNet dataset.

The MNIST dataset: handwritten digit samples covering 10 classes (the digits 0-9)
The ImageNet dataset: millions of object images spanning 1,000+ classes

However, in current AI research, we haven't seen a model with "genius-level" attributes. All progress still relies on vast amounts of data, computational power, and meticulous tuning. This is similar to how excellent students must put in continuous practice and effort to get close to perfect scores, rather than achieving it effortlessly.

The Mythical Man-Month and Scaling Law

In software engineering, there's a famous concept known as the "Mythical Man-Month," which suggests that simply adding more human resources cannot linearly speed up project progress. Similarly, in AI, we face a comparable issue with the limitations of the Scaling Law.

The Scaling Law emphasizes enhancing model performance by increasing computational power and data volume. However, this approach is akin to the "manpower stacking" in the Mythical Man-Month, which may be a necessary condition but not a sufficient one. Merely relying on computational power and data doesn't guarantee breakthrough advancements.

For example, 10,000 hours of practice can make a genius shine, but an average person putting in the same time and effort might surpass 95% of people without ever reaching genius level. Similarly, in AI research, while increasing computational power and data is crucial, have we overlooked the necessity of innovation and algorithmic optimization?

In understanding the Scaling Law, have we mistaken this necessary condition for a sufficient one?

Trends and Cycles in Technological Maturity

Technological progress and the formation of bubbles often go hand in hand. Bubbles aren't entirely bad; they reflect high expectations for emerging technologies. It's precisely these bubbles that bring substantial resource influx into a field, accelerating technological development.

As the Daoist saying goes, "Prosperity leads to decline, and extreme adversity leads to prosperity." This phrase vividly describes the cyclical nature of technological development.

Many of you might have seen Gartner's annual Hype Cycle, its technology maturity curve. It shows that emerging technologies typically go through a bubble-inflation phase, followed by a bubble burst, and eventually enter a phase of steady maturity.

As early as 2017, Gartner placed deep learning and machine learning at the peak of the bubble. At that time, image understanding models were proliferating, with R-CNN, YOLO, SSD, and RetinaNet showing impressive visual understanding results, and China's four major AI companies (Megvii, SenseTime, YITU, and CloudWalk) were extremely popular. However, beyond security surveillance and autonomous driving, these technologies had not found large-scale commercial applications, and deep learning had not yet delivered significant economic benefits (measured in billions) in practice.

As Nassim Taleb noted in "The Black Swan," black swan events are unpredictable and have profound impacts. This applies to both individuals and collectives. No one anticipated that Google's 2017 paper "Attention Is All You Need," which proposed the Transformer architecture, would create such a sensation. BERT followed in late 2018. These two papers sparked a revolution in natural language processing that quickly spread to other fields.

Neither I nor Gartner could have foreseen this transformation at the time.

By 2020, generative AI had appeared on Gartner's curve. ChatGPT hadn't been released yet, and GPT-2 had only garnered some attention within the community in 2019. I remember casually looking at this model, which could generate short stories, and finding it merely interesting.

However, by 2024, generative AI has become the hottest topic. From Gartner's perspective, generative AI seems to be at a point similar to deep learning in 2017, possibly nearing the burst of its bubble.
