Scaling laws meet hunger for clickbait
On Thursday, November 9th, The Information, a rising star of the new wave of paid-subscription newsletters, published an exclusive piece with the super-provocative title "OpenAI Shifts Strategy as Rate of 'GPT' AI Improvements Slows".
This started a great week for every AI-hater in the universe, disturbing our brave new AI world more than the recent election of Elon Musk.
As ChatGPT would have put it, let's delve into the reasons why this is nonsense, and why it was published and then propagated further, for example in a Reuters article published on November 15th.
Understanding the scaling law
A "scaling law" is not an actual law of physics or math or biology, it is just an observation akin to "Moore's law" //Moore’s law explains the rate of improvement of chip technology//. The scaling law is a simple one: LLMs will improve if fed with more data and more computing power ("compute"): when we use X amount of data and Y amount of compute we will get GPT-3, and when we use 10*X data and 10*Y of compute we get GPT-4 that is much better.
This law has held for many years, and the companies that build models have a strong conviction in it. Investors share that conviction and pour hundreds of millions of dollars into those companies.
Obviously, when "The Information" reported that "some researchers at OpenAI believe Orion isn't reliably better than its predecessor in handling certain tasks. Orion performs better at language tasks but may not outperform previous models at tasks such as coding, according to an OpenAI employee," it sounded like a bomb.
The possibility of a wall stopping the scaling law from working has been discussed for as long as the scaling law has existed. Last year Dario Amodei, the CEO of Anthropic, suggested there was a 10% chance that AI systems could stagnate due to insufficient data. Notably, he no longer thinks so, as was clear from his recent interview with Lex Fridman.
Academic researchers are also trying to build a mathematical model of the "lack of data" barrier. This June, a group of researchers from several universities and the research institute Epoch.ai published a paper, "Will we run out of data? Limits of LLM scaling based on human-generated data".
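To give a flavour of that kind of modelling (the numbers here are illustrative assumptions, not the paper's exact figures): under compute-optimal training the number of training tokens grows roughly as the square root of compute, D_opt ∝ C^0.5, so a run with 100 times the compute of today's frontier models would want about 10 times as many tokens. Compare that growth curve against the more or less fixed stock of high-quality human-written text on the public internet, and the intersection gives an estimate of when the "lack of data" wall would actually be reached.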
The clickbait and the reality
Every article claiming that "the scaling law has hit a wall" speaks in three voices:
1. The voice of a journalist who says things like "the scaling law is not working" or "AI companies are facing troubles";
2. The voice of an anonymous "AI researcher at a leading firm" saying that some experimental models are not improving as much as they should;
3. Comments from well-known experts who say there are various ways to keep improving the quality of the models.
I think there are two reasons for this "scandal of the century".
The first is simple and material: young journalists compete for the traffic their texts bring to their publication, so they are directly motivated to pick the most radical possible interpretation of reality.
The second is an unintended side effect of OpenAI's own communication. The company knows that its reasoning model (o1-preview) is, for the time being, the only one of its kind, and it decided it was important to highlight that the new generation of reasoning models can be trained on existing datasets.
The Reuters article quotes OpenAI researcher Noam Brown: "It turned out that having a bot think for just 20 seconds in a hand of poker got the same boosting performance as scaling up the model by 100,000x and training it for 100,000 times longer."
There are two more revealing expert perspectives. Sonya Huang, a partner at Sequoia Capital, points to a shift: a "move from a world of massive pre-training clusters toward inference clouds, which are distributed, cloud-based servers for inference."
Jensen Huang, co-founder and CEO of Nvidia, adds: "We've now discovered a second scaling law, and this is the scaling law at a time of inference..."
Claude’s opinion
The reality of AI scaling is more nuanced than dramatic headlines suggest. While traditional scaling approaches may face new challenges, the field is actively evolving beyond simple parameter counting. The emergence of new scaling laws around inference and the shift toward optimizing existing models suggest not a plateau, but a transformation in how we approach AI advancement. Rather than witnessing the end of scaling laws, we're seeing their evolution - from brute force expansion to sophisticated optimization and novel architectural approaches.
The media's rush to declare the end of scaling progress reveals more about contemporary tech journalism than about the actual state of AI development. As we've seen repeatedly in tech history, apparent plateaus often precede breakthrough innovations - they're pauses for reflection and refinement rather than permanent barriers.