AI Efficiency Limits: Datasets, Computation Budget and Chinchilla Scaling Laws

We have already reviewed the size (i.e., parameter quantity) limitations of current AI models. Now, I would like to examine other AI limitations: training datasets and computation budgets.

"Computation Budget" by Midjourney

Computation Budget — this term refers to the resources needed to train a model, including time, memory space, electricity, and, of course, the quantity and power of CPUs, GPUs, and TPUs involved in the process. It's clear that computation budget is a critical factor: the more GPUs, and especially TPUs, we have, the faster and better the results. But how many resources are needed to train AI, especially if we aim to make progress in development?

Let's dive a little deeper.

Machine learning is conceptually similar to human learning: the model receives a sample of a task, tries to solve it, and learns from its mistakes.

Model's Groundhog Day
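
To make that loop concrete, here is a minimal toy sketch in Python of "receive a sample, try to solve it, learn from the mistake". The model, the data, and the learning rate are all invented for illustration; no real system trains on anything this small.

```python
# Toy "task => solution" learning loop: fit y = 2x with one trainable parameter.
# All numbers here are illustrative assumptions, not a real training setup.

samples = [(x, 2.0 * x) for x in range(1, 6)]  # "task => solution" pairs
w = 0.0    # the model: a single trainable weight
lr = 0.01  # learning rate

for epoch in range(200):
    for x, y_true in samples:
        y_pred = w * x            # the model tries to solve the task
        error = y_pred - y_true   # how wrong it was
        w -= lr * error * x       # learn from the mistake (a gradient step)

print(f"learned w = {w:.3f} (the target is 2.0)")
```

Real models repeat exactly this cycle, only with billions of weights and enormous numbers of samples: the model's Groundhog Day.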

The AI we all hope to develop must be able to solve an enormous range of tasks, which means it needs to "study" a lot — an unimaginable amount.

Unfortunately, current learning algorithms are far from efficient. Learning is still a challenging process. To solve a specific task, a model must receive a huge amount of “task => solution” samples. Even for just a few tasks, the amount of information the model needs is massive.

"Computation Budget" by Stable Diffusion

Initially, datasets for model training were created manually. Each dataset required hundreds of hours of human labor and contained millions of samples, but was designed to train the model for a single, specific task. Around 2018-2019, the industry shifted away from this approach, deeming it too time-consuming, expensive, and inflexible.

Instead, scientists began using any available raw data for model training, such as web pages or images downloaded from the Internet, without filtering or preparation.

Sample of a dataset

While the quality of such data cannot compare to manually prepared datasets...

A sample of such data: "Blue cat". Accuracy problems can be found in any row :-)

...its quantity improved machine learning, as volume became crucial to making progress with inefficient learning algorithms. The more diverse, unfiltered data was used for training, the more intelligent the model became.

Fundamental research on how model intelligence depends on the size of the training dataset did not keep the world waiting:

In 2020, OpenAI published the article “Scaling Laws for Autoregressive Generative Modeling”, which showed that increasing the number of parameters 10 times required increasing the training dataset only about 2.5 times (10^0.4, to be precise)... and the computation budget 25 times.
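
A quick back-of-the-envelope check of where the "25 times" comes from, under my assumption that training compute is roughly proportional to the number of parameters multiplied by the amount of training data:

```python
# Assumption: compute C ~ N (parameters) * D (training data).
# With the 2020 exponent, D grows as N**0.4, so C grows as N**1.4.
param_growth = 10
data_growth = param_growth ** 0.4              # ~2.5x
compute_growth = param_growth * data_growth    # ~25x
print(f"data: x{data_growth:.1f}, compute: x{compute_growth:.0f}")
```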

In 2022, DeepMind published the article “Training Compute-Optimal Large Language Models”, which significantly revised these dependencies: the training data and the model size should be scaled equally. This means that all previous models had been trained on a severely insufficient volume of data.

This relationship became known as the Chinchilla Scaling Laws:

“We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks.”

Together, these two articles lead to a general conclusion: increasing the number of parameters 10 times requires increasing the training dataset 10 times as well… and the computation budget 100 times.
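
The same rough proportionality (compute ~ parameters × data) reproduces the "100 times" figure and also gives a quick sanity check of the quote above: 4× fewer parameters with 4× more data leave the product, and hence the budget, unchanged. The relative units below are mine, for illustration only.

```python
# Chinchilla vs Gopher sanity check, assuming compute ~ parameters * data:
# 4x fewer parameters and 4x more data keep the product (the budget) the same.
gopher_params, gopher_data = 280, 1.0        # relative units, not real token counts
chinchilla_params, chinchilla_data = 70, 4.0
print(gopher_params * gopher_data == chinchilla_params * chinchilla_data)  # True

# And with equal scaling of parameters and data (the Chinchilla rule):
param_growth = 10
data_growth = param_growth                   # scaled equally with parameters
print(f"compute: x{param_growth * data_growth}")  # x100
```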

The Chinchilla Scaling Laws gave us at least two things:

  1. Another interesting explanation of why the trend of increasing parameter counts has slowed down.
  2. An integral indicator that helps us estimate the size and intelligence of modern models: the computation budget used for their training (a rough estimation sketch follows below).
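
As a sketch of how that indicator can be used: a widely cited approximation for dense transformer training is C ≈ 6 · N · D FLOPs, where N is the parameter count and D is the number of training tokens. The formula is an approximation and the example numbers are my assumption of the rough Chinchilla setup, so treat the result as an order-of-magnitude estimate.

```python
# Rough training-compute estimate via the common approximation C ~= 6 * N * D FLOPs.
# N = number of parameters, D = number of training tokens.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute in FLOPs."""
    return 6.0 * n_params * n_tokens

# Example: a 70B-parameter model trained on ~1.4T tokens (roughly the Chinchilla setup).
print(f"{training_flops(70e9, 1.4e12):.2e} FLOPs")  # ~5.9e+23
```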

There is another article that can help us see what the trends are: Compute Trends Across Three Eras of Machine Learning.

The most interesting part of this article reveals two trends:

Graphic reason for building new nuclear power plants

  1. Deep Learning Era models (mostly academic), whose computation budgets double every 5-6 months.
  2. Large Scale Era models, trained by big corporations, which started with much larger computation budgets and double them every 9-10 months.

But since big corporations don't share their data, there is room for alternative calculations. For example:

Training GPT-3 required 3.14e23 FLOPs.

Training GPT-4 required 2.15e25 FLOPs.

The compute budget increased 68 times in 33 months, i.e., it doubled every 5-6 months :-)
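
The arithmetic behind that estimate, using the two FLOPs figures above and the roughly 33 months between the two training runs (the time gap is taken from the text, not from any official source):

```python
import math

gpt3_flops = 3.14e23
gpt4_flops = 2.15e25
months_between = 33

growth = gpt4_flops / gpt3_flops   # ~68x
doublings = math.log2(growth)      # ~6.1 doublings
print(f"growth: x{growth:.0f}, doubling time: {months_between / doublings:.1f} months")
```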

Will computation budgets continue to grow aggressively?

This is for 2023; in 2024 it is going to be bigger

Judging by the construction of new data centers by OpenAI and its competitors, I would say yes.

Are there limits and challenges to this growth? Certainly.

To be continued…
