AI Efficiency Limits: Datasets, Computation Budget and Chinchilla Scaling Laws
Qualium Systems
We have already reviewed the size (i.e., parameter quantity) limitations of current AI models. Now, I would like to examine other AI limitations: training datasets and computation budgets.
Computation Budget — this term refers to the resources needed to train a model, including time, memory space, electricity, and, of course, the quantity and power of CPUs, GPUs, and TPUs involved in the process. It's clear that computation budget is a critical factor: the more GPUs, and especially TPUs, we have, the faster and better the results. But how many resources are needed to train AI, especially if we aim to make progress in development?
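To make the question more tangible, here is a back-of-the-envelope sketch in Python. It relies on the widely used approximation that training compute is roughly 6 · N · D FLOPs, where N is the number of parameters and D the number of training tokens; the per-GPU throughput and cluster size below are assumptions for illustration, not real figures from any lab.

```python
# Rough estimate of a training compute budget.
# Assumes the common approximation C ~= 6 * N * D FLOPs
# (N = parameters, D = training tokens); throughput and cluster
# size are illustrative assumptions, not real vendor numbers.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute in FLOPs."""
    return 6.0 * n_params * n_tokens

def wall_clock_days(total_flops: float,
                    flops_per_gpu_per_s: float = 3e14,  # ~300 sustained TFLOP/s per GPU (assumed)
                    n_gpus: int = 1024) -> float:
    """Rough training time in days for a hypothetical cluster."""
    seconds = total_flops / (flops_per_gpu_per_s * n_gpus)
    return seconds / 86_400

if __name__ == "__main__":
    # Example: a 70B-parameter model trained on 1.4T tokens.
    c = training_flops(70e9, 1.4e12)
    print(f"Total compute: {c:.2e} FLOPs")                                  # ~5.9e23 FLOPs
    print(f"Wall clock:    {wall_clock_days(c):.0f} days on 1,024 such GPUs")  # ~22 days
```

Even this crude arithmetic shows why the computation budget is such a critical factor.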
Let’s dive a little deeper.
Machine learning is conceptually similar to human learning: the model receives a sample of a task, tries to solve it, and learns from its mistakes.
The AI we all hope to develop must be able to solve an enormous range of tasks, which means it needs to "study" a lot — an unimaginable amount.
Unfortunately, current learning algorithms are far from efficient. Learning is still a challenging process: to solve a specific task, a model must receive a huge number of “task => solution” samples. Even for just a few tasks, the amount of information the model needs is massive.
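To make the "receive a sample, try, learn from the mistake" loop concrete, here is a deliberately tiny sketch in Python: a toy linear model trained one "task => solution" pair at a time. The data and constants are invented for illustration; real models repeat this idea billions of times over far richer data.

```python
# A toy illustration of "task => solution" learning: the model sees an
# input/answer pair, makes a guess, and nudges its weights based on the error.
import random

# Hypothetical dataset: the "task" is x, the "solution" is y = 3x + 2 (plus noise).
data = [(i / 100, 3.0 * (i / 100) + 2.0 + random.gauss(0, 0.05)) for i in range(100)]

w, b, lr = 0.0, 0.0, 0.1
for epoch in range(200):
    random.shuffle(data)
    for x, y in data:
        pred = w * x + b      # the model's attempt at the task
        err = pred - y        # how wrong the attempt was
        w -= lr * err * x     # learn from the mistake
        b -= lr * err

print(f"learned w={w:.2f}, b={b:.2f}")  # should end up close to w=3, b=2
```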
Initially, datasets for model training were created manually. Each dataset required hundreds of hours of human labor and contained millions of samples, but was designed to train the model for a single, specific task. Around 2018-2019, the industry shifted away from this approach, deeming it too time-consuming, expensive, and inflexible.
Instead, scientists began using any available raw data, such as web pages or images downloaded from the Internet, for model training without filtering or preparation.
While the quality of such data cannot compare to manually prepared datasets, its quantity is what improved machine learning, as volume became crucial to making progress with inefficient learning algorithms. The more diverse, unfiltered data was used for training, the more intelligent the model became.
Fundamental research on how model intelligence depends on the size of the training dataset did not take long to appear:
In 2020, OpenAI published the article “Scaling Laws for Autoregressive Generative Modeling”, which found that increasing the number of parameters 10 times requires increasing the training dataset only about 2.5 times (10^0.4, to be precise)... and the computation budget 25 times.
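Before moving on to the 2022 revision, here is the quick arithmetic behind those numbers (an illustrative reading of that result, not the paper's exact fitted exponents):

```python
# Illustrative reading of the 2020 scaling result:
#   data    ~ params ** 0.4   =>  10x params -> 10**0.4 ≈ 2.5x data
#   compute ~ params * data   =>  10x params -> 10**1.4 ≈ 25x compute
param_factor = 10
data_factor = param_factor ** 0.4             # ≈ 2.5
compute_factor = param_factor * data_factor   # ≈ 25

print(f"data x{data_factor:.1f}, compute x{compute_factor:.1f}")
```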
In 2022, DeepMind published the article “Training Compute-Optimal Large Language Models”, which significantly revised those dependencies: the training data and the size of the model should be scaled equally. This means that all previous models had been trained with a severely insufficient volume of training data.
This dependency became known as the Chinchilla Scaling Laws:
“We test this hypothesis by training a predicted compute optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks.”
Together, these two articles lead to a common conclusion: increasing the number of parameters 10 times now requires increasing the training dataset 10 times as well… and the computation budget 100 times.
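A rough way to see where the "10x data, 100x compute" figure comes from: if compute is roughly 6 · N · D and data is kept proportional to parameters (an often-quoted Chinchilla heuristic is about 20 training tokens per parameter), then compute grows with N². The sketch below uses these assumed constants, not the paper's exact fitted coefficients.

```python
# Sketch of a Chinchilla-style allocation: for a fixed compute budget C,
# scale parameters and tokens together. Assumes C = 6 * N * D and
# D = 20 * N (an often-quoted heuristic, not the paper's exact fit).

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Return an (n_params, n_tokens) split for a given compute budget."""
    # C = 6 * N * D with D = tokens_per_param * N
    #   => N = sqrt(C / (6 * tokens_per_param)), D = tokens_per_param * N
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

for budget in (1e21, 1e23, 1e25):
    n, d = chinchilla_optimal(budget)
    print(f"C={budget:.0e} FLOPs -> ~{n:.1e} params, ~{d:.1e} tokens")
```

Under these assumptions, a 100x larger compute budget buys a 10x larger model trained on 10x more tokens, which is exactly the conclusion above.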
The Chinchilla Scaling Laws gave us at least two things:
There is another article that can help us see what the trends are: Compute Trends Across Three Eras of Machine Learning.
The most interesting part of this article reveals two trends:
But since big corporations don’t share their data, there is room for alternative calculations. For example:
Training GPT-3 required 3.14e23 FLOPs.
Training GPT-4 required an estimated 2.15e25 FLOPs.
The training compute has increased 68 times in 33 months, i.e., it doubled every 5-6 months :-)
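The arithmetic behind that statement, for anyone who wants to check it (the GPT-4 figure itself is an unofficial estimate, as noted above):

```python
# Growth of training compute from GPT-3 to GPT-4
# (the GPT-4 FLOPs figure is an unofficial estimate).
import math

gpt3_flops = 3.14e23
gpt4_flops = 2.15e25
months_between = 33

growth = gpt4_flops / gpt3_flops                        # ≈ 68x
doubling_months = months_between / math.log2(growth)    # ≈ 5.4 months

print(f"growth: x{growth:.0f}, doubling time: {doubling_months:.1f} months")
```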
Will computation budgets continue to grow aggressively?
Judging by the construction of new data centers by OpenAI and its competitors, I would say yes.
Are there limits and challenges to this growth? Certainly.
To be continued…