Emergent Abilities in LLMs
Vishal Jindal
What Are Emergent Abilities
Emergent abilities in LLMs are defined as significant improvements in task performance that appear as model size or scale increases. These abilities are absent or barely noticeable in smaller, less complex models but become evident in larger ones, which suggests that the model is learning and generalizing from its pre-training in ways that were not explicitly programmed or expected.
When visualized on a scaling curve, emergent abilities show a characteristic pattern: performance stays near random until a certain scale threshold, after which it increases sharply. This is known as a phase transition, a dramatic change in behavior that could not have been predicted by examining smaller-scale systems.
In the following figure, taken from the paper “Emergent Abilities of Large Language Models,” several charts show abilities emerging in LLMs: task performance (shown on the y-axis) stays flat and then jumps sharply once model scale (shown on the x-axis) crosses a threshold.
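To make that phase-transition shape concrete, here is a minimal sketch that plots a toy scaling curve from purely synthetic numbers (the compute range, threshold, and accuracy values are illustrative assumptions, not results from any real benchmark):

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic, illustrative data: task accuracy hovers near the 25% random baseline
# (a 4-way multiple-choice task) until a hypothetical scale threshold, then climbs.
scale = np.logspace(20, 24, 50)       # "training compute" in FLOPs (made-up range)
threshold = 1e22                      # hypothetical emergence point
accuracy = 0.25 + 0.65 / (1 + (threshold / scale) ** 2)

plt.figure(figsize=(5, 3.5))
plt.semilogx(scale, accuracy, marker="o", markersize=3)
plt.axhline(0.25, linestyle="--", color="gray", label="random baseline")
plt.xlabel("Training compute (FLOPs, log scale)")
plt.ylabel("Task accuracy")
plt.title("Toy emergence curve (synthetic data)")
plt.legend()
plt.tight_layout()
plt.show()
```

Below the threshold the curve is essentially flat at chance level, which is why extrapolating from small models alone does not reveal the ability.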
Language models have been scaled primarily along computation amount, model parameters, and training dataset size. The emergence of abilities may occur with less training computation or fewer model parameters for models trained on higher-quality data. It also depends on factors such as the amount of data, its quality, and the number of parameters in the model.
Emergent abilities in LLMs appear as the models scale up and cannot be predicted by simply extrapolating from smaller models.
Evaluation Benchmarks for Emergent Abilities
Several benchmarks are used to evaluate the emergent abilities of language models, including the BIG-Bench suite, TruthfulQA, the Massive Multitask Language Understanding (MMLU) benchmark, and the Word-in-Context (WiC) benchmark.
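As a rough illustration of how such a benchmark is typically run, the sketch below scores a model on a few MMLU items. The dataset id "cais/mmlu" and its fields are assumptions about the Hugging Face Hub copy, and `ask_model` is a hypothetical stand-in for whatever model call you actually use:

```python
from datasets import load_dataset  # pip install datasets

LETTERS = ["A", "B", "C", "D"]

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in: replace with a real call to your LLM of choice."""
    return "A"  # placeholder answer so the sketch runs end to end

# Assumed dataset id and subject config for MMLU on the Hugging Face Hub.
ds = load_dataset("cais/mmlu", "abstract_algebra", split="test")

correct = 0
n = 10  # score only a handful of items in this sketch
for item in ds.select(range(n)):
    options = "\n".join(f"{l}. {c}" for l, c in zip(LETTERS, item["choices"]))
    prompt = f"{item['question']}\n{options}\nAnswer with A, B, C, or D:"
    prediction = ask_model(prompt).strip().upper()
    if prediction.startswith(LETTERS[item["answer"]]):
        correct += 1

print(f"Accuracy on {n} sampled items: {correct / n:.0%}")
```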
Other Factors That Could Give Rise To Emergent Abilities
Beyond raw scale, an ability may emerge at a smaller scale for models trained on higher-quality data, with improved training procedures, or when paired with better prompting techniques.
Risks With Emergent Abilities
As we scale up language models, we also need to be aware of the emergent risks that come with scale. These include societal issues related to truthfulness, bias, and toxicity. Such risks can be mitigated by strategies such as prompting models to be “helpful, harmless, and honest.”
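A minimal sketch of that mitigation, assuming the OpenAI Python client and the model name "gpt-4o-mini" (any chat-style LLM API would work the same way):

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A simple "helpful, harmless, and honest" steering prompt applied as a system message.
HHH_SYSTEM_PROMPT = (
    "You are a helpful, harmless, and honest assistant. "
    "Decline unsafe requests, avoid biased or toxic language, "
    "and say you are unsure rather than guessing."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; substitute whichever model you use
    messages=[
        {"role": "system", "content": HHH_SYSTEM_PROMPT},
        {"role": "user", "content": "Summarize the risks of scaling language models."},
    ],
)
print(response.choices[0].message.content)
```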
The WinoGender benchmark, which measures gender bias in occupations, has shown that scaling can improve performance but can also increase bias in ambiguous contexts. Larger models have also been found to be more likely to memorize training data, although deduplicating the training corpus can reduce this risk.
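The deduplication mentioned above is, in its simplest form, exact-match filtering of the training corpus. Here is a minimal sketch of that idea using a content hash (real pipelines usually add near-duplicate detection, such as MinHash, on top of this):

```python
import hashlib

def dedupe_exact(documents):
    """Drop exact duplicates by hashing whitespace-normalized, lowercased text."""
    seen, unique = set(), []
    for doc in documents:
        digest = hashlib.sha256(" ".join(doc.split()).lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "The quick  brown fox jumps over the lazy dog.",  # whitespace variant -> duplicate
    "An entirely different training document.",
]
print(len(dedupe_exact(corpus)))  # 2
```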
Emergent risks also include phenomena that might only exist in future language models or that have not yet been characterized in current models. These could include backdoor vulnerabilities or harmful content synthesis.
A Shift Towards General-Purpose Models
The emergence of abilities has led to sociological changes in how the community views and uses these models. Historically, NLP focused on task-specific models. Scaling models has led to an explosion in research on "general purpose" models that aim to perform a range of tasks not explicitly encoded in the training data.
This shift towards general-purpose models is evident when scaling enables a few-shot prompted general-purpose model to outperform the prior state of the art held by fine-tuned, task-specific models. For example, GPT-3 achieved a new state of the art on the TriviaQA and PIQA question-answering benchmarks; PaLM achieved a new state of the art on three arithmetic reasoning benchmarks; and the multimodal Flamingo model achieved a new state of the art on six visual question answering benchmarks.
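To illustrate what "few-shot prompted" means in practice, here is a minimal trivia-style prompt: the task is specified entirely through a handful of in-context examples rather than through fine-tuning. The questions are illustrative, and `complete` is a hypothetical stand-in for a text-completion call to any large model:

```python
# Few-shot prompting: the task is defined by in-context examples, not by fine-tuning.
FEW_SHOT_PROMPT = """Answer the trivia question.

Q: Which planet is known as the Red Planet?
A: Mars

Q: Who wrote the play 'Hamlet'?
A: William Shakespeare

Q: What is the capital of Australia?
A:"""

def complete(prompt: str) -> str:
    """Hypothetical stand-in for a completion call to a large language model."""
    return " Canberra"  # a sufficiently large model typically continues the pattern

print(FEW_SHOT_PROMPT + complete(FEW_SHOT_PROMPT))
```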
The ability of general-purpose models to perform unseen tasks given only a few examples has also led to many new applications of language models outside the NLP research community. For instance, language models have been prompted to translate natural language instructions into actions executable by robots, to interact with users, and to facilitate multi-modal reasoning.
Credit: Activeloop.ai