Many-Shot In-Context Learning
Today's paper explores many-shot in-context learning, where large language models (LLMs) are provided with hundreds or thousands of examples at inference time in order to learn new tasks. The authors leverage the recently expanded context windows of LLMs like Gemini 1.5 Pro to investigate performance gains from few-shot to many-shot learning across a wide range of tasks.
Overview
This paper tests many-shot in-context learning on a broad set of tasks. Because it can be hard to obtain many high-quality human-written examples for the context, the authors also propose Reinforced ICL and Unsupervised ICL. Reinforced ICL replaces human-written rationales with model-generated ones, which are filtered by checking their final answers for correctness. Unsupervised ICL goes further and includes only problems, rather than problem-solution pairs, in the prompt.
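To make the two strategies concrete, here is a minimal sketch in Python. Note that `generate_rationale` and `extract_answer` are hypothetical stand-ins for the model call and answer parsing, not functions from the paper, and the prompt template is an assumption.

```python
def reinforced_icl_examples(problems, answers, generate_rationale, extract_answer,
                            samples_per_problem=4):
    """Reinforced ICL sketch: sample model-generated rationales and keep only
    those whose extracted final answer matches the known correct answer."""
    examples = []
    for problem, gold in zip(problems, answers):
        for _ in range(samples_per_problem):
            rationale = generate_rationale(problem)   # zero- or few-shot model sample
            if extract_answer(rationale) == gold:     # filter on answer correctness
                examples.append((problem, rationale))
                break                                 # keep one correct rationale per problem (a simplification)
    return examples


def unsupervised_icl_prompt(problems, query):
    """Unsupervised ICL sketch: the prompt lists only problems, no solutions."""
    shots = "\n\n".join(f"Problem: {p}" for p in problems)
    return f"{shots}\n\nProblem: {query}\nSolution:"
```

Next, let's look at the tasks used for evaluation: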
- Machine translation: Using up to 997 translation pairs as in-context examples, performance improves by 4.5% on English to Kurdish and 1.5% on English to Tamil compared to 1-shot prompts.
- Summarization: Using up to 500 (news article, summary) pairs, performance approaches that of models fine-tuned on the XSum and XLSum datasets.
- Planning in the logistics domain: Success rate improves substantially with up to 800 in-context examples of planning problems and solutions.
- Learning code verifiers: Using up to 512 (problem, code solution) pairs labeled for correctness, the model becomes better at verifying code solutions (a prompt-assembly sketch follows this list).
- Problem-solving on MATH and GSM8K: Both Reinforced ICL (model-generated rationales) and Unsupervised ICL (problems only) outperform prompts built from human-written solutions.
- Question-answering on GPQA: Reinforced ICL matches the performance of state-of-the-art few-shot models.
- Algorithmic reasoning on BIG-Bench Hard: Reinforced ICL outperforms human-written chain-of-thought prompts on 8 challenging tasks.
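As a rough illustration of the code-verifier setup above, the sketch below assembles a many-shot prompt from labeled (problem, solution, verdict) triples; the template and function names are assumptions, not the paper's exact format.

```python
def verifier_prompt(labeled_examples, problem, candidate_solution):
    """Build a many-shot code-verification prompt.

    labeled_examples: list of (problem, solution, is_correct) triples,
    e.g. up to 512 of them as in the paper's experiments.
    """
    shots = []
    for prob, sol, ok in labeled_examples:
        verdict = "Correct" if ok else "Incorrect"
        shots.append(f"Problem: {prob}\nSolution:\n{sol}\nVerdict: {verdict}")
    # The model is asked to complete the verdict for the new candidate solution.
    query = f"Problem: {problem}\nSolution:\n{candidate_solution}\nVerdict:"
    return "\n\n".join(shots + [query])
```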
Key points
1) Many-shot learning leads to significant performance gains over few-shot learning across machine translation, summarization, planning, code verification, problem-solving, question-answering and algorithmic reasoning tasks.
2) With sufficient examples, many-shot learning can overcome pre-training biases and adapt to non-natural language tasks that are difficult for few-shot learning.
3) Performance is still sensitive to example ordering even with many shots.
Conclusion
This work thoroughly evaluates many-shot in-context learning across multiple tasks, showing that with enough in-context examples, large language models can become more versatile and adaptable without task-specific fine-tuning. For more information, please consult the full paper.
Congrats to the authors for their work!
Agarwal, Rishabh, et al. "Many-Shot In-Context Learning." arXiv preprint arXiv:2404.11018 (2024).