Google DeepMind investigated inference scaling for long-context RAG

Google DeepMind explored how to scale inference in RAG effectively:

- They introduced new DRAG and IterDRAG strategies

- Discovered “inference scaling laws” for RAG

- Developed a model that predicts the optimal RAG configuration for a given compute budget

Here are the details:

  • Demonstration-Based RAG (DRAG)

In DRAG, the input is expanded by adding in-context examples and relevant documents to the prompt. Top-ranked documents (e.g., from Wikipedia) are retrieved and ordered by relevance, giving the model rich context for generating the answer in a single step.
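Below is a minimal sketch of how a DRAG-style prompt could be assembled. The `retrieve` and `llm_generate` helpers, the demonstration format, and the document ordering are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal DRAG-style prompt assembly (illustrative sketch, not the paper's code).
# Assumes a retrieve(query, k) helper that returns documents ranked by relevance
# and an llm_generate(prompt) call to any long-context LLM.

def build_drag_prompt(question, demonstrations, retrieve, k_docs=50):
    """Assemble a one-shot DRAG prompt: in-context examples + retrieved docs + question."""
    parts = []

    # In-context demonstrations, each with its own documents, question, and answer.
    for demo in demonstrations:
        for doc in demo["docs"]:
            parts.append(f"Document: {doc}")
        parts.append(f"Question: {demo['question']}")
        parts.append(f"Answer: {demo['answer']}")

    # Retrieved documents for the actual question, most relevant placed last
    # so they sit closest to the question in the prompt.
    docs = retrieve(question, k=k_docs)
    for doc in reversed(docs):
        parts.append(f"Document: {doc}")

    parts.append(f"Question: {question}")
    parts.append("Answer:")
    return "\n".join(parts)

# Example (assuming llm_generate exists):
# answer = llm_generate(build_drag_prompt(question, demos, retrieve))
```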

  • Iterative Demonstration-Based RAG (IterDRAG)

IterDRAG is used for questions that need multiple reasoning steps. It decomposes a complex query into manageable sub-queries: the model is prompted to generate each sub-query itself, retrieves additional documents for it, and appends the intermediate answers to the context before producing the final answer (a rough sketch of the loop follows below).

Image credit: Original paper
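Here is a rough sketch of that iterative loop. The stop marker, prompt wording, and the `retrieve`/`llm_generate` helpers are illustrative assumptions rather than the paper's exact procedure.

```python
# Illustrative IterDRAG-style loop (sketch only; details differ from the paper).
# Reuses the hypothetical retrieve(query, k) and llm_generate(prompt) helpers.

def iterdrag_answer(question, base_prompt, retrieve, llm_generate,
                    k_docs=5, max_iterations=5):
    """Interleave sub-query generation, retrieval, and intermediate answers."""
    context = base_prompt + f"\nQuestion: {question}\n"
    for _ in range(max_iterations):
        step = llm_generate(context + "Follow up:")     # model proposes the next sub-query
        if "So the final answer is" in step:            # assumed stop marker
            return step.split("So the final answer is", 1)[1].strip()
        sub_query = step.strip()
        for doc in retrieve(sub_query, k=k_docs):       # fetch extra evidence for this hop
            context += f"Document: {doc}\n"
        intermediate = llm_generate(context + f"Intermediate answer to '{sub_query}':")
        context += f"Follow up: {sub_query}\nIntermediate answer: {intermediate}\n"
    # Fall back to answering directly if no final answer was produced in time.
    return llm_generate(context + "So the final answer is:")
```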

Scaling advantage:

While standard RAG improves only up to about 128k tokens and then plateaus, DRAG keeps improving up to 1M tokens, and IterDRAG up to 5M tokens.

DRAG performs better at smaller token budgets (16k and 32k), while IterDRAG is more effective at larger budgets (128k and beyond).

Image credit: Original paper


  • Inference scaling laws for RAG:

- Almost-linear growth: as the effective compute (total context tokens) increases, RAG performance improves nearly linearly (summarized as a rough relation below).

- Efficiency at long budgets: for budgets above roughly 100k tokens, IterDRAG keeps improving steadily and uses the extra context effectively beyond 128k tokens.

- Diminishing returns beyond 1M: performance gains slow down between 1M and 5M tokens.

Image credit: Original paper
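In rough form, the trend can be written as a simple relation (my paraphrase of the reported plots, with a and b as task-dependent constants fit from the data, not symbols taken from the paper):

P*(L_max) ≈ a · log(L_max) + b

where P*(L_max) is the performance of the best DRAG/IterDRAG configuration at effective context length L_max; the relation holds until gains flatten past roughly 1M tokens.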


  • The computation allocation model for RAG:

It boosts performance by predicting how well a given combination of settings (number of documents, in-context examples, and iterations) will do, so the best configuration can be chosen for the available context length (a usage sketch follows below). The model is accurate for contexts under 1M tokens and generalizes well, but its accuracy drops at 5M tokens.

Image credit: Original paper
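The sketch below shows one way such a model could be used in practice: score every candidate configuration with a fitted predictor and keep the best one that fits the budget. The `predict_performance` and `cost_in_tokens` functions and the candidate grids are hypothetical stand-ins, not the paper's actual allocation model.

```python
from itertools import product

# Hypothetical allocation step (illustrative sketch, not the paper's model).
# predict_performance(docs, shots, iterations) stands in for a fitted predictor,
# cost_in_tokens(...) for an estimate of the resulting context length.

def best_configuration(budget_tokens, predict_performance, cost_in_tokens,
                       doc_grid=(5, 10, 20, 50), shot_grid=(0, 1, 4, 8),
                       iteration_grid=(1, 2, 4)):
    """Pick the (docs, shots, iterations) setting the predictor scores highest within budget."""
    best, best_score = None, float("-inf")
    for docs, shots, iterations in product(doc_grid, shot_grid, iteration_grid):
        if cost_in_tokens(docs, shots, iterations) > budget_tokens:
            continue  # configuration would exceed the context budget
        score = predict_performance(docs, shots, iterations)
        if score > best_score:
            best, best_score = (docs, shots, iterations), score
    return best

# config = best_configuration(128_000, predict_performance, cost_in_tokens)
```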

Original paper: https://arxiv.org/pdf/2410.04343
