In Defense of RAG in the Era of Long-Context Language Models
Credit: https://arxiv.org/pdf/2409.01666

Today's paper revisits the role of retrieval-augmented generation (RAG) in the era of long-context language models. It challenges the recent trend favoring long-context models over RAG, arguing that extremely long contexts can lead to diminished focus on relevant information. The paper introduces an order-preserve RAG mechanism that outperforms both traditional RAG and long-context models without RAG.

Method Overview

The paper introduces an order-preserve retrieval-augmented generation (OP-RAG) mechanism. This method builds upon traditional RAG approaches but with a key difference in how retrieved information is organized.

In OP-RAG, a long document is first split into multiple chunks. When a query is received, the system retrieves the most relevant chunks based on similarity scores. However, unlike traditional RAG which orders these chunks by relevance, OP-RAG maintains the original order of the chunks as they appeared in the source document.
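
To make the order-preserving step concrete, here is a minimal sketch in Python, assuming chunk and query embeddings have already been produced by any encoder. The function name op_rag_context and the cosine-similarity scoring are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def op_rag_context(chunks, chunk_embs, query_emb, k):
    """Select the top-k chunks by similarity to the query, then
    restore their original document order (the OP-RAG step)."""
    # Cosine similarity between the query and every chunk.
    sims = chunk_embs @ query_emb / (
        np.linalg.norm(chunk_embs, axis=1) * np.linalg.norm(query_emb) + 1e-9
    )
    # Indices of the k most relevant chunks, in relevance order.
    top_k = np.argsort(-sims)[:k]
    # Vanilla RAG would concatenate in relevance order; OP-RAG re-sorts
    # the selected indices so chunks appear as they did in the source.
    ordered = np.sort(top_k)
    return "\n\n".join(chunks[i] for i in ordered)
```

The only change relative to vanilla RAG is the final sort: chunks are still selected by relevance, but presented to the model in source order.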

This preservation of order is crucial. It helps maintain the logical flow and context of the information, which can be critical for understanding and generating accurate answers. By keeping the retrieved chunks in their original sequence, the language model can better grasp the relationships and continuity between different pieces of information.

The number of chunks retrieved is an important factor. As more chunks are retrieved, the answer quality initially improves due to increased access to relevant information. However, beyond a certain point, including too many chunks can introduce irrelevant information, leading to a decline in answer quality. This creates an inverted U-shaped performance curve, with an optimal "sweet spot" for the number of retrieved chunks.
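
One practical way to locate that sweet spot is to sweep the number of retrieved chunks on a validation set and keep the value that maximizes answer quality. The sketch below is a generic grid search; build_context, generate_answer, and f1_score are hypothetical callables standing in for whatever retrieval, generation, and evaluation components the pipeline uses.

```python
def find_sweet_spot(candidate_ks, questions, build_context, generate_answer, f1_score):
    """Sweep k (number of retrieved chunks) and return the value that
    maximizes mean F1. Quality typically rises with k, peaks, and then
    falls as irrelevant chunks dilute the context (the inverted U)."""
    best_k, best_f1 = None, float("-inf")
    for k in candidate_ks:
        # Score every validation question with a k-chunk context.
        scores = [
            f1_score(q, generate_answer(q, build_context(q, k)))
            for q in questions
        ]
        mean_f1 = sum(scores) / len(scores)
        if mean_f1 > best_f1:
            best_k, best_f1 = k, mean_f1
    return best_k, best_f1
```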

Results

The paper demonstrates that OP-RAG significantly outperforms both traditional RAG and long-context language models without RAG:

  1. On the En.QA dataset, OP-RAG achieved an F1 score of 47.25 using only 48K tokens, compared to 34.26 for a long-context model using 117K tokens.
  2. OP-RAG showed superior performance across different context lengths, with larger models such as Llama3.1-70B reaching their performance peak at longer contexts than smaller models.
  3. The order-preserving mechanism proved particularly beneficial when retrieving larger numbers of chunks, significantly outperforming vanilla RAG in these scenarios.

Conclusion

This paper challenges the notion that long-context language models have made RAG obsolete. With the order-preserve RAG mechanism, the authors demonstrate that a well-designed RAG system can outperform long-context models while using fewer tokens. For more information, please consult the full paper.

Congrats to the authors for their work!

Yu, Tan, et al. "In Defense of RAG in the Era of Long-Context Language Models." arXiv preprint arXiv:2409.01666 (2024).
