In Defense of RAG in the Era of Long-Context Language Models
Credit: https://arxiv.org/pdf/2409.01666

Today's paper revisits the role of retrieval-augmented generation (RAG) in the era of long-context language models. It challenges the recent trend favoring long-context models over RAG, arguing that extremely long contexts can lead to diminished focus on relevant information. The paper introduces an order-preserve RAG mechanism that outperforms both traditional RAG and long-context models without RAG.

Method Overview

The paper introduces an order-preserve retrieval-augmented generation (OP-RAG) mechanism. This method builds upon traditional RAG approaches but with a key difference in how retrieved information is organized.

In OP-RAG, a long document is first split into multiple chunks. When a query is received, the system retrieves the most relevant chunks based on similarity scores. However, unlike traditional RAG which orders these chunks by relevance, OP-RAG maintains the original order of the chunks as they appeared in the source document.
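
To make the order-preserving step concrete, here is a minimal sketch in Python, assuming chunk and query embeddings have already been produced by any encoder. The function name op_rag_context and the cosine-similarity scoring are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def op_rag_context(chunks, chunk_embs, query_emb, k):
    """Select the top-k chunks by similarity to the query, then
    restore their original document order (the OP-RAG step)."""
    # Cosine similarity between the query and every chunk.
    sims = chunk_embs @ query_emb / (
        np.linalg.norm(chunk_embs, axis=1) * np.linalg.norm(query_emb) + 1e-9
    )
    # Indices of the k most relevant chunks, in relevance order.
    top_k = np.argsort(-sims)[:k]
    # Vanilla RAG would concatenate in relevance order; OP-RAG re-sorts
    # the selected indices so chunks appear as they did in the source.
    ordered = np.sort(top_k)
    return "\n\n".join(chunks[i] for i in ordered)
```

The only change relative to vanilla RAG is the final sort: chunks are still selected by relevance, but presented to the model in source order.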

This preservation of order is crucial. It helps maintain the logical flow and context of the information, which can be critical for understanding and generating accurate answers. By keeping the retrieved chunks in their original sequence, the language model can better grasp the relationships and continuity between different pieces of information.

The number of chunks retrieved is an important factor. As more chunks are retrieved, the answer quality initially improves due to increased access to relevant information. However, beyond a certain point, including too many chunks can introduce irrelevant information, leading to a decline in answer quality. This creates an inverted U-shaped performance curve, with an optimal "sweet spot" for the number of retrieved chunks.
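
One practical way to locate that sweet spot is to sweep the number of retrieved chunks on a validation set and keep the value that maximizes answer quality. The sketch below is a generic grid search; build_context, generate_answer, and f1_score are hypothetical callables standing in for whatever retrieval, generation, and evaluation components the pipeline uses.

```python
def find_sweet_spot(candidate_ks, questions, build_context, generate_answer, f1_score):
    """Sweep k (number of retrieved chunks) and return the value that
    maximizes mean F1. Quality typically rises with k, peaks, and then
    falls as irrelevant chunks dilute the context (the inverted U)."""
    best_k, best_f1 = None, float("-inf")
    for k in candidate_ks:
        # Score every validation question with a k-chunk context.
        scores = [
            f1_score(q, generate_answer(q, build_context(q, k)))
            for q in questions
        ]
        mean_f1 = sum(scores) / len(scores)
        if mean_f1 > best_f1:
            best_k, best_f1 = k, mean_f1
    return best_k, best_f1
```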

Results

The paper demonstrates that OP-RAG significantly outperforms both traditional RAG and long-context language models without RAG:

  1. On the En.QA dataset, OP-RAG achieved an F1 score of 47.25 using only 48K tokens, compared to 34.26 for a long-context model using 117K tokens.
  2. OP-RAG showed superior performance across different context lengths, with larger models such as Llama3.1-70B reaching their performance peak at longer contexts than smaller models.
  3. The order-preserving mechanism proved particularly beneficial when retrieving larger numbers of chunks, significantly outperforming vanilla RAG in these scenarios.

Conclusion

This paper challenges the notion that long-context language models have made RAG obsolete. With the order-preserve RAG mechanism, the authors demonstrate that a well-designed RAG system can outperform long-context models while using fewer tokens. For more information, please consult the full paper.

Congrats to the authors for their work!

Yu, Tan, et al. "In Defense of RAG in the Era of Long-Context Language Models." arXiv preprint arXiv:2409.01666 (2024).
