In Defense of RAG in the Era of Long-Context Language Models
Today's paper revisits the role of retrieval-augmented generation (RAG) in the era of long-context language models. It challenges the recent trend favoring long-context models over RAG, arguing that extremely long contexts can lead to diminished focus on relevant information. The paper introduces an order-preserve RAG mechanism that outperforms both traditional RAG and long-context models without RAG.
Method Overview
The paper introduces an order-preserve retrieval-augmented generation (OP-RAG) mechanism. This method builds upon traditional RAG approaches but with a key difference in how retrieved information is organized.
In OP-RAG, a long document is first split into multiple chunks. When a query is received, the system retrieves the most relevant chunks based on similarity scores. However, unlike traditional RAG which orders these chunks by relevance, OP-RAG maintains the original order of the chunks as they appeared in the source document.
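The mechanism above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the chunker and the word-overlap `similarity` function are stand-ins (the actual system would use embedding similarity); only the order-preserving selection step reflects the described method.

```python
def split_into_chunks(document, chunk_size=128):
    """Split a document into fixed-size word chunks (toy chunker)."""
    words = document.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

def similarity(query, chunk):
    """Toy relevance score: word overlap with the query.
    (A real system would use embedding cosine similarity.)"""
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c)

def op_rag_retrieve(query, chunks, k):
    """Score all chunks, keep the top-k by relevance,
    then restore the chunks' original document order."""
    by_relevance = sorted(range(len(chunks)),
                          key=lambda i: similarity(query, chunks[i]),
                          reverse=True)
    top_k = sorted(by_relevance[:k])  # order-preserve: sort by position
    return [chunks[i] for i in top_k]
```

A traditional RAG pipeline would return `by_relevance[:k]` directly, concatenating chunks from most to least relevant; the single extra `sorted` call is what keeps the retrieved passages in their source order.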
This preservation of order is crucial. It helps maintain the logical flow and context of the information, which can be critical for understanding and generating accurate answers. By keeping the retrieved chunks in their original sequence, the language model can better grasp the relationships and continuity between different pieces of information.
The number of chunks retrieved is an important factor. As more chunks are retrieved, the answer quality initially improves due to increased access to relevant information. However, beyond a certain point, including too many chunks can introduce irrelevant information, leading to a decline in answer quality. This creates an inverted U-shaped performance curve, with an optimal "sweet spot" for the number of retrieved chunks.
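Finding that sweet spot amounts to a one-dimensional sweep over candidate values of k. A hedged sketch, where `evaluate` is a hypothetical callback that runs OP-RAG with k retrieved chunks and returns answer quality on a validation set (e.g., F1):

```python
def find_sweet_spot(candidate_ks, evaluate):
    """Sweep over candidate top-k values and return the best one.

    `evaluate(k)` is assumed to run the full retrieve-and-generate
    pipeline with k chunks and score the answers; it is not defined
    here because it depends on the model and benchmark used.
    """
    scores = {k: evaluate(k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Synthetic stand-in exhibiting the inverted-U shape from the paper:
# quality rises with k, peaks, then falls as irrelevant chunks creep in.
synthetic_quality = lambda k: -(k - 16) ** 2
best_k, _ = find_sweet_spot([4, 8, 16, 32, 64], synthetic_quality)
```

With the synthetic curve above, the sweep selects k = 16; on real data the peak's location depends on the model, the chunk size, and the retriever.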
Results
The paper demonstrates that OP-RAG significantly outperforms both traditional RAG and long-context language models without RAG, achieving higher answer quality on long-context question-answering benchmarks while consuming far fewer input tokens than feeding the model the full context.
Conclusion
This paper challenges the notion that long-context language models have made RAG obsolete. By introducing the order-preserve RAG mechanism, the authors demonstrate that a well-designed RAG system can outperform long-context models while using fewer tokens. For more information, please consult the full paper.
Congrats to the authors for their work!
Yu, Tan, et al. "In Defense of RAG in the Era of Long-Context Language Models." arXiv preprint arXiv:2409.01666 (2024).