Searching for Best Practices in RAG: The Sparknotes Version

Recently got around to reading "Searching for Best Practices in Retrieval-Augmented Generation". Thought it would be a good idea to write down the sparknotes to solidify my memory (and add useful links), so here goes. Perhaps we'll see about implementing this end-to-end with Elastic?


Metrics

RAG capabilities

RAG capabilities specifically are measured using the RAGAS framework, which leverages GPT-4 to calculate Faithfulness, Context Relevancy, Answer Relevancy, and Answer Correctness. Retrieval similarity is calculated using cosine similarity between retrieved documents and gold-standard documents.
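
As a rough illustration of the retrieval-similarity metric, here is a minimal sketch, assuming the retrieved and gold-standard documents have already been embedded as numpy vectors. The function names and the best-match averaging are my own assumptions; the paper only specifies cosine similarity:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieval_similarity(retrieved: list[np.ndarray], gold: list[np.ndarray]) -> float:
    """Score each retrieved document against its best-matching gold document,
    then average across the retrieved set (aggregation choice is an assumption)."""
    return float(np.mean([
        max(cosine_similarity(r, g) for g in gold)
        for r in retrieved
    ]))
```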

Highlights

  1. Many of the pipeline components involve incorporating either LLMs or custom-trained models to maximize performance. Keep in perspective that these improvements on the basic RAG flow, while measurable, are relatively marginal. In other words, basic RAG without the non-essential components already gets you close to the finish line.
  2. Multi-modal RAG is an interesting prospect for taking advantage of Claude's and GPT-4o's built-in image processing capabilities. You can imagine mixing graphs, diagrams, tables, etc. into your documents, enriching them with metadata, generating textual descriptions and then embedding those descriptions, or embedding the images themselves (a rough sketch of the description-embedding route follows this list).
  3. I finally have a term for calling search_results.reverse(): it's called Reverse Repacking lol
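
On point 2, here is a rough sketch of the description-embedding route using the OpenAI Python SDK. The model names and the prompt are illustrative assumptions on my part, not details from the paper:

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def describe_image(path: str) -> str:
    """Ask a vision-capable model for a retrieval-friendly description
    of a figure, diagram, or table (PNG assumed for the data URL)."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image for search indexing."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

def embed_description(description: str) -> list[float]:
    """Embed the generated description for vector search."""
    return client.embeddings.create(
        model="text-embedding-3-small", input=description
    ).data[0].embedding
```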

Quick Takeaways

  1. Chunk sizes between 256 and 512 tokens offered the highest performance, with 1024 and 2048 offering the worst.
  2. Small-to-Big (query match on small chunks, which are linked to bigger chunks) and Sliding Window (maintaining overlap between chunks) chunking techniques were more effective than naive chunking, with Sliding Window slightly better (a sliding-window sketch follows this list). Sentence-level chunking is used to balance information completeness and simplicity, without resorting to resource-intensive methods like semantic chunking (which leverages either embeddings and a miniature vector search, or an LLM) for document breakpointing.
  3. Enhancing chunks with metadata (titles, keywords, possible questions) improved performance. (Detailed study forthcoming)
  4. HyDE + Hybrid Search was the most effective search/retrieval method evaluated (https://arxiv.org/abs/2212.10496). HyDE uses an LLM to generate a hypothetical document that would answer the query; this document is embedded using Contriever (https://huggingface.co/facebook/contriever) and used for hybrid search (a sketch follows this list). Plain Hybrid Search remains the most efficient on a cost/performance basis.
  5. A weighting of 0.3 for sparse retrieval scoring (TF-IDF + BM25) and 0.7 for dense retrieval scoring (embedding vectors) offered the best benchmark performance (see the fusion sketch after this list).
  6. Query Classification, which predicts whether a query necessitates RAG or can be answered by the LLM without assistance, was found to slightly improve performance.
  7. The MonoT5 and TILDEv2 reranker models were selected, the former being more resource-intensive and the latter offering a better performance/cost ratio.
  8. Reverse repacking - Ordering search results in ascending order of relevance scores significantly improved performance (included in the fusion sketch after this list).
  9. Summarization using Recomp (https://github.com/carriex/recomp) improved performance but significantly increased runtime.
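
A minimal sketch of the sliding-window chunking from point 2. The window and overlap sizes are illustrative, and whitespace splitting stands in for a real tokenizer:

```python
import re

def sliding_window_chunks(text: str, max_tokens: int = 512,
                          overlap_sentences: int = 2) -> list[str]:
    """Accumulate sentences until the window hits max_tokens, emit a chunk,
    and carry the last few sentences over as overlap into the next chunk."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, window = [], []
    for sentence in sentences:
        window.append(sentence)
        if sum(len(s.split()) for s in window) >= max_tokens:
            chunks.append(" ".join(window))
            window = window[-overlap_sentences:]  # the sliding overlap
    if window:
        chunks.append(" ".join(window))
    return chunks
```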

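And a sketch of the HyDE step from point 4. For brevity this uses the OpenAI SDK for both generation and embedding; note the paper embeds the hypothetical document with Contriever, and the model names here are my own placeholder choices:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def hyde_query_embedding(query: str) -> list[float]:
    """Generate a hypothetical answer document for the query, then embed
    that document instead of the raw query for dense retrieval."""
    hypothetical = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Write a short passage that would answer this question: {query}",
        }],
    ).choices[0].message.content
    return client.embeddings.create(
        model="text-embedding-3-small", input=hypothetical
    ).data[0].embedding
```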

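Finally, points 5 and 8 together: weighted score fusion followed by reverse repacking. This assumes the sparse and dense scores are already normalized to a comparable range and keyed by document id:

```python
def hybrid_scores(sparse: dict[str, float], dense: dict[str, float],
                  alpha: float = 0.3) -> dict[str, float]:
    """Weighted fusion: alpha * sparse (TF-IDF + BM25) + (1 - alpha) * dense."""
    return {
        doc_id: alpha * sparse.get(doc_id, 0.0) + (1 - alpha) * dense.get(doc_id, 0.0)
        for doc_id in sparse.keys() | dense.keys()
    }

def reverse_repack(scored: dict[str, float], k: int = 5) -> list[str]:
    """Keep the top-k documents, then order them by ascending relevance so
    the most relevant chunk sits closest to the question in the prompt."""
    top_k = sorted(scored, key=scored.get, reverse=True)[:k]
    return top_k[::-1]
```
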
Table of Results
Multi-Modal RAG

