Searching for Best Practices in RAG: The Sparknotes Version

Recently got around to reading "Searching for Best Practices in Retrieval-Augmented Generation". Thought it would be a good idea to write down the sparknotes to solidify my memory (and add useful links), so here goes. Perhaps we'll see about implementing this end-to-end with Elastic?


Metrics

RAG capabilities

RAG capabilities specifically are measured using the RAGAS framework, which leverages GPT-4 to calculate Faithfulness, Context Relevancy, Answer Relevancy, and Answer Correctness. Retrieval similarity is calculated using cosine similarity between retrieved documents and gold-standard documents.
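
As a rough illustration of the retrieval-similarity metric, here is a minimal sketch, assuming the retrieved and gold-standard documents have already been embedded as numpy vectors. The function names and the best-match averaging are my own assumptions; the paper only specifies cosine similarity:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieval_similarity(retrieved: list[np.ndarray], gold: list[np.ndarray]) -> float:
    """Score each retrieved document against its best-matching gold document,
    then average across the retrieved set (aggregation choice is an assumption)."""
    return float(np.mean([
        max(cosine_similarity(r, g) for g in gold)
        for r in retrieved
    ]))
```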

Highlights

  1. Many of the pipeline components involve incorporating either LLMs or custom-trained models to maximize performance. Keep in perspective that these improvements on the basic RAG flow, while measurable, are relatively marginal. In other words, basic RAG without the non-essential components already gets you close to the finish line.
  2. Multi-modal RAG is an interesting prospect for taking advantage of Claude's and GPT-4o's built-in image processing capabilities. You can imagine mixing graphs, diagrams, tables, etc. into your documents, enriching them with metadata, generating textual descriptions and then embedding those descriptions, or embedding the images themselves (a rough sketch of the description-embedding route follows this list).
  3. I finally have a term for calling search_results.reverse(): it's called Reverse Repacking lol
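
On point 2, here is a rough sketch of the description-embedding route using the OpenAI Python SDK. The model names and the prompt are illustrative assumptions on my part, not details from the paper:

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def describe_image(path: str) -> str:
    """Ask a vision-capable model for a retrieval-friendly description
    of a figure, diagram, or table (PNG assumed for the data URL)."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image for search indexing."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

def embed_description(description: str) -> list[float]:
    """Embed the generated description for vector search."""
    return client.embeddings.create(
        model="text-embedding-3-small", input=description
    ).data[0].embedding
```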

Quick Takeaways

  1. Chunk sizes between 256 and 512 tokens offered the highest performance, with 1024 and 2048 offering the worst.
  2. Small-to-Big (query match on small chunks, which are linked to bigger chunks) and Sliding Window (maintaining overlap between chunks) chunking techniques were more effective than naive chunking, with Sliding Window slightly better (a sliding-window sketch follows this list). Sentence-level chunking is used to balance information completeness and simplicity, without resorting to resource-intensive methods like semantic chunking (which leverages either embeddings and a miniature vector search, or an LLM) for document breakpointing.
  3. Enhancing chunks with metadata (titles, keywords, possible questions) improved performance. (Detailed study forthcoming)
  4. HyDE + Hybrid Search was the most effective search/retrieval method evaluated (https://arxiv.org/abs/2212.10496). HyDE uses an LLM to generate a hypothetical document that would answer the query; this document is embedded using Contriever (https://huggingface.co/facebook/contriever) and used for hybrid search (a sketch follows this list). Plain Hybrid Search remains the most efficient on a cost/performance basis.
  5. A weighting of 0.3 for sparse retrieval scoring (TF-IDF + BM25) and 0.7 for dense retrieval scoring (embedding vectors) offered the best benchmark performance (see the fusion sketch after this list).
  6. Query Classification, which predicts whether a query necessitates RAG or can be answered by the LLM without assistance, was found to slightly improve performance.
  7. The MonoT5 and TILDEv2 reranker models were selected, the former being more resource-intensive and the latter offering a better performance/cost ratio.
  8. Reverse repacking - Ordering search results in ascending order of relevance scores significantly improved performance (included in the fusion sketch after this list).
  9. Summarization using Recomp (https://github.com/carriex/recomp) improved performance but significantly increased runtime.
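
A minimal sketch of the sliding-window chunking from point 2. The window and overlap sizes are illustrative, and whitespace splitting stands in for a real tokenizer:

```python
import re

def sliding_window_chunks(text: str, max_tokens: int = 512,
                          overlap_sentences: int = 2) -> list[str]:
    """Accumulate sentences until the window hits max_tokens, emit a chunk,
    and carry the last few sentences over as overlap into the next chunk."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, window = [], []
    for sentence in sentences:
        window.append(sentence)
        if sum(len(s.split()) for s in window) >= max_tokens:
            chunks.append(" ".join(window))
            window = window[-overlap_sentences:]  # the sliding overlap
    if window:
        chunks.append(" ".join(window))
    return chunks
```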

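And a sketch of the HyDE step from point 4. For brevity this uses the OpenAI SDK for both generation and embedding; note the paper embeds the hypothetical document with Contriever, and the model names here are my own placeholder choices:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def hyde_query_embedding(query: str) -> list[float]:
    """Generate a hypothetical answer document for the query, then embed
    that document instead of the raw query for dense retrieval."""
    hypothetical = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Write a short passage that would answer this question: {query}",
        }],
    ).choices[0].message.content
    return client.embeddings.create(
        model="text-embedding-3-small", input=hypothetical
    ).data[0].embedding
```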

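Finally, points 5 and 8 together: weighted score fusion followed by reverse repacking. This assumes the sparse and dense scores are already normalized to a comparable range and keyed by document id:

```python
def hybrid_scores(sparse: dict[str, float], dense: dict[str, float],
                  alpha: float = 0.3) -> dict[str, float]:
    """Weighted fusion: alpha * sparse (TF-IDF + BM25) + (1 - alpha) * dense."""
    return {
        doc_id: alpha * sparse.get(doc_id, 0.0) + (1 - alpha) * dense.get(doc_id, 0.0)
        for doc_id in sparse.keys() | dense.keys()
    }

def reverse_repack(scored: dict[str, float], k: int = 5) -> list[str]:
    """Keep the top-k documents, then order them by ascending relevance so
    the most relevant chunk sits closest to the question in the prompt."""
    top_k = sorted(scored, key=scored.get, reverse=True)[:k]
    return top_k[::-1]
```
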
Table of Results
Multi-Modal RAG

