Retrieval-Augmented Generation (RAG): Enhancing Language Models with External Knowledge
Snigdha Kakkar
Introduction
Retrieval-Augmented Generation (RAG) is a technique that enhances language model generation by incorporating external knowledge. This is typically done by retrieving relevant information from a large corpus of documents and using that information to inform the generation process.
Motivation
In numerous instances, clients possess extensive proprietary documents, such as technical manuals, and need to extract specific information from this voluminous content, a task akin to finding a needle in a haystack. OpenAI recently introduced GPT-4 Turbo, a model that can process very large documents and could in principle address this need. In practice, however, it is not entirely reliable because of the "Lost in the Middle" phenomenon: the model tends to overlook content located towards the middle of its context window.
To circumvent this limitation, an alternative approach known as Retrieval-Augmented Generation (RAG) has been developed. The method builds an index over every paragraph in the document. When a query is made, the most pertinent paragraphs are swiftly identified and fed into a Large Language Model (LLM) such as GPT-4. Supplying only these select paragraphs, rather than the entire document, prevents information overload within the LLM and significantly enhances the quality of the results.
Neural Retrieval
Neural retrievers are a type of information retrieval model that uses neural networks to match queries to relevant documents. They encode the query and documents into dense vector representations and compute similarity scores between them, allowing them to go beyond lexical matching and capture semantic relevance.
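To make this concrete, here is a minimal dense-retrieval sketch in Python. It uses the open-source sentence-transformers library; the all-MiniLM-L6-v2 checkpoint and the toy documents are illustrative assumptions, not prescriptions from this article.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Bi-encoder: the same model embeds queries and documents independently.
model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "RAG retrieves relevant paragraphs before generation.",
    "GPT-4 Turbo supports a very large context window.",
    "TF-IDF produces sparse lexical representations.",
]

# Encode documents and query into dense vectors.
doc_vecs = model.encode(documents, normalize_embeddings=True)
query_vec = model.encode(
    "How does retrieval help generation?", normalize_embeddings=True
)

# With normalized vectors, cosine similarity is just a dot product.
scores = doc_vecs @ query_vec
best = int(np.argmax(scores))
print(documents[best], float(scores[best]))
```

Because the query and documents are encoded into the same vector space, the top-scoring document can be semantically relevant even when it shares no keywords with the query.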
The Retrieval Augmented Generation (RAG) Pipeline
With RAG, the LLM can leverage knowledge that is not necessarily stored in its weights, because it is given access to external knowledge sources such as databases. A retriever finds relevant contexts to condition the LLM; in this way, RAG augments the knowledge base of an LLM with relevant documents.
The retriever here could be a vector database, a graph database, or a regular SQL database, depending on whether semantic retrieval is needed.
Vector Store:
Typically, user queries are embedded using an embedding model (such as OpenAI embeddings, BERT, or another embedding model); alternatively, TF-IDF can be used for sparse embeddings. The search over the vector store can rely on vector similarity, keyword matching, term frequency, or semantic similarity. While vector databases partition and index data using LLM-encoded vectors, allowing semantically similar vectors to be retrieved, they may also fetch irrelevant data. (Reference: Graph vs Vector database for RAG by Damien Benveniste, PhD)
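As an illustration of the sparse (TF-IDF) path mentioned above, here is a small sketch using scikit-learn; the sample manual snippets are invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "The pump must be primed before first use.",
    "Replace the filter cartridge every six months.",
    "Error E42 indicates a blocked intake valve.",
]

# Fit a sparse TF-IDF representation over the document collection.
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)

# Project the query into the same sparse space and rank by similarity.
query_vec = vectorizer.transform(["what does error E42 mean"])
scores = cosine_similarity(query_vec, doc_matrix)[0]
print(documents[scores.argmax()])
```

TF-IDF matches on lexical overlap only; it is cheap and transparent, but unlike dense embeddings it cannot bridge synonyms or paraphrases.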
Graph database:
Constructs a knowledge base from extracted entity relationships within the text. This approach is precise but may require exact query matching, which could be restrictive in some applications. A potential solution could be to combine the strengths of both databases: indexing parsed entity relationships with vector representations in a graph database for more flexible information retrieval. It remains to be seen if such a hybrid model exists.
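The exact-matching limitation is easy to see in a toy sketch. The following uses networkx as a lightweight stand-in for a real graph database; the entities and relations are made up for illustration.

```python
import networkx as nx

# Each edge stores a typed relation between two extracted entities.
G = nx.DiGraph()
G.add_edge("Pump-3000", "Filter-X", relation="uses")
G.add_edge("Filter-X", "6 months", relation="replacement_interval")

def related(graph: nx.DiGraph, entity: str):
    """Return (neighbor, relation) pairs for an exact entity match."""
    return [
        (n, graph.edges[entity, n]["relation"])
        for n in graph.successors(entity)
    ]

print(related(G, "Pump-3000"))
# The weakness noted above: a query for "pump 3000" misses the node
# entirely unless entities are normalized or linked first, which is
# where adding vector representations could bring flexibility.
```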
Regular SQL database:
Provides structured data storage and retrieval but could lack the semantic flexibility of vector databases.
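A minimal example of structured retrieval, using Python's built-in sqlite3 module; the table and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE manuals (part TEXT, instruction TEXT)")
conn.executemany(
    "INSERT INTO manuals VALUES (?, ?)",
    [
        ("Filter-X", "Replace every six months."),
        ("Valve-7", "Clean with compressed air."),
    ],
)

# Exact, structured lookup: precise and fast, but no semantic matching.
rows = conn.execute(
    "SELECT instruction FROM manuals WHERE part = ?", ("Filter-X",)
).fetchall()
print(rows)
```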
After retrieving documents, the user might wish to rerank or filter the retrieved candidates to match business rules and criteria. These steps might also be influenced by the current context, business criteria, rules, personalization for the user, and response-time limits.
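One common way to rerank is with a cross-encoder that scores each (query, candidate) pair jointly, which is more accurate than the independent embeddings used for first-stage retrieval. A sketch using sentence-transformers follows; the checkpoint name is one popular public choice, not a requirement.

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "how often should the filter be replaced?"
candidates = [
    "Replace the filter cartridge every six months.",
    "The pump must be primed before first use.",
]

# Score each (query, candidate) pair jointly, then sort descending.
scores = reranker.predict([(query, c) for c in candidates])
ranked = sorted(zip(scores, candidates), reverse=True)
print(ranked[0])
```

Business-rule filtering (recency, permissions, source whitelists) would typically be applied to this ranked list before anything is passed to the LLM.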
To summarize, a simple RAG pipeline comprises the following steps (a compact sketch tying them together follows the list):
1. Split the source documents into chunks (for example, paragraphs) and embed each chunk.
2. Store the chunk embeddings in an index such as a vector store.
3. Embed the incoming user query with the same embedding model.
4. Retrieve the most similar chunks, optionally reranking or filtering them.
5. Pass the query together with the retrieved chunks to the LLM, which generates the grounded answer.
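Here is a compact end-to-end sketch of the steps above. The llm_complete function is a hypothetical placeholder for whatever LLM API you use; the model choice and documents are assumptions carried over from the earlier examples.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

def llm_complete(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call (e.g. a
    # chat-completion endpoint); replace with your provider of choice.
    return f"[LLM answer based on a {len(prompt)}-character prompt]"

model = SentenceTransformer("all-MiniLM-L6-v2")

# Steps 1-2: chunk the documents and index their embeddings.
chunks = [
    "Replace the filter cartridge every six months.",
    "Error E42 indicates a blocked intake valve.",
]
index = model.encode(chunks, normalize_embeddings=True)

def answer(question: str, top_k: int = 1) -> str:
    # Step 3: embed the query with the same model.
    q = model.encode(question, normalize_embeddings=True)
    # Step 4: retrieve the top-k most similar chunks.
    top = np.argsort(index @ q)[::-1][:top_k]
    context = "\n".join(chunks[i] for i in top)
    # Step 5: condition the LLM on only the retrieved context.
    prompt = f"Answer using only this context:\n{context}\n\nQ: {question}"
    return llm_complete(prompt)

print(answer("What does error E42 mean?"))
```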
Benefits of RAG
- Grounds generation in factual, up-to-date, domain-specific sources rather than relying solely on knowledge frozen in the model's weights.
- Sidesteps the "Lost in the Middle" problem by passing only the most relevant passages instead of entire documents.
- Allows the knowledge base to be refreshed by simply re-indexing documents, with no retraining of the model.
- Lets retrieval respect access control, so answers draw only on documents the user is entitled to see.
RAG vs. Fine-tuning
While fine-tuning adapts the style, tone, and vocabulary of LLMs, RAG gives LLM systems access to factual, access-controlled, and timely information. The focus should be on RAG first, as a successful LLM application must connect specialized data to the LLM workflow. Once a first full application is working, fine-tuning can be added to improve the style and vocabulary of the system.
If you like this article, please subscribe to my Newsletter (AI Scoop) and follow me for similar articles on Generative AI. Also, if, like me, you prefer listening and practicing to just reading, subscribe to my YouTube channel (https://www.youtube.com/@AccelerateAICareers), where I have shared a complete Generative AI playlist. I frequently add new content about popular LLMs and Generative AI on both LinkedIn and YouTube.