Retrieval-Augmented Generation (RAG): Enhancing Language Models with External Knowledge
Retrieval-Augmented Generation (RAG) by Accelerate.AI Careers

Introduction

Retrieval-Augmented Generation (RAG) is a technique that enhances language model generation by incorporating external knowledge. This is typically done by retrieving relevant information from a large corpus of documents and using that information to inform the generation process.

Motivation

In numerous instances, clients possess extensive proprietary documents, such as technical manuals, and need to extract specific information from this voluminous content, a task akin to locating a needle in a haystack. OpenAI recently introduced GPT-4 Turbo, a model that can process very long documents and could, in principle, address this need. In practice, however, it is not entirely effective due to the "Lost in the Middle" phenomenon, where the model tends to overlook content located toward the middle of its context window.

To circumvent this limitation, an alternative approach known as Retrieval-Augmented Generation (RAG) has been developed. This method involves creating an index for every paragraph in the document. When a query is made, the most pertinent paragraphs are swiftly identified and fed into a Large Language Model (LLM) such as GPT-4. Providing only select paragraphs, rather than the entire document, prevents information overload within the LLM and significantly improves the quality of the results.

Neural Retrieval

Neural retrievers are a type of information retrieval model that uses neural networks to match queries to relevant documents. They encode the query and documents into dense vector representations and compute similarity scores between them, allowing them to go beyond lexical matching and capture semantic relevance.
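
As a minimal sketch of this idea, the snippet below encodes a query and a handful of toy documents with the sentence-transformers library and ranks the documents by cosine similarity. The model name and the documents are illustrative assumptions, not recommendations.

```python
# Minimal sketch of dense (neural) retrieval; model name is an example only.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "RAG retrieves relevant passages before generation.",
    "Transformers use self-attention over token sequences.",
    "Vector databases index dense embeddings for similarity search.",
]

# Encode documents and query into dense vectors, L2-normalized.
doc_vecs = model.encode(documents, normalize_embeddings=True)
query_vec = model.encode(
    "How does retrieval-augmented generation work?",
    normalize_embeddings=True,
)

# With normalized vectors, the dot product equals cosine similarity.
scores = doc_vecs @ query_vec
best = int(np.argmax(scores))
print(documents[best], float(scores[best]))
```

Because the query and documents live in the same embedding space, a document can score highly even when it shares no exact words with the query, which is the semantic-relevance property described above.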

The Retrieval Augmented Generation (RAG) Pipeline

With RAG, the LLM can draw on knowledge and information that is not necessarily in its weights by being given access to external knowledge sources such as databases. A retriever finds relevant contexts to condition the LLM; in this way, RAG augments the knowledge base of an LLM with relevant documents.

The retriever here could be a vector database, a graph database, or a regular SQL database, depending on whether semantic retrieval is needed.

Vector Store:

Typically, user queries are embedded using an embedding model (such as OpenAI embeddings, BERT, or another embedding model); alternatively, TF-IDF can be used for sparse embeddings. Search over the vector store may be based on vector similarity, keyword matching, term frequency, or semantic similarity. While vector databases partition and index data using LLM-encoded vectors, allowing retrieval of semantically similar vectors, they may fetch irrelevant data. (Reference: Graph vs Vector database for RAG by Damien Benveniste, PhD)
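
For the sparse (TF-IDF) option mentioned above, here is a minimal sketch using scikit-learn; the documents and query are toy examples.

```python
# Sparse (TF-IDF) retrieval sketch: lexical counterpart to dense embeddings.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "To reset the device, hold the power button for ten seconds.",
    "The warranty covers manufacturing defects for two years.",
    "Firmware updates are delivered over the air every quarter.",
]

# Fit the TF-IDF vocabulary on the corpus and vectorize the documents.
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)

# Embed the query in the same TF-IDF space and score by cosine similarity.
query_vec = vectorizer.transform(["how do I reset the device"])
scores = cosine_similarity(query_vec, doc_matrix)[0]

best = scores.argmax()
print(documents[best], scores[best])
```

Unlike the dense retriever, TF-IDF can only reward overlapping terms ("reset", "device"), which is exactly the lexical-matching limitation that neural retrievers address.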

Graph database:

Constructs a knowledge base from entity relationships extracted from the text. This approach is precise but may require exact query matching, which can be restrictive in some applications. A potential solution is to combine the strengths of both databases: indexing parsed entity relationships with vector representations in a graph database for more flexible information retrieval. It remains to be seen whether such a hybrid model exists.
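
To make the exact-match restriction concrete, here is a toy, dependency-free sketch of a knowledge base of (subject, relation, object) triples; the entities and the `neighbors` helper are hypothetical illustrations, not a real graph database API.

```python
# Toy knowledge base of entity-relationship triples extracted from text.
triples = [
    ("GPT-4", "is_a", "large language model"),
    ("RAG", "uses", "retriever"),
    ("retriever", "queries", "vector database"),
]

def neighbors(entity: str) -> list[tuple[str, str, str]]:
    """Return all facts mentioning the entity exactly.

    Note the exact-match restriction discussed above: "RAG" matches,
    but "retrieval augmented generation" would not.
    """
    return [t for t in triples if entity in (t[0], t[2])]

print(neighbors("RAG"))  # [('RAG', 'uses', 'retriever')]
```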

Regular SQL database:

Provides structured data storage and retrieval but could lack the semantic flexibility of vector databases.
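
A minimal sketch of this option using Python's built-in sqlite3 module; the schema and rows are illustrative assumptions. Note that LIKE performs literal substring matching, with no notion of semantic similarity.

```python
# Keyword retrieval from a regular SQL database (lexical, not semantic).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [
        ("RAG retrieves relevant passages before generation.",),
        ("Fine-tuning adapts a model's weights to new data.",),
    ],
)

# LIKE matches literal substrings only.
rows = conn.execute(
    "SELECT body FROM docs WHERE body LIKE ?", ("%RAG%",)
).fetchall()
print(rows)  # [('RAG retrieves relevant passages before generation.',)]
```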

After the documents are retrieved, the user may wish to rerank or filter the candidates to match business rules and criteria. This step can also be influenced by the current context, business criteria and rules, personalization for the user, and response-time limits, as sketched below.
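
Here is one possible shape of such post-retrieval filtering and reranking; the similarity threshold, the recency signal, and the blending weights are hypothetical business rules, not a standard recipe.

```python
# Post-retrieval filtering and reranking sketch with made-up business rules.
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    similarity: float   # relevance score from the retriever
    recency: float      # e.g. 1.0 = newest document, 0.0 = oldest

def rerank(candidates, min_similarity=0.5, recency_weight=0.2, top_k=3):
    # Drop low-similarity hits, then blend similarity with a business
    # signal (recency here) before taking the top-k.
    kept = [c for c in candidates if c.similarity >= min_similarity]
    kept.sort(
        key=lambda c: (1 - recency_weight) * c.similarity
        + recency_weight * c.recency,
        reverse=True,
    )
    return kept[:top_k]

hits = [
    Candidate("Reset instructions ...", similarity=0.82, recency=0.9),
    Candidate("Legacy manual ...", similarity=0.78, recency=0.1),
    Candidate("Unrelated FAQ ...", similarity=0.31, recency=0.8),
]
print([c.text for c in rerank(hits)])
```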

To summarize, a simple RAG process comprises the following steps (a minimal end-to-end sketch follows the list):

  1. Vector Database Creation & Population - RAG begins by converting an internal dataset into vectors and storing them in a vector database (or another store of the user's choice, such as a graph or relational database).
  2. User Input - The user provides the query for which they seek a generated answer.
  3. Information Retrieval - The embedded user query is compared against the vectorized documents to identify segments that are semantically similar to it. These segments are then provided as input to the LLM to enrich its context for generating a response.
  4. Combining Data - The selected data segments from the database are combined with the user's initial query, creating an expanded prompt.
  5. Generating Text - The expanded prompt, enriched with added context, is then provided to the LLM, which crafts the final, context-aware response.
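
Here is the promised minimal end-to-end sketch of these five steps in pure Python. The `embed` and `llm_complete` functions are hypothetical placeholders standing in for a real embedding model and a real LLM API call.

```python
# End-to-end RAG sketch; `embed` and `llm_complete` are placeholders.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding function; swap in a real embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def llm_complete(prompt: str) -> str:
    """Placeholder for an LLM call (e.g. a chat-completion API)."""
    return f"[LLM answer conditioned on a prompt of {len(prompt)} chars]"

# Step 1: create and populate the "vector database".
corpus = ["Manual section A ...", "Manual section B ...", "Manual section C ..."]
index = [(doc, embed(doc)) for doc in corpus]

# Step 2: user input.
query = "How do I reset the device?"

# Step 3: retrieve the segments most similar to the embedded query.
q = embed(query)
ranked = sorted(index, key=lambda pair: float(pair[1] @ q), reverse=True)
context = [doc for doc, _ in ranked[:2]]

# Steps 4-5: combine context with the query into an expanded prompt,
# then generate the context-aware response.
prompt = (
    "Answer using only this context:\n"
    + "\n".join(context)
    + f"\n\nQuestion: {query}"
)
print(llm_complete(prompt))
```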

Benefits of RAG

  • RAG doesn't require model retraining, saving time and computational resources.
  • It's effective even with a limited amount of labeled data.
  • RAG is best suited to scenarios with abundant unlabeled data but scarce labeled data, making it ideal for applications like conversational assistants that need real-time access to specific information such as product manuals.

RAG vs. Fine-tuning

While fine-tuning adapts the style, tone, and vocabulary of LLMs, RAG gives LLM systems access to factual, access-controlled, and timely information. The focus should be on RAG first, as a successful LLM application must connect specialized data to the LLM workflow. Once a first full application is working, fine-tuning can be added to improve the style and vocabulary of the system.

If you like this article, please subscribe to my Newsletter (AI Scoop) and follow me for similar articles on Generative AI. Also, if you, like me, prefer to listen and practice rather than just read, subscribe to my YouTube channel (https://www.youtube.com/@AccelerateAICareers), where I have shared a complete Generative AI playlist. I frequently add new content on popular LLMs and Generative AI on both LinkedIn and YouTube.

