Retrieval-Augmented Generation (RAG): Enhancing Language Models with External Knowledge
Retrieval-Augmented Generation (RAG) by Accelerate.AI Careers

Introduction

Retrieval-Augmented Generation (RAG) is a technique that enhances language model generation by incorporating external knowledge. This is typically done by retrieving relevant information from a large corpus of documents and using that information to inform the generation process.

Motivation

In numerous instances, clients possess extensive proprietary documents, such as technical manuals, and need to extract specific information from this voluminous content, a task akin to locating a needle in a haystack. OpenAI recently introduced GPT-4 Turbo, a model that can process very long documents and could, in principle, address this need. In practice, however, it is not entirely effective due to the "Lost in the Middle" phenomenon, where the model tends to overlook content located toward the middle of its context window.

To circumvent this limitation, an alternative approach known as Retrieval-Augmented Generation (RAG) has been developed. This method involves creating an index for every paragraph in the document. When a query is made, the most pertinent paragraphs are swiftly identified and fed into a Large Language Model (LLM) such as GPT-4. Providing only select paragraphs, rather than the entire document, prevents information overload within the LLM and significantly improves the quality of the results.

Neural Retrieval

Neural retrievers are a type of information retrieval model that uses neural networks to match queries to relevant documents. They encode the query and documents into dense vector representations and compute similarity scores between them, allowing them to go beyond lexical matching and capture semantic relevance.
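
As a minimal sketch of this idea, the snippet below encodes a query and a handful of toy documents with the sentence-transformers library and ranks the documents by cosine similarity. The model name and the documents are illustrative assumptions, not recommendations.

```python
# Minimal sketch of dense (neural) retrieval; model name is an example only.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "RAG retrieves relevant passages before generation.",
    "Transformers use self-attention over token sequences.",
    "Vector databases index dense embeddings for similarity search.",
]

# Encode documents and query into dense vectors, L2-normalized.
doc_vecs = model.encode(documents, normalize_embeddings=True)
query_vec = model.encode(
    "How does retrieval-augmented generation work?",
    normalize_embeddings=True,
)

# With normalized vectors, the dot product equals cosine similarity.
scores = doc_vecs @ query_vec
best = int(np.argmax(scores))
print(documents[best], float(scores[best]))
```

Because the query and documents live in the same embedding space, a document can score highly even when it shares no exact words with the query, which is the semantic-relevance property described above.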

The Retrieval Augmented Generation (RAG) Pipeline

With RAG, the LLM can draw on knowledge and information that is not necessarily in its weights by being given access to external knowledge sources such as databases. A retriever finds relevant contexts to condition the LLM; in this way, RAG augments the knowledge base of an LLM with relevant documents.

The retriever here could be a vector database, a graph database, or a regular SQL database, depending on whether semantic retrieval is needed.

Vector Store:

Typically, user queries are embedded using an embedding model (such as OpenAI embeddings, BERT, or another embedding model); alternatively, TF-IDF can be used for sparse embeddings. Search over the vector store may be based on vector similarity, keyword matching, term frequency, or semantic similarity. While vector databases partition and index data using LLM-encoded vectors, allowing retrieval of semantically similar vectors, they may fetch irrelevant data. (Reference: Graph vs Vector database for RAG by Damien Benveniste, PhD)
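
For the sparse (TF-IDF) option mentioned above, here is a minimal sketch using scikit-learn; the documents and query are toy examples.

```python
# Sparse (TF-IDF) retrieval sketch: lexical counterpart to dense embeddings.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "To reset the device, hold the power button for ten seconds.",
    "The warranty covers manufacturing defects for two years.",
    "Firmware updates are delivered over the air every quarter.",
]

# Fit the TF-IDF vocabulary on the corpus and vectorize the documents.
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)

# Embed the query in the same TF-IDF space and score by cosine similarity.
query_vec = vectorizer.transform(["how do I reset the device"])
scores = cosine_similarity(query_vec, doc_matrix)[0]

best = scores.argmax()
print(documents[best], scores[best])
```

Unlike the dense retriever, TF-IDF can only reward overlapping terms ("reset", "device"), which is exactly the lexical-matching limitation that neural retrievers address.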

Graph database:

Constructs a knowledge base from entity relationships extracted from the text. This approach is precise but may require exact query matching, which can be restrictive in some applications. A potential solution is to combine the strengths of both databases: indexing parsed entity relationships with vector representations in a graph database for more flexible information retrieval. It remains to be seen whether such a hybrid model exists.
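
To make the exact-match restriction concrete, here is a toy, dependency-free sketch of a knowledge base of (subject, relation, object) triples; the entities and the `neighbors` helper are hypothetical illustrations, not a real graph database API.

```python
# Toy knowledge base of entity-relationship triples extracted from text.
triples = [
    ("GPT-4", "is_a", "large language model"),
    ("RAG", "uses", "retriever"),
    ("retriever", "queries", "vector database"),
]

def neighbors(entity: str) -> list[tuple[str, str, str]]:
    """Return all facts mentioning the entity exactly.

    Note the exact-match restriction discussed above: "RAG" matches,
    but "retrieval augmented generation" would not.
    """
    return [t for t in triples if entity in (t[0], t[2])]

print(neighbors("RAG"))  # [('RAG', 'uses', 'retriever')]
```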

Regular SQL database:

Provides structured data storage and retrieval but could lack the semantic flexibility of vector databases.
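
A minimal sketch of this option using Python's built-in sqlite3 module; the schema and rows are illustrative assumptions. Note that LIKE performs literal substring matching, with no notion of semantic similarity.

```python
# Keyword retrieval from a regular SQL database (lexical, not semantic).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [
        ("RAG retrieves relevant passages before generation.",),
        ("Fine-tuning adapts a model's weights to new data.",),
    ],
)

# LIKE matches literal substrings only.
rows = conn.execute(
    "SELECT body FROM docs WHERE body LIKE ?", ("%RAG%",)
).fetchall()
print(rows)  # [('RAG retrieves relevant passages before generation.',)]
```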

After the documents are retrieved, the user may wish to rerank or filter the candidates to match business rules and criteria. This step can also be influenced by the current context, business criteria and rules, personalization for the user, and response-time limits, as sketched below.
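
Here is one possible shape of such post-retrieval filtering and reranking; the similarity threshold, the recency signal, and the blending weights are hypothetical business rules, not a standard recipe.

```python
# Post-retrieval filtering and reranking sketch with made-up business rules.
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    similarity: float   # relevance score from the retriever
    recency: float      # e.g. 1.0 = newest document, 0.0 = oldest

def rerank(candidates, min_similarity=0.5, recency_weight=0.2, top_k=3):
    # Drop low-similarity hits, then blend similarity with a business
    # signal (recency here) before taking the top-k.
    kept = [c for c in candidates if c.similarity >= min_similarity]
    kept.sort(
        key=lambda c: (1 - recency_weight) * c.similarity
        + recency_weight * c.recency,
        reverse=True,
    )
    return kept[:top_k]

hits = [
    Candidate("Reset instructions ...", similarity=0.82, recency=0.9),
    Candidate("Legacy manual ...", similarity=0.78, recency=0.1),
    Candidate("Unrelated FAQ ...", similarity=0.31, recency=0.8),
]
print([c.text for c in rerank(hits)])
```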

To summarize, a simple RAG process comprises the following steps (a minimal end-to-end sketch follows the list):

  1. Vector Database Creation & Population - RAG begins by converting an internal dataset into vectors and storing them in a vector database (or another store of the user's choice, such as a graph or relational database).
  2. User Input - The user provides the query for which they seek a generated answer.
  3. Information Retrieval - The embedded user query is compared against the vectorized documents to identify segments that are semantically similar to it. These segments are then provided as input to the LLM to enrich its context for generating a response.
  4. Combining Data - The selected data segments from the database are combined with the user's initial query, creating an expanded prompt.
  5. Generating Text - The expanded prompt, enriched with added context, is then provided to the LLM, which crafts the final, context-aware response.
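
Here is the promised minimal end-to-end sketch of these five steps in pure Python. The `embed` and `llm_complete` functions are hypothetical placeholders standing in for a real embedding model and a real LLM API call.

```python
# End-to-end RAG sketch; `embed` and `llm_complete` are placeholders.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding function; swap in a real embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def llm_complete(prompt: str) -> str:
    """Placeholder for an LLM call (e.g. a chat-completion API)."""
    return f"[LLM answer conditioned on a prompt of {len(prompt)} chars]"

# Step 1: create and populate the "vector database".
corpus = ["Manual section A ...", "Manual section B ...", "Manual section C ..."]
index = [(doc, embed(doc)) for doc in corpus]

# Step 2: user input.
query = "How do I reset the device?"

# Step 3: retrieve the segments most similar to the embedded query.
q = embed(query)
ranked = sorted(index, key=lambda pair: float(pair[1] @ q), reverse=True)
context = [doc for doc, _ in ranked[:2]]

# Steps 4-5: combine context with the query into an expanded prompt,
# then generate the context-aware response.
prompt = (
    "Answer using only this context:\n"
    + "\n".join(context)
    + f"\n\nQuestion: {query}"
)
print(llm_complete(prompt))
```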

Benefits of RAG

  • RAG doesn't require model retraining, saving time and computational resources.
  • It's effective even with a limited amount of labeled data.
  • RAG is best suited to scenarios with abundant unlabeled data but scarce labeled data, making it ideal for applications like conversational assistants that need real-time access to specific information such as product manuals.

RAG vs. Fine-tuning

While fine-tuning adapts the style, tone, and vocabulary of LLMs, RAG gives LLM systems access to factual, access-controlled, and timely information. The focus should be on RAG first, as a successful LLM application must connect specialized data to the LLM workflow. Once a first full application is working, fine-tuning can be added to improve the style and vocabulary of the system.

If you like this article, please subscribe to my Newsletter (AI Scoop) and follow me for similar articles on Generative AI. Also, if you, like me, prefer to listen and practice rather than just read, subscribe to my YouTube channel (https://www.youtube.com/@AccelerateAICareers), where I have shared a complete Generative AI playlist. I frequently add new content on popular LLMs and Generative AI on both LinkedIn and YouTube.

