Retrieval-Augmented Generation (RAG)

What is RAG?

  • Retrieval-Augmented Generation (RAG) is the idea of providing LLMs with additional information from an external knowledge source, combining the strengths of retrieval systems and generative models. LLMs often cannot produce a good answer when it requires knowledge that was not part of their training data, such as recent or domain-specific information. RAG bridges the gap between the LLM's general knowledge and this external content, helping the model generate more accurate and contextually grounded results.

How Does RAG Work?

First, we convert external documents into a format the LLM can work with, usually vector embeddings.

Steps to convert your documents into vectors (a minimal sketch in Python follows this list):

  • Collect all source documents
  • Clean documents
  • Load documents
  • Split text into chunks
  • Create embeddings for the text chunks
  • Store the embeddings in a vector store
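
To make these steps concrete, here is a minimal ingestion sketch in Python. This is not the exact pipeline from my notebook: the sentence-transformers model name, the naive character-based chunker, and the in-memory NumPy array standing in for a vector store are illustrative assumptions; in practice you would typically use LangChain or LlamaIndex together with a dedicated vector database.

```python
# Minimal ingestion sketch: documents -> chunks -> embeddings -> "vector store".
# Assumptions: plain-text files in ./docs, the all-MiniLM-L6-v2 embedding model,
# naive fixed-size chunking, and a NumPy matrix standing in for a vector database.
from pathlib import Path

import numpy as np
from sentence_transformers import SentenceTransformer


def load_documents(folder: str = "docs") -> list[str]:
    """Collect and load all source documents (here: plain .txt files)."""
    return [p.read_text(encoding="utf-8") for p in Path(folder).glob("*.txt")]


def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size character chunks (very naive splitter)."""
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


documents = load_documents()                                        # collect + load
chunks = [c for doc in documents for c in split_into_chunks(doc)]   # split into chunks

# Create embeddings for the text chunks (model choice is an assumption).
model = SentenceTransformer("all-MiniLM-L6-v2")
chunk_embeddings = model.encode(chunks, normalize_embeddings=True)

# "Store" them: a NumPy matrix here; swap in FAISS, Chroma, Weaviate, etc. in practice.
vector_store = np.asarray(chunk_embeddings)
print(f"Stored {vector_store.shape[0]} chunks of dimension {vector_store.shape[1]}")
```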

Flow of RAG

The RAG process comes in three key parts (a short code sketch follows this list):

  • Retrieval: The user query is used to retrieve relevant context from an external knowledge source. For this, the user query is embedded with an embedding model into the same vector space as the additional context in the vector database. This allows us to perform a similarity search, and the top-k closest data objects from the vector database are returned.
  • Augmentation: The user query and the retrieved additional context are stuffed into a prompt template.
  • Generation: The retrieval-augmented prompt is fed to the LLM, which generates the final response.
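
Here is the same flow as a rough code sketch. It assumes the chunks and normalized chunk_embeddings produced by the ingestion sketch above, and call_llm is only a placeholder for whatever LLM client you use, not a specific provider API.

```python
# Minimal retrieval -> augmentation -> generation sketch.
# Assumes `chunks` and normalized `chunk_embeddings` from the ingestion sketch,
# and a placeholder `call_llm` instead of a real provider SDK.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

PROMPT_TEMPLATE = """Answer the question using only the context below.

Context:
{context}

Question: {question}
Answer:"""


def call_llm(prompt: str) -> str:
    """Placeholder: swap in your LLM client of choice (OpenAI, local model, ...)."""
    raise NotImplementedError


def answer(question: str, chunks, chunk_embeddings, top_k: int = 3) -> str:
    # Retrieval: embed the query into the same vector space and take the top-k
    # chunks by cosine similarity (a dot product, since embeddings are normalized).
    query_embedding = model.encode([question], normalize_embeddings=True)[0]
    scores = chunk_embeddings @ query_embedding
    top_indices = np.argsort(scores)[::-1][:top_k]

    # Augmentation: stuff the retrieved chunks and the query into the template.
    context = "\n\n".join(chunks[i] for i in top_indices)
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)

    # Generation: the retrieval-augmented prompt is fed to the LLM.
    return call_llm(prompt)
```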

What Problems Does RAG Solve?

  • Reduced hallucinations:

An LLM may generate responses that are inaccurate or irrelevant to the context, especially when it is guessing about things it does not know. RAG allows the LLM to draw upon external knowledge sources to supplement its internal knowledge, grounding responses in retrieved facts.

  • Up-to-date information:

If the external documents used for retrieval are regularly updated, the RAG model can have more recent information. This solves the problem of producing outdated and incorrect information.

  • Easy updates:

RAG frameworks bypass the need for costly, time-intensive retraining and updating of foundation models. The source data can be updated easily by adding new documents.

  • Domain-specific knowledge:

RAG is an effective way to augment the foundation model with domain-specific data. The LLM will be able to provide contextually relevant responses tailored to that domain-specific data.

Prompt Engineering, RAG, or Fine-Tuning?

The choice between Prompt Engineering, RAG (Retrieval-Augmented Generation), and Fine-Tuning depends on the specific use case and requirements. Each approach serves different purposes and is suited to different scenarios. Here are a few questions you need to consider.

  • Will the amount of knowledge in a pre-trained model suffice for what I need it to do or does my use case require additional info and context?
  • Is my use case a standardized task or is it a domain-specific task?
  • Do I have a plethora of training data or am I limited?
  • Does the task require additional context and does the information need to be up-to-date?

Here's a brief overview of when to use each approach:

Prompt Engineering

When you want to provide specific instructions or guidance to the AI model for generating responses. It's ideal for situations where you have a clear idea of what you want the AI to produce and where the use case relies mostly on the model's pre-trained knowledge.

RAG (Retrieval-Augmented Generation)

When you need AI to retrieve and incorporate information from a large knowledge base or corpus into its responses. It's beneficial when the context and relevance of information matter.

Fine-Tuning

When you want to adapt a pre-trained language model to perform specific tasks or excel in a particular domain. It's valuable for tasks where you have a large amount of task-specific training data available.

Disadvantages of RAG

  • Latency Issues: The two-step process of first retrieving documents and then generating responses can introduce latency. This might not be suitable for applications that require real-time responses.
  • Context Length Limitation: We have to be cautious of the maximum context length that the decoder transformer can handle. For example, ChatGPT has a maximum context length of 4,096 tokens (roughly 3 pages of single-spaced English text). If the combined length of the input sequence and the retrieved documents exceeds this limit, some information will have to be truncated, which can affect the quality of the response (see the token-budget sketch after this list).
  • Dependent on Semantic Search: The effectiveness of RAG is highly reliant on the quality of the semantic search. If the search retrieves irrelevant or low-quality documents, the generated responses may also be of poor quality.
  • Requires Existing Data: RAG depends on having an existing database of documents to retrieve from. Without a substantial database, it’s not possible to leverage the benefits of RAG.
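
To make the context-length point concrete, here is a small sketch that counts tokens with tiktoken and greedily keeps retrieved chunks until a token budget is used up. The 4,096-token limit and the cl100k_base encoding mirror the ChatGPT example above; both are assumptions you should adjust for the model you actually use.

```python
# Token-budget sketch: keep only as many retrieved chunks as fit in the context
# window. The cl100k_base encoding and the 4096-token limit are illustrative
# assumptions taken from the ChatGPT example above, not universal values.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
MAX_CONTEXT_TOKENS = 4096
RESERVED_FOR_PROMPT_AND_ANSWER = 1000  # rough allowance for template + completion


def count_tokens(text: str) -> int:
    return len(encoding.encode(text))


def fit_chunks_to_budget(retrieved_chunks: list[str]) -> list[str]:
    """Greedily keep the highest-ranked chunks until the token budget is spent."""
    budget = MAX_CONTEXT_TOKENS - RESERVED_FOR_PROMPT_AND_ANSWER
    kept, used = [], 0
    for chunk in retrieved_chunks:  # assumed ordered best-first by the retriever
        cost = count_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept
```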

Implementation:

The easiest way to implement RAG is to use a framework such as LangChain or LlamaIndex. I have implemented RAG on Andrew Huberman's podcast using LlamaIndex. Here is the link to the code, and a minimal sketch follows below.

https://github.com/ravi2799/RAG/blob/main/Llama_Index%20.ipynb
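
As a taste of how little code this takes, below is a minimal LlamaIndex sketch. It is not the notebook linked above: the data folder, the example question, and the top-k value are illustrative; the imports follow the pre-1.0 llama_index layout (newer releases use llama_index.core); and the default embedding model and LLM are OpenAI-backed, so an API key is assumed.

```python
# Minimal LlamaIndex sketch (imports follow the pre-1.0 `llama_index` layout;
# newer versions expose the same classes under `llama_index.core`). Assumes
# OPENAI_API_KEY is set, since the default embeddings and LLM are OpenAI-backed.
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Load documents (e.g., a folder of podcast transcripts; the path is illustrative).
documents = SimpleDirectoryReader("data").load_data()

# Chunk, embed, and store the documents in an in-memory vector index.
index = VectorStoreIndex.from_documents(documents)

# Retrieval + augmentation + generation in one call.
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What protocols are discussed for improving sleep?")
print(response)
```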

References:

  1. https://medium.com/@minh.hoque/retrieval-augmented-generation-grounding-ai-responses-in-factual-data-b7855c059322
  2. https://gradient.ai/blog/rag-101-for-enterprise
  3. https://docs.llamaindex.ai/en/latest/getting_started/concepts.html
  4. https://cobusgreyling.medium.com/rag-retrieval-augmented-generation-c81044081e6f






