RAG

RAG stands for Retrieval-Augmented Generation. It’s a game-changer when working with LLMs.

[Figure: RAG feeding retrieved context to an LLM]

RAG is a technique where relevant information is retrieved from a vector database to help the LLM (Large Language Model) generate more accurate and informative responses. Here is a breakdown of the terms, followed by a toy sketch of how they fit together:

  1. Retrieval — This refers to the process of searching through a large database or corpus to find relevant information or documents that can help in generating a response.
  2. Augmented — This means enhancing the information available to the LLM (which it learned from its pre-training data) with external sources (such as document collections or data extracted from databases).
  3. Generation — This refers to the LLM generating text in response to a user query.
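
To make the three terms concrete, here is a toy, self-contained sketch of the full loop. The keyword-overlap "retriever" and the tiny sample corpus are made-up stand-ins for real vector search, not an actual implementation:

def retrieve(query, corpus, k=2):
    """
    Retrieval: find the documents most relevant to the query.
    Real systems use vector similarity; keyword overlap keeps this runnable.
    """
    words = query.lower().split()
    scored = [(sum(w in doc.lower() for w in words), doc) for doc in corpus]
    return [doc for score, doc in sorted(scored, reverse=True)[:k] if score > 0]


def augment(query, docs):
    """
    Augmentation: prepend the retrieved context to the user query.
    """
    return "Context:\n" + "\n".join(docs) + "\n\nQuestion: " + query


corpus = [
    "Our refund policy was updated in March 2024.",
    "Standard shipping takes 5 business days.",
]
user_query = "When was the refund policy updated?"
prompt = augment(user_query, retrieve(user_query, corpus))
# Generation: send `prompt` to the LLM of your choice and return its answer
print(prompt)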

RAG overcomes the following LLM challenges

  1. Knowledge cut-off date — LLMs only have information up to their training cut-off date; anything that happened after training is unknown to them.
  2. No access to proprietary or enterprise data — LLMs cannot access specific private or company data.

[Figure: LLM responding without RAG]

RAG feeds LLMs with relevant, current information (from open sources like the internet or closed sources like document collections). This means LLMs don't rely solely on their pre-trained data; they also use this additional information, helping them generate responses that are more accurate and grounded in proprietary data.

[Figure: LLM responding with RAG]
Stages of RAG —

Following are the stages of RAG, with code. Since there are many libraries and vector stores available, I will use one stack (LangChain with FAISS) to illustrate the example –

1. Data Ingestion/Load data source — This involves loading the source data, as vector embeddings, into a vector database.

i. Text Splitter — In this stage, the complete data is divided into smaller chunks. Why? Because whole documents are too large to embed and search precisely. Smaller chunks fit within the embedding model's input limits and let retrieval return only the passages that matter, leading to more accurate and efficient responses.

ii. Get vector embeddings — The smaller chunks are then converted into vector embeddings.

iii. Store vector embeddings — Save all vector embeddings in a vector store such as FAISS, Chroma, Pinecone, etc. In the following code, we will use FAISS.

# import libraries
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS


def get_text_chunks(text):
    """
    Split text into a list of chunks that we will feed to our model
    """
    text_splitter = CharacterTextSplitter(
        separator="\n",      # split on newlines first
        chunk_size=1000,     # maximum characters per chunk
        chunk_overlap=200,   # overlap preserves context across chunk boundaries
        length_function=len,
    )
    chunks = text_splitter.split_text(text)
    return chunks


def get_vectorstore(text_chunks):
    """
    Embed the text chunks and create a FAISS vector store
    """
    # all-MiniLM-L6-v2 is a sentence-transformers model, so we use
    # HuggingFaceEmbeddings rather than HuggingFaceInstructEmbeddings
    # (the latter is meant for Instructor models)
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    vectorstore = FAISS.from_texts(texts=text_chunks, embedding=embeddings)
    return embeddings, vectorstore


# `text` holds the raw contents of your data source, loaded earlier
text_chunks = get_text_chunks(text)
embeddings, vectorstore = get_vectorstore(text_chunks)
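
As a quick aside, the FAISS index can also be persisted to disk so the embeddings don't have to be recomputed on every run. A minimal sketch; the folder name "faiss_index" is an arbitrary choice, and depending on your langchain_community version the allow_dangerous_deserialization flag may be required (it acknowledges FAISS's pickle-based storage format):

# save the index to disk (the folder name is an arbitrary choice)
vectorstore.save_local("faiss_index")

# later, reload it with the same embedding model
vectorstore = FAISS.load_local(
    "faiss_index", embeddings, allow_dangerous_deserialization=True
)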

2. Retrieval — This involves performing a semantic search over the vector database using the user query, to quickly and effectively retrieve the most relevant information.

def retrieve_texts(user_query, vectorstore, embeddings, k=5):
    """
    Retrieve relevant texts from the vector store based on a query
    """
    # Create an embedding for the query
    query_embedding = embeddings.embed_query(user_query)

    # Perform the similarity search on the vector store
    results = vectorstore.similarity_search_by_vector(query_embedding, k=k)

    # Each result is a Document; return its text content
    return [doc.page_content for doc in results]


# `user_query` holds the question asked by the user
relevant_texts = retrieve_texts(user_query, vectorstore, embeddings, k=3)
# we can now pass relevant_texts to our LLM, as shown below
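
To complete the picture, here is what the generation step might look like. This is a minimal sketch assuming an OpenAI chat model via the langchain_openai package; the model name and prompt wording are my own choices, and any other LLM client would work the same way:

from langchain_openai import ChatOpenAI


def generate_answer(user_query, relevant_texts):
    """
    Build an augmented prompt from the retrieved texts and query the LLM
    """
    context = "\n\n".join(relevant_texts)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}"
    )
    llm = ChatOpenAI(model="gpt-4o-mini")  # model choice is an assumption
    return llm.invoke(prompt).content


answer = generate_answer(user_query, relevant_texts)
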
Applications of RAG –

  1. Customer Support — Extracts and summarizes information from large documents based on user queries.
  2. Chatbots — Answer questions grounded in a specific knowledge base rather than only pre-trained data.
  3. PDF extraction — Question answering and summarization over the contents of PDF files.

Finally –

I hope this blog clarifies RAG for you! The code provided is a simple example meant to illustrate the concept.

Got a particular ML topic you’re curious about? Drop your suggestions in the comments, and I’ll do my best to cover them. Thanks for reading!

Feel free to hit me up on LinkedIn. Coffee's on me (virtually, of course)!

