RAG
RAG stands for Retrieval-Augmented Generation. It’s a game-changer when working with LLMs.
RAG is a technique where relevant information is retrieved from a vector database to help the LLM (Large Language Model) generate more accurate and informative responses. Here is a breakdown of the terms:
Retrieval — fetch the information most relevant to the user's query from an external knowledge source.
Augmented — add that retrieved information to the prompt, giving the model extra context.
Generation — the LLM produces a response grounded in both its training data and the retrieved context.
RAG overcomes several well-known LLM challenges: knowledge frozen at the training cutoff, no access to private or domain-specific data, and a tendency to hallucinate when the model lacks the relevant facts.
RAG feeds LLMs with relevant, current information (from open sources like the internet or closed sources like private document collections). This means LLMs don't rely solely on their pre-trained knowledge; they also use the retrieved context, helping them generate more accurate, grounded responses.
Stages of RAG —
Following are the stages of RAG, with code. Since there are many libraries and vector stores available, I will use one stack (LangChain with Hugging Face embeddings and a FAISS vector store) to illustrate the example –
1. Data Ingestion/Load data source — This involves loading the raw data, converting it into vector embeddings, and storing them in a vector database (a minimal loading sketch follows the sub-steps below).
i. Text Splitter — In this stage, the complete data is divided into smaller chunks. Why? Because whole documents are too large to embed and retrieve precisely, and they can exceed the LLM's context window. Smaller chunks let the pipeline pull in only the most relevant pieces, leading to more accurate and efficient responses.
ii. Get vector embeddings — The smaller chunks are then converted into vector embeddings.
iii. Store vector embeddings — Save all vector embeddings in a vector store such as FAISS, Chroma, Pinecone, etc. In the following code, we will use FAISS.
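As a minimal sketch of the loading step (assuming the source is a single plain-text file; the file name is hypothetical, and in practice this could be a PDF loader, web scraper, or document store):
# read the raw document content into a single string
with open("my_document.txt", "r", encoding="utf-8") as f:
    text = f.read()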
# import libraries
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

def get_text_chunks(text):
    """
    Split text into a list of chunks that we will feed to our model
    """
    text_splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=1000,      # maximum characters per chunk
        chunk_overlap=200,    # overlap between consecutive chunks to preserve context
        length_function=len,
    )
    chunks = text_splitter.split_text(text)
    return chunks

def get_vectorstore(text_chunks):
    """
    Embed the text chunks and create a FAISS vector store
    """
    # all-MiniLM-L6-v2 is a sentence-transformers model, so we use HuggingFaceEmbeddings
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    vectorstore = FAISS.from_texts(texts=text_chunks, embedding=embeddings)
    return embeddings, vectorstore

# 'text' is the raw document content loaded earlier
text_chunks = get_text_chunks(text)
embeddings, vectorstore = get_vectorstore(text_chunks)
2. Retrieval — This involves performing a semantic search over the vector database using the user query, to quickly and effectively retrieve the most relevant information.
def retrieve_texts(user_query, vectorstore, embeddings, k=5):
    """
    Retrieve relevant texts from the vector store based on a query
    """
    # Create an embedding for the query
    query_embedding = embeddings.embed_query(user_query)
    # Perform the search on the vector store using the query embedding
    results = vectorstore.similarity_search_by_vector(query_embedding, k=k)
    # Extract and return the text content of the matching documents
    return [result.page_content for result in results]

# 'user_query' is the user's question as a string
relevant_texts = retrieve_texts(user_query, vectorstore, embeddings, k=3)
# we can pass relevant_texts to our LLM model
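To complete the pipeline, here is a minimal sketch of the generation step. It assumes an OpenAI chat model through LangChain's ChatOpenAI (any LLM could be swapped in), and the prompt wording is only illustrative:
from langchain_openai import ChatOpenAI

# stitch the retrieved chunks into the prompt so the LLM answers from that context
context = "\n\n".join(relevant_texts)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {user_query}"
)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # assumed model name; requires OPENAI_API_KEY
response = llm.invoke(prompt)
print(response.content)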
Application of RAG –
RAG shows up wherever answers must be grounded in current or proprietary data: question answering over private document collections, customer-support chatbots, and enterprise search, among others.
Finally –
I hope this blog clarifies RAG for you! The code provided is a simple example meant to illustrate the concept.
Got a particular ML topic you’re curious about? Drop your suggestions in the comments, and I’ll do my best to cover them. Thanks for reading!
Feel free to hit me up on LinkedIn. Coffee's on me (virtually, of course)!