A Beginner’s Guide to Retrieval-Augmented Generation (RAG)
Shivani Virdi
Engineering at Microsoft | Simplifying AI for Everyone | Empowering Productivity with Proven Frameworks and Processes
What is RAG?
RAG (Retrieval-Augmented Generation), simply put, lets you chat with your own data. Does that sound overly abstract?
Okay, let’s break it down — you’ve probably used ChatGPT (please don’t tell me you’re living under a rock). So how does ChatGPT know what to say in response to your queries? Two things: the vast amount of data it’s been trained on and its ability to generate human-like sequences — the AI magic. (Don’t worry, I won’t bore you with the technical details; I’ve added a reading resource for those interested.)
But what happens if you ask ChatGPT a question about a project or data that’s personal to you or your work? It would either make up a plausible-sounding but incorrect answer (a hallucination) or admit that it simply doesn’t know, because your data was never part of its training.
So, what’s the solution? Naturally, you’d want to “augment” the LLM’s knowledge to ensure its responses are grounded in reality and factually correct based on your data.
There are a few different ways to achieve this, but we’re most interested in Retrieval-Augmented Generation (RAG). Here’s how it works: You “Retrieve” the most relevant data for the user’s query from your database, use this data to “Augment” the prompt you send to the LLM, and then let it “Generate” a response based on the user query + prompt + retrieved data.
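To make those three steps concrete, here’s a minimal sketch in Python. It isn’t a specific library or framework; the retrieve and generate callables are placeholders for whatever retrieval system and LLM provider you use:

```python
from typing import Callable

def answer_with_rag(
    user_query: str,
    retrieve: Callable[[str], list[str]],  # placeholder: your retrieval system (e.g. a vector-store search)
    generate: Callable[[str], str],        # placeholder: your LLM provider's completion call
) -> str:
    # 1. Retrieve: find the pieces of your data most relevant to the query.
    context_docs = retrieve(user_query)

    # 2. Augment: combine the retrieved data with the user's question in a single prompt.
    context = "\n".join(context_docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}"
    )

    # 3. Generate: let the LLM produce the final answer, grounded in your data.
    return generate(prompt)
```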
Et voilà! You’ve successfully injected your data and combined it with the power of an LLM to produce factually correct responses based on your proprietary data. And now, you know what RAG is!
RAG Architecture
There are two main components to this technique: the “Retrieval” and the “Augmented Generation.” Let’s discuss each briefly.
Retrieval System
By now, you might understand that there’s no RAG without a retrieval system, so let’s discuss what the ambiguous term “retrieving the most relevant data” actually means.
Imagine you have a clothing and apparel store. You maintain a product catalogue, and now you’re tasked with implementing a chatbot to allow users to ask questions about the kind of products they’re looking for (style, size, colour, type, material—you know, the works).
An example user query: Can you show me red sneakers in size 10 that are made of leather? (Wow, bold choice.)
For the chatbot to answer this, you’d first need to retrieve the products from your catalogue that best match the user query. That’s where the retrieval system comes in. But how do you do that? You’d need to perform what’s called a “similarity search” on the product catalogue for the user query.
To perform a similarity search, you’d preprocess both the user query and the database to represent them in a way that allows you to use an appropriate statistical technique to calculate the distance between the two.
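Here’s what that could look like in a minimal Python sketch, using cosine similarity as the distance measure. The embed function is a stand-in for whichever embedding model you choose, and in a real system you’d precompute and store the catalogue embeddings rather than recomputing them on every query:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 means the vectors point in the same direction; values near 0 mean unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query: str, catalogue: list[dict], embed, top_k: int = 3) -> list[dict]:
    # Represent the query and every product description as vectors,
    # then keep the products whose vectors are closest to the query's.
    query_vec = embed(query)
    scored = [
        (cosine_similarity(query_vec, embed(item["description"])), item)
        for item in catalogue
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:top_k]]
```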
In RAG, this is arguably the most critical step to optimize for accuracy. The quality of the response depends on several factors: the choice of data representation, the amount of data you have, how you split it, the similarity measure used, and the strictness of your similarity search limits, among others.
Here are two examples to illustrate why optimization matters:
Correct Semantic Retrieval Example:
The system retrieves products that actually match the meaning of the query: red sneakers, available in size 10, made of leather.
Incorrect Retrieval Example (Bad Semantic Search):
The system retrieves products that merely share surface-level keywords with the query, such as a red dress, a leather wallet, or sneakers in the wrong size and colour, none of which answer the customer’s actual request.
Hence, it’s crucial to represent data in a way that captures not just the words in the documents/data but also their semantic meaning. For example, the representations of “queen” and “king” should be semantically closer to each other, while “king” and “ring” should be farther apart—even though the words are similar in letters. These representations are called embeddings (I’ve added resources below to help you understand what embeddings are and the different types you can use).
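If you want to see this for yourself, here’s a small sketch using the open-source sentence-transformers library (one option among many; the model name is just a commonly used default). With a semantic embedding model, “king” and “queen” should score as more similar than “king” and “ring”, even though the latter pair shares more letters:

```python
from sentence_transformers import SentenceTransformer, util

# A small, general-purpose embedding model (one common choice among many).
model = SentenceTransformer("all-MiniLM-L6-v2")

embeddings = model.encode(["king", "queen", "ring"])

print("king vs queen:", float(util.cos_sim(embeddings[0], embeddings[1])))  # semantically related
print("king vs ring: ", float(util.cos_sim(embeddings[0], embeddings[2])))  # similar spelling only
```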
Augmented Generation
Now that you’ve retrieved your data, it’s time to send it along with the user query and the system prompt to the LLM for it to generate a coherent response. Let’s see how this would look in our chatbot example.
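As a rough illustration of how the pieces fit together, a prompt for our sneaker query might be assembled like this (the field names, wording, and fallback instruction are illustrative, not a fixed recipe):

```python
def build_prompt(user_query: str, retrieved_products: list[dict]) -> str:
    # Instructions for the LLM, including a fallback for empty or poor retrievals.
    system_prompt = (
        "You are a helpful shopping assistant. Answer using only the products "
        "listed in the context. If no product in the context matches the "
        "customer's request, say so instead of inventing one."
    )

    # Format each retrieved product as one line of context.
    context = "\n".join(
        f"- {p['name']} | colour: {p['colour']} | size: {p['size']} | material: {p['material']}"
        for p in retrieved_products
    )

    return f"{system_prompt}\n\nContext:\n{context}\n\nCustomer question: {user_query}"
```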
This formatted prompt is then sent to the LLM, which generates the final response for the user.
LLM Response:
"Here are some options for red sneakers in size 10:
Let me know if you’d like more details or help choosing the perfect pair!"
At this stage, you can improve the quality of your output by tuning the system prompt. For instance, in the snippet above, the prompt is designed to handle incorrect or missing retrievals, making the final responses more robust.
Congratulations! You’re no longer a noob when it comes to RAG.
Further Reading Resources