Chat with your private data using RAG & LLMs
Large Language Models (LLMs) like ChatGPT have been making a significant impact across various industries with capabilities such as summarization, question answering, and text generation. But while LLMs excel at general knowledge, they stumble when it comes to personalized information, since they only have access to the data they were trained on.
In a way, their knowledge base is stuck in the past. They also can't access your domain-specific data, whether it's customer records, proprietary documents, product specifications, or specialized databases.
So, when you ask a question like:
How many customer accounts do we have?
ChatGPT can't provide an accurate answer because it doesn't have access to that data point. Worse, LLMs like ChatGPT may even generate fabricated information, a phenomenon known as "hallucination", and deliver a confident but incorrect response.
By connecting an LLM to private data, we can obtain accurate answers and derive numerous insights. One way to achieve this is to regularly retrain the LLM on domain-specific data, but that process is resource-intensive.
Fortunately, we have a simple solution to this challenge.
It's called Retrieval Augmented Generation (RAG).
RAG is a framework for improving the accuracy of LLM responses by connecting the model to external data sources. With RAG, we retrieve relevant information from our private data and supply it to the LLM, enabling it to deliver concise and factually accurate answers to our queries.
RAG even allows linking to the original data sources so that LLMs can provide evidence and citations to the end users.
At a high level, the architecture looks like this:
Step 1: Create embeddings from the private data and store them in a vector database:
We split the private data into multiple chunks and use an embedding model to create an embedding for each chunk. Embedding is the process of transforming a chunk of data into a numerical vector that captures its meaning. These vectors are then stored in a vector database.
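To make this concrete, here is a minimal sketch of Step 1 in Python. It uses the sentence-transformers library and a plain NumPy array as a stand-in for a real vector database; the sample documents, chunking strategy, and model name are illustrative assumptions, not prescriptions.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Toy "private data" standing in for customer records or internal documents.
documents = [
    "We currently have 4,812 active customer accounts.",
    "Product X ships with a 2-year limited warranty.",
    "Support tickets are triaged within 24 hours on business days.",
]

def chunk(text: str, size: int = 200) -> list[str]:
    """Split text into fixed-size character chunks (a deliberately naive strategy)."""
    return [text[i : i + size] for i in range(0, len(text), size)]

chunks = [c for doc in documents for c in chunk(doc)]

# Embed each chunk; a production system would persist these vectors
# in a vector database instead of keeping them in memory.
model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(chunks, normalize_embeddings=True)  # shape: (n_chunks, dim)
```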
There are multiple options for both embedding models and vector databases, so it's worth comparing a few against your requirements before choosing.
Step 2: Create embeddings from the user query and find similar vectors
We use the same embedding model to embed the user query and retrieve the top-n stored vectors that are most similar to it. Different measures, such as cosine similarity and maximal marginal relevance (MMR), can be used to find similar vectors.
These vectors represent the chunks of data that are relevant to the user query.
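Continuing the sketch from Step 1, the snippet below embeds the query with the same model and ranks the stored chunks by cosine similarity (since the vectors were normalized, a dot product gives the cosine score). The query text and the top_n value are just examples.

```python
query = "How many customer accounts do we have?"
query_vec = model.encode([query], normalize_embeddings=True)[0]

# Cosine similarity between the query and every stored chunk.
scores = vectors @ query_vec

# Keep the top-n most similar chunks as context for the LLM.
top_n = 2
top_idx = np.argsort(scores)[::-1][:top_n]
relevant_chunks = [chunks[i] for i in top_idx]
```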
Step 3: Pass the relevant documents along with the query to the LLM:
We then pass the relevant pieces of information, along with the query, to the LLM, which acts as a reasoning agent and delivers a well-articulated response to the user.
As for the LLM, we can use a proprietary model such as OpenAI's ChatGPT via its API, or run an open-source model like Llama 2 on our own machines.
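Here is a minimal sketch of Step 3 using OpenAI's Python SDK; the model name and the prompt wording are assumptions made for illustration, and a locally hosted open-source model could be slotted in the same way.

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Stitch the retrieved chunks from Step 2 into the prompt as context.
context = "\n".join(relevant_chunks)
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {
            "role": "system",
            "content": "Answer using only the provided context. "
                       "If the answer is not in the context, say you don't know.",
        },
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ],
)
print(response.choices[0].message.content)
```

Grounding the prompt in retrieved context, and instructing the model to admit when the context doesn't contain the answer, is what curbs the hallucination problem described earlier.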
You can find more information about RAG on Meta's blog.
Hope you enjoyed this short article!!