Start chatting with your enterprise knowledge base with an LLM
Introduction
ChatGPT from OpenAI, Bard from Google, and the Llama-2 family of large language models have fueled interest in GenAI. Many enterprises are looking to use their knowledge bases with the available LLMs and to generate responses that are specific to the enterprise and grounded in its knowledge. Retrieval Augmented Generation (RAG) enables this use case. Let us first understand the limitations of an LLM and how RAG enables us to overcome them.
Large Language Models are trained on publicly available information and have a knowledge cutoff date
A large language model is pretrained on a corpus of text sourced from publicly available information on the internet. For example, Common Crawl, an open repository of web crawl data, is used extensively along with other publicly available resources. Once pretrained, an LLM goes through fine-tuning, which involves supervised fine-tuning and RLHF coupled with safety training, to become usable for chat applications, summarization, content generation, and so on. As a result, an LLM trained on public data will not have any information about an enterprise's internal knowledge base. An LLM also takes a long time to train and will not have information about events after its knowledge cutoff date.
Retrieval Augmented Generation enables the use of enterprise data. The whole process can be divided into three steps:
Transform the data to make it usable along with an LLM
In an enterprise, most of the knowledge base consists of documents such as PDFs, Word documents, Markdown files, HTML files, and so on. All such documents need to be split into smaller chunks, embedded, and stored in a vector database.
Splitting the documents
The first step is to load the data and split the documents into smaller chunks. These smaller chunks are searched to retrieve information relevant to a query. It is important to store a document's metadata along with its chunks to enable better contextual retrieval at a later stage. Chunks are overlapped so that context carries over consistently from one chunk to the next.
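As an illustration, here is a minimal sketch using LangChain's document loaders and text splitter (import paths vary across LangChain versions, and "report.pdf" is a placeholder file name):

```python
# Load a PDF and split it into overlapping chunks; each chunk keeps the
# metadata (source, page) attached by the loader.
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

docs = PyPDFLoader("report.pdf").load()

# ~1000-character chunks with a 200-character overlap so context carries
# over from one chunk to the next.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

print(len(chunks), chunks[0].metadata)
```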
Embedding
Once a document is split into smaller chunks, these chunks need to be embedded. Embedding is the process of creating a numerical representation of a chunk. These numerical representations are called vectors. Cosine similarity is used to retrieve vectors similar to the user query. There are many embedding models available that can be used to embed these chunks.
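As a sketch, the snippet below embeds a few chunks and a query with the sentence-transformers library and compares them with cosine similarity; the model name is just one commonly used choice, and the chunk texts are placeholders:

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = ["Revenue grew 12% year over year.",
          "The company opened two new offices."]
chunk_vectors = model.encode(chunks)                         # one vector per chunk
query_vector = model.encode("What was the revenue growth?")  # query vector

# Cosine similarity between the query vector and each chunk vector.
def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine(query_vector, v) for v in chunk_vectors]
print(scores)  # the first chunk should score higher for this query
```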
Vector DB
So far, we have broken text into smaller chunks and embedded them. We need to store them for eventual retrieval. A vector database is used to store these embeddings. A vector database uses k-nearest-neighbor and related algorithms to efficiently retrieve relevant vectors. Some of the vector database providers are Chroma, Pinecone, Weaviate, Elasticsearch, and Redis.
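Below is a minimal sketch using Chroma's in-memory client; in a real pipeline you would plug in the same embedding model used above rather than relying on Chroma's default embedding function, and the documents and metadata here are placeholders:

```python
import chromadb

client = chromadb.Client()
collection = client.create_collection(name="enterprise_docs")

# Store chunks together with their metadata and ids.
collection.add(
    documents=["Revenue grew 12% year over year.",
               "The company opened two new offices."],
    metadatas=[{"source": "annual_report.pdf"}, {"source": "newsletter.md"}],
    ids=["chunk-1", "chunk-2"],
)

# Retrieve the chunk closest to the query.
results = collection.query(query_texts=["What was the revenue growth?"], n_results=1)
print(results["documents"])
```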
Retrieval
Once documents have been transformed and stored in a vector database, we need to retrieve them. A query is passed to the vector database and matching chunks are retrieved.
Similarity Search and Maximum Marginal Relevance Search
There are two ways to retrieve: similarity search and Maximum Marginal Relevance (MMR) search. Similarity search gets the document chunks that are most similar to the query. Maximum Marginal Relevance search retrieves diverse chunks: it can return chunks that are less similar to the query but still relevant. Two parameters control it: one defines how many chunks are initially fetched by similarity, and k defines how many of those are finally selected while balancing relevance against diversity (see the sketch below).
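The sketch below illustrates the idea behind MMR with plain NumPy; many vector-store libraries expose MMR directly, so treat this as an illustration rather than a production implementation. It greedily picks chunks that are relevant to the query but not redundant with the chunks already selected.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def mmr(query_vec, candidate_vecs, k=4, lambda_mult=0.5):
    """Return indices of k chunks balancing relevance and diversity.
    lambda_mult=1.0 means pure relevance, 0.0 means pure diversity."""
    selected, remaining = [], list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, candidate_vecs[i])
            redundancy = max((cosine(candidate_vecs[i], candidate_vecs[j])
                              for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```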
Use of metadata in retrieving a relevant document
At times there will be queries about a specific document, and retrieval works better if the metadata context can be provided while querying. One way of doing that is to add a filter to the query that defines which source should be searched. Another is to use an LLM to extract that context from the query and pass it along with the query, which makes efficient use of the metadata.
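Continuing the Chroma sketch from earlier, a metadata filter might look like this (the "source" field and file name are hypothetical):

```python
# Restrict retrieval to chunks whose metadata matches a given source document.
results = collection.query(
    query_texts=["What was the revenue growth?"],
    n_results=1,
    where={"source": "annual_report.pdf"},  # only consider this document's chunks
)
print(results["documents"])
```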
Response generation
Now any query posted by a user is passed, along with the retrieved context, to the LLM. The LLM makes use of the supplied information in its response. A prompt template can be used to instruct the LLM to avoid hallucination, for example: "You are a helpful assistant, and if you do not know the answer, respond by saying that you do not know the answer."
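A minimal sketch of this step, assuming the OpenAI Python SDK with an API key in the environment; the model name is just one possible choice, and the query and chunks are placeholders:

```python
from openai import OpenAI

client = OpenAI()

user_query = "What was the revenue growth?"
retrieved_chunks = ["Revenue grew 12% year over year."]  # from the vector database

# Stuff the retrieved chunks into a prompt template that discourages hallucination.
context = "\n".join(retrieved_chunks)
prompt = (
    "You are a helpful assistant. Answer only from the context below. "
    "If you do not know the answer, say that you do not know.\n\n"
    f"Context:\n{context}\n\nQuestion: {user_query}"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```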
Improving retrieval by query expansion
Using similarity search to retrieve context from a vector database has a downside: it may fail to retrieve chunks that are relevant to the answer because they are not similar to the query/question vector. We can overcome this shortcoming by using the techniques below.
Query expansion by adding a hypothetical answer
One solution to the above problem is to ask the LLM to generate a hypothetical answer before sending the query for retrieval. The hypothetical answer generated by the LLM is appended to the user query and sent to the vector database to retrieve relevant document chunks. Suppose you are analyzing a financial report and pose a question about the total revenue, or any sector-specific query. We can generate a hypothetical answer from an LLM, append it to the query, and send the combined text for retrieval to get better context. The retrieved documents are then sent to the LLM to produce the answer.
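A sketch of this technique, reusing the OpenAI client and the Chroma collection from the earlier sketches; the question, model name, and prompt wording are illustrative:

```python
user_query = "What was the total revenue last year?"

# Ask the LLM to draft a plausible answer first.
draft = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user",
               "content": "Write a short, plausible answer to this question, as if "
                          f"it came from a financial report: {user_query}"}],
).choices[0].message.content

# Retrieve with the query plus the hypothetical answer instead of the query alone.
expanded_query = f"{user_query}\n{draft}"
results = collection.query(query_texts=[expanded_query], n_results=2)
```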
Query expansion by adding similar queries
Another simple approach is to send the query to the LLM and instruct it to generate 'n' similar queries in simple sentences. Once the similar queries are generated, use them to retrieve context and send that context to the LLM for response generation.
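A sketch along the same lines, again reusing the client and collection from the earlier sketches; the prompt wording and model name are illustrative:

```python
user_query = "What was the total revenue last year?"

# Ask the LLM for a few rephrasings of the question, one per line.
rephrasings = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user",
               "content": "Generate 3 differently worded versions of this question, "
                          f"one per line, in simple sentences: {user_query}"}],
).choices[0].message.content.splitlines()

# Retrieve for the original query plus each rephrasing, then de-duplicate.
context_chunks = set()
for q in [user_query, *[r for r in rephrasings if r.strip()]]:
    hits = collection.query(query_texts=[q], n_results=2)
    context_chunks.update(hits["documents"][0])
```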
Evaluating responses generated by an LLM
At its core, an LLM is a next-word-prediction machine, and there is no guarantee that a response will be relevant, contextual, or grounded. TruLens provides feedback functions that can be used to evaluate a response on all three parameters:
How relevant is the answer to a given query? Here an LLM evaluates the answer against the query and scores it.
Is the answer based on the retrieved context? Here an LLM evaluates the provided context and the answer and scores it.
How relevant is the context to a given query? Here an LLM evaluates the context against the query and scores it.
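TruLens has its own API for these feedback functions; the sketch below is only a generic LLM-as-judge illustration of the three checks, not TruLens code, and it reuses the OpenAI client from the earlier sketches:

```python
def judge(question: str, context: str, answer: str) -> str:
    """Ask an LLM to score answer relevance, groundedness, and context
    relevance, each from 0 (worst) to 10 (best)."""
    prompt = (
        "Score each of the following from 0 (worst) to 10 (best):\n"
        "1. How relevant is the answer to the question?\n"
        "2. Is the answer grounded in the provided context?\n"
        "3. How relevant is the context to the question?\n\n"
        f"Question: {question}\nContext: {context}\nAnswer: {answer}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```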
Conclusion
Retrieval augmentation is evolving, and newer techniques to make it better are being introduced. An advanced retrieval system can enable effective and friendly use of an internal knowledge base and thus initiate a move away from the keyword-based retrieval systems currently in place.