Improving AI Contextual Understanding - Retrieval Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is a natural language processing technique that combines information retrieval from a knowledge base with text generation to produce accurate, relevant responses. RAG begins with a user posing a question. Instead of relying solely on its pre-trained data, the system first retrieves relevant information from a knowledge base. This retrieved data is then combined with the user's query to generate an informed, specific response. By integrating external knowledge sources during generation, RAG improves the precision of AI-powered applications and reduces hallucinations.

RAG is best utilized when working with datasets too large to fit within the LLM's context window, or when the ability to display the retrieved source documents is essential. However, RAG may not be necessary for small datasets that fit within the context window, or for tasks like single-document analysis or summarization where latency is not a critical factor. Modern RAG implementations benefit from frameworks such as LangChain and LlamaIndex, which simplify development by providing ready-made tools and classes, reducing the need to build complex retrieval pipelines from scratch.

Without RAG, the interaction is a simple API call to the LLM: the user's query is passed to the model as-is, and the model responds from its pre-trained knowledge alone (a system prompt can be added to instruct the model on tone or behavior).

[Benefits]

[1] Accuracy Improvements - Enhances LLM responses by referring to a domain-specific knowledge base.

[2] Reduced Hallucination - Minimizes false information by grounding responses in retrieved external data.

[3] External Knowledge Bases - Can be integrated with private documentation, PDFs, codebases, or SQL databases.

[4] Verifiable Responses - Allows users to validate the source of the information.

[Challenges]

[1] Latency - Response time may increase due to the extra retrieval step over large knowledge bases.

[2] Irrelevant Data - Poorly matched chunks in the dataset can degrade answer quality.

[3] Higher Costs - Larger prompts and additional retrieval infrastructure can increase execution costs.

In this article, we will explore a Retrieval-Augmented Generation (RAG) approach using a domain-specific dataset to produce contextually relevant answers. The focus is on a water treatment dataset: transforming raw documents into embeddings and using cosine similarity to identify the most relevant chunks of information. By feeding these chunks into a conversational AI system powered by the Gemini model, we aim to demonstrate how LLMs can be augmented with custom knowledge bases to minimize hallucinations. This step-by-step guide covers chunking, embedding generation, and similarity-based retrieval.

[1] Install necessary Python libraries and add imports.
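
Assuming a standard Python environment, the setup might look like the sketch below. The package list is an assumption based on the steps that follow (Hugging Face Transformers, PyTorch, pandas, and the Gemini client); pin versions as needed for your environment.

```python
# Install the libraries first (package names are assumptions; pin versions as needed):
#   pip install pandas numpy torch transformers google-generativeai

import re

import numpy as np
import pandas as pd
import torch
from transformers import AutoModel, AutoTokenizer
```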

[2] Initialize the tokenizer and model for the pre-trained "facebook/opt-125m" model using the Hugging Face Transformers library.
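
A minimal sketch of this initialization (the first call downloads the model weights from the Hugging Face Hub):

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "facebook/opt-125m"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()  # inference only; no gradient updates needed
```

OPT-125M is a small model with a 768-dimensional hidden state, which keeps embedding generation cheap for a demo; any encoder or decoder model with accessible hidden states could be swapped in.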

[3] Load your dataset into a Pandas DataFrame for further analysis and processing. Our dataset contains detailed information about water treatment proposals, including technical specifications, maintenance requirements, and commercial terms from multiple vendors. The data is kept in a CSV file with four columns: Title, Content, URL, and Source.
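
In practice this is a single `pd.read_csv("your_file.csv")` call. The sketch below uses a tiny in-memory stand-in with the same four columns, so it runs without the actual CSV file (the row contents are invented placeholders):

```python
import pandas as pd

# In practice: df = pd.read_csv("water_treatment_quotes.csv")
# A small in-memory stand-in with the same four columns:
df = pd.DataFrame(
    {
        "Title": ["Quote A", "Quote B"],
        "Content": [
            "AquaPure RO system. Regeneration every 100,000 liters.",
            "Global Water Solutions softener. Salt: 120-190 kg per regeneration.",
        ],
        "URL": ["https://example.com/a", "https://example.com/b"],
        "Source": ["Vendor A", "Vendor B"],
    }
)
print(df.head())
```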

[4] Write a method to split each quote's text into small chunks based on sentences. The chunked data should look something like this (these chunks are from the second quote).
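
One way to do this is a naive regex sentence splitter that groups a few sentences per chunk. This is a sketch, not the article's exact implementation; the split pattern and chunk size are assumptions you can tune:

```python
import re

def split_into_chunks(text: str, max_sentences: int = 3) -> list[str]:
    """Split text into chunks of up to `max_sentences` sentences each."""
    # Naive splitter: break on ., !, or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i : i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]

chunks = split_into_chunks(
    "Regeneration is needed every 120,000 liters. Salt usage is 120-190 kg. "
    "Clean the brine tank before each regeneration. Check the pre-filter mesh."
)
# Four sentences with max_sentences=3 -> two chunks.
```

Sentence-based chunking keeps each chunk semantically coherent, which tends to produce better embeddings than fixed-size character windows.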

[5] Create chunks and embeddings using the tokenizer, generating an embedding for each chunk with OPT-125M.

Put the chunks and embeddings into a new DataFrame for easier access.
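
A common way to turn a decoder model like OPT-125M into an embedder is to mean-pool its last hidden state; that choice is an assumption here, not the only option. The `chunks` list is a small stand-in for the output of the chunking step:

```python
import numpy as np
import pandas as pd
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModel.from_pretrained("facebook/opt-125m")
model.eval()

def get_embedding(text: str) -> np.ndarray:
    """Embed text by mean-pooling OPT-125M's last hidden state."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # Shape: (1, seq_len, 768) -> (768,)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0).numpy()

# `chunks` would come from the chunking step; a stand-in list here.
chunks = ["Salt: 120-190 kg per regeneration.", "Clean the brine tank first."]
chunk_df = pd.DataFrame({"chunk": chunks})
chunk_df["embedding"] = chunk_df["chunk"].apply(get_embedding)
```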

[6] Add logic to retrieve the most relevant chunks of text for a user's query by calculating cosine similarity between the query embedding and the precomputed chunk embeddings. Then combine the relevant chunks into a single string to pass into the context prompt.
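
The retrieval logic can be sketched as below. Toy 3-dimensional embeddings are used here purely to exercise the ranking; in the real pipeline the vectors come from the embedding step, and `top_k` is a tunable assumption:

```python
import numpy as np
import pandas as pd

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: dot product over norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_relevant_chunks(query_embedding, chunk_df, top_k=2) -> str:
    """Rank chunks by similarity to the query and join the top_k into one string."""
    scores = chunk_df["embedding"].apply(lambda e: cosine_similarity(query_embedding, e))
    best = chunk_df.loc[scores.nlargest(top_k).index, "chunk"]
    return "\n".join(best)

# Toy embeddings just to demonstrate the ranking behavior:
chunk_df = pd.DataFrame({
    "chunk": ["salt usage details", "warranty terms", "regeneration schedule"],
    "embedding": [np.array([1.0, 0.0, 0.0]),
                  np.array([0.0, 1.0, 0.0]),
                  np.array([0.9, 0.1, 0.0])],
})
query = np.array([1.0, 0.0, 0.0])
context = retrieve_relevant_chunks(query, chunk_df)
# -> "salt usage details\nregeneration schedule"
```

Cosine similarity ignores vector magnitude and compares direction only, which makes it a reasonable default for comparing mean-pooled embeddings of texts with different lengths.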

[7] Build the context prompt from the user query and the most relevant chunks.
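
The prompt template below is one possible shape, not the article's exact wording; the key idea is to place the retrieved chunks before the question and instruct the model to answer only from them:

```python
def build_prompt(user_query: str, context: str) -> str:
    """Combine retrieved chunks and the user's query into one grounded prompt."""
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}\nAnswer:"
    )

prompt = build_prompt(
    "What maintenance does the softener need?",
    "Salt: 120-190 kg per regeneration. Clean the brine tank first.",
)
```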

[8] Call the generative AI model to generate the response. I am using Gemini 1.5; you can use an alternative depending on your access and API key availability.

With the RAG-based approach, the generated answer was: "Global Water Solutions' maintenance involves regular regeneration (frequency depends on water usage and hardness, e.g., every 120,000 liters for one system). Salt (120-190 kg per regeneration) is needed for this process. The brine tank should be cleaned before each regeneration. Pre-filter mesh cleaning is also recommended, depending on feed water dust levels. Additionally, adhering to the post-installation maintenance manual is crucial."

[9] If we do this without passing the chunks and the relevant dataset, the model gives a generic response without grounding in facts; it tends to invent details and can hallucinate.

Without the RAG-based approach, the generated answer was: "Global Water Solutions apartment water treatment systems require varying maintenance depending on the specific system installed. Generally, this includes regular filter replacements (frequency depends on water quality and usage), periodic inspections of the system components for leaks or damage, and occasional professional servicing for cleaning or component replacement. Specific maintenance schedules and instructions should be provided by Global Water Solutions or found in the system's user manual."

From both responses, it is clear that RAG ensures fact-based answers by retrieving relevant chunks of information from the dataset based on the user's query. The most similar chunks (by cosine similarity) are combined into a single string and passed into the prompt to guide the model. Without RAG, the model relies on generic knowledge, risking factual inaccuracies or hallucinations, which highlights the importance of grounding responses in relevant datasets.

Note - You can use different LLMs and tokenizers as per your convenience and availability.

Summary

In this article, we explored the application of RAG using a water treatment quotes dataset. We covered how to transform raw data into embeddings and use cosine similarity to identify relevant chunks of information, then integrated those chunks into prompts for the Gemini model, showcasing how RAG minimizes hallucinations by ensuring fact-based responses. Through a step-by-step guide, we covered critical aspects like chunking, embedding generation, and similarity-based retrieval. Comparing outputs with and without RAG makes clear the importance of leveraging relevant datasets to generate precise, contextually relevant answers.

