Building a RAG Over Heterogeneous Data to Solve Business Challenges: What I Learned
Veeramanohar Avudaiappan
AI Research (Generative & Creative AI) | ML & Data Product Development
In this article, I’ll share the insights and experiences I’ve gained from building a Retrieval-Augmented Generation (RAG) system over heterogeneous data through multiple experiments, while also highlighting key concepts related to RAG and its application to solve business challenges. The article covers strategies for designing hybrid approaches tailored to various use cases, handling diverse data types, enhancing response quality, and optimizing the RAG pipeline for production environments.
What Is RAG, and How Can Organizations Leverage It to Address Business Challenges?
Retrieval-Augmented Generation (RAG) has been a trending topic for over a year. RAG enhances the response of a Large Language Model (LLM) or Foundation Model by allowing it to reference an authoritative knowledge base outside its training data before generating a response. This increases trust in the generated answers and keeps responses relevant by incorporating up-to-date data sources.
The traditional RAG process consists of three key steps:
1. Retrieval: the user's query is embedded and used to fetch the most relevant chunks from the knowledge base.
2. Augmentation: the retrieved context is injected into the prompt alongside the original query.
3. Generation: the LLM produces a response grounded in the supplied context.
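To make these steps concrete, here is a minimal sketch of the retrieve-augment-generate loop. The embedding model name is one common open choice, the documents are toy examples, and `call_llm` is a hypothetical stand-in for whatever LLM client you use:

```python
# Minimal retrieve-augment-generate loop (a sketch, not production code).
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Our return window is 30 days from the delivery date.",
    "Premium members get free expedited shipping.",
    "Refunds are issued to the original payment method within 5 business days.",
]
doc_embeddings = embedder.encode(documents, convert_to_tensor=True)

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def answer(query: str, top_k: int = 2) -> str:
    # 1. Retrieval: find the chunks most similar to the query.
    query_embedding = embedder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, doc_embeddings)[0]
    top_idx = scores.argsort(descending=True)[:top_k]
    context = "\n".join(documents[int(i)] for i in top_idx)
    # 2. Augmentation: inject the retrieved context into the prompt.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    # 3. Generation: the LLM produces a grounded response.
    return call_llm(prompt)
```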
As companies embrace data-driven decision-making, RAG is becoming integral to both internal tools and customer-facing products. Key use cases include customer-support chatbots grounded in product documentation, internal knowledge assistants over company wikis and policies, and question answering over contracts, reports, and other domain documents.
These are just a few examples; RAG can be customized to fit the unique needs of various industries.
Designing a Multi-Pipeline Framework
Given the variety of data businesses use to meet customer needs, it's crucial to consider all data types when building a customer-facing RAG chatbot. Data can be structured (e.g., databases, spreadsheets) or unstructured (e.g., text, images, audio), and handling both efficiently is key. Structuring the application around the data, by breaking it into specialized pipelines per data type, improves operational efficiency.
A Router directs the flow based on the user's query and the available pipelines and data sources. A Small Language Model (SLM) like Microsoft Phi 3.5 or TinyLlama can be used as the router to analyze the prompt and select the best pipeline, as these models excel at simpler tasks and help minimize latency. In some cases, the SLM may decide not to use a RAG pipeline if the query doesn’t require it. The SLM’s prompt should include context on data sources and routing logic, helping it make better decisions. For ambiguous queries, the router's prompt should instruct the SLM to ask clarifying questions, refining the route and improving the user experience.
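Below is a minimal sketch of such a router. The `call_slm` function is a hypothetical wrapper around a small model like Phi-3.5, and the pipeline names and JSON contract are illustrative assumptions, not a fixed standard:

```python
# Sketch of an SLM-based router that picks a pipeline or asks a
# clarifying question. Substitute your own SLM client for `call_slm`.
import json

ROUTER_PROMPT = """You are a query router. Available pipelines:
- "sql": questions about orders, inventory, or other tabular records
- "docs": questions answered by policy and product documentation
- "image": queries that include or ask about images
- "none": small talk or queries needing no retrieval
If the query is ambiguous, ask one clarifying question instead of routing.
Respond as JSON: {"route": "...", "clarifying_question": null or "..."}
Query: """

def call_slm(prompt: str) -> str:
    raise NotImplementedError("plug in your SLM client here")

def route(query: str) -> dict:
    decision = json.loads(call_slm(ROUTER_PROMPT + query))
    if decision.get("clarifying_question"):
        # Ambiguous query: bounce a question back to the user first.
        return {"action": "ask_user", "question": decision["clarifying_question"]}
    return {"action": "run_pipeline", "pipeline": decision["route"]}
```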
Preparing and Chunking Data
For pipelines involving RAG over unstructured textual data, selecting the right chunking method is crucial for effective information retrieval. Chunking refers to splitting large documents or files into smaller, meaningful segments to process them efficiently. This approach provides precise data to the LLM while avoiding long-context issues caused by passing entire documents.
Several chunking strategies include:
- Fixed-size chunking: split text into segments of a set length, often with overlap to preserve context across boundaries.
- Sentence- or paragraph-based chunking: split along natural language boundaries so each chunk stays coherent.
- Structure-aware chunking: use document structure (headings, sections, pages) to define chunk boundaries.
- Semantic chunking: group sentences by topical similarity so each chunk covers a single idea.
Metadata can also be added to each chunk, enabling filtering or routing based on specific logic during or after retrieval.
Thoroughly understanding requirements beforehand ensures the best chunking strategy is chosen for optimal performance.
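As a concrete illustration of two of these strategies plus metadata tagging, here is a minimal pure-Python sketch; the size budgets are arbitrary, and production pipelines often use library splitters instead:

```python
# Two simple chunkers (a sketch, not a full splitter implementation).
def fixed_size_chunks(text: str, size: int = 500, overlap: int = 50):
    """Fixed-size chunking with overlap, so sentences cut at a boundary
    still appear intact in the neighboring chunk."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def paragraph_chunks(text: str, max_chars: int = 800):
    """Structure-aware chunking: pack whole paragraphs until the budget
    is reached, keeping each chunk semantically coherent."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

def with_metadata(chunks, source: str):
    """Attach metadata to each chunk for filtering or routing at
    retrieval time."""
    return [{"text": c, "source": source, "chunk_id": i}
            for i, c in enumerate(chunks)]
```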
Handling Structured and Tabular Data
LLMs are primarily trained on sequential text formats, making them less optimized for analyzing tabular data. Although some training may include structured data, it is often represented in transformed formats. For scenarios involving frequently updated data stored in SQL databases or CSV files, a structured approach is essential. Here’s a 3-step process to address this challenge:
1. Query Generation: an LLM translates the user's natural-language question into a structured query (e.g., SQL).
2. Execution & Retrieval (ER): the generated query is run against the data source to fetch raw results.
3. Analysis & Response: an LLM interprets the raw results and composes the final answer.
In certain scenarios, raw results from the ER block can be presented directly to users.
This process ensures data is retrieved and analyzed through logical and business reasoning, delivering reliable and actionable responses.
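A minimal sketch of this three-step flow over a SQLite source follows. Here `call_llm` is a hypothetical client and the schema is a toy example; a real deployment would validate the generated SQL before executing it:

```python
# Sketch of the three-step flow over a SQL source.
import sqlite3

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def answer_tabular(question: str, conn: sqlite3.Connection) -> str:
    # Step 1 -- Query Generation: translate the question into SQL.
    schema = "orders(id, customer, region, amount, order_date)"
    sql = call_llm(f"Schema: {schema}\nWrite one read-only SQL query for: {question}")
    # Step 2 -- Execution & Retrieval (ER): run the query, fetch raw rows.
    # (In production, validate and sandbox the generated SQL first.)
    rows = conn.execute(sql).fetchall()
    # Step 3 -- Analysis & Response: let the LLM interpret the rows.
    return call_llm(f"Question: {question}\nQuery result: {rows}\nAnswer concisely.")
```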
Strategies for Multimodal Input Processing
Multimodality has become an essential aspect of Generative AI-based solutions. Businesses aiming to deliver exceptional customer experiences now prioritize enabling users to interact with solutions through various input modalities (text, image, video, audio, etc.).
For instance, in e-commerce, shopping experiences can be enhanced by allowing users to upload images to find similar products, ask queries about operational procedures, or raise damage-related concerns using image and text inputs.
One of the most widely used multimodal combinations is Image and Text. This typically breaks down into three key combinations:
- Text query over image data (e.g., describing a product to find matching photos).
- Image query over text data (e.g., uploading a photo to retrieve related documentation).
- Combined image-and-text query (e.g., a photo of a damaged item plus a written complaint).
A multimodal embedding model maps both images and text into a unified vector space. During retrieval, the input query is transformed into a vector representation, and similar embeddings are identified based on distance metrics to fetch the most relevant information.
This approach adds more context to the retrieval process, offering comprehensive knowledge to the LLM/FM. Multimodal capabilities are especially valuable in content recommendation systems and healthcare applications, such as analyzing patient records alongside medical imaging for better insights.
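As an illustration, the sketch below uses the CLIP checkpoint exposed through sentence-transformers to embed images and a text query into one vector space and rank matches by cosine similarity. The image paths are placeholders:

```python
# Sketch of text-to-image retrieval in a shared embedding space.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

# Index: embed product images once (paths are illustrative).
image_paths = ["dress_red.jpg", "sneaker_white.jpg", "bag_leather.jpg"]
image_embeddings = model.encode([Image.open(p) for p in image_paths],
                                convert_to_tensor=True)

# Query: embed the text into the same space and rank by cosine similarity.
query_embedding = model.encode("a red summer dress", convert_to_tensor=True)
scores = util.cos_sim(query_embedding, image_embeddings)[0]
best = int(scores.argmax())
print(f"Closest match: {image_paths[best]} (score={float(scores[best]):.3f})")
```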
Improving Response Quality with Reranking and Advanced Prompting Techniques
A reranking component plays a pivotal role in boosting the performance of a RAG system. Reranking involves reevaluating and reorganizing retrieved documents or data based on their relevance to the query. In traditional RAG workflows, candidate documents or chunks are initially retrieved using vector similarity search. An LLM then evaluates these documents for semantic relevance, assigns relevance scores, and reorders them based on priority. This process ensures that the LLM focuses on the most relevant information during response generation.
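Here is a minimal sketch of LLM-based reranking as described above, again with `call_llm` as a hypothetical client; scoring candidates with a dedicated cross-encoder reranker is a cheaper drop-in alternative to per-document LLM calls:

```python
# Sketch of LLM-based reranking over first-pass retrieval results.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def rerank(query: str, candidates: list[str], top_k: int = 3) -> list[str]:
    scored = []
    for doc in candidates:
        # Ask the model for a numeric relevance score per candidate.
        prompt = (f"Rate the relevance of this passage to the query on a "
                  f"0-10 scale. Reply with the number only.\n"
                  f"Query: {query}\nPassage: {doc}")
        scored.append((float(call_llm(prompt)), doc))
    # Reorder so the most relevant chunks reach the generator first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]
```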
Prompt engineering is critical to improving LLM-based applications. One of the primary challenges in such systems is hallucinations—responses that are irrelevant, fabricated, or inconsistent with the input query.
To ensure consistent data flow, LLM blocks are prompted to generate structured outputs (e.g., JSON) using tools like Pydantic. Prompts define the required schema, and parsing is performed after each step to enforce format compliance. This structured approach minimizes errors and simplifies processing. For safety and governance, rule-based systems are integrated between the user and the LLM, both before input processing and after output generation. These systems enforce organizational principles and compliance frameworks such as the NIST AI RMF and the EU AI Act, ensuring that the AI operates within predefined ethical and regulatory boundaries.
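A minimal sketch of this pattern with Pydantic (v2) follows; the schema fields and the raw model output are illustrative:

```python
# Sketch of schema enforcement on LLM output with Pydantic v2.
from pydantic import BaseModel, ValidationError

class RagAnswer(BaseModel):
    answer: str
    sources: list[str]
    confidence: float  # expected in [0, 1]

# `raw_output` stands in for the text an LLM returned; the prompt would
# include the schema (e.g., via RagAnswer.model_json_schema()).
raw_output = '{"answer": "30-day returns.", "sources": ["policy.pdf"], "confidence": 0.9}'

try:
    parsed = RagAnswer.model_validate_json(raw_output)
except ValidationError:
    # On schema violations, retry the LLM call with the error fed back,
    # or fall back to a safe default response.
    parsed = None
```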
Exploring Graph RAG
Graph RAG combines traditional RAG with knowledge graphs, where entities are represented as nodes and relationships as edges. This enables the system to reason over complex, interconnected data, making it ideal for applications that require understanding relationships and handling multi-step queries.
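To illustrate the retrieval side, here is a toy Graph RAG sketch using networkx. The entities, relations, and two-hop cutoff are assumptions; production systems typically pair a graph database with LLM-driven entity extraction:

```python
# Minimal Graph RAG retrieval sketch: gather multi-hop relationship
# facts around a query entity to pass to the LLM as context.
import networkx as nx

graph = nx.DiGraph()
graph.add_edge("Acme Corp", "WidgetPro", relation="manufactures")
graph.add_edge("WidgetPro", "Lithium Battery", relation="contains")
graph.add_edge("Lithium Battery", "Air Shipping Restriction", relation="subject_to")

def graph_context(entity: str, hops: int = 2) -> list[str]:
    """Collect relationship facts within `hops` of the query entity."""
    nodes = nx.single_source_shortest_path_length(graph, entity, cutoff=hops)
    facts = []
    for u, v, data in graph.edges(data=True):
        if u in nodes:
            facts.append(f"{u} --{data['relation']}--> {v}")
    return facts

# Multi-hop question: why is WidgetPro affected by shipping rules?
print(graph_context("WidgetPro"))
```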
However, challenges with Graph RAG include difficulties in integrating diverse data sources into a cohesive graph, performance issues as the graph grows, the significant computational resources required for building and querying large graphs, and the complexity of keeping knowledge graphs updated regularly.
The Future: AI Agent-Based RAG vs. Agentic RAG?
AI agents and Agentic AI, though often discussed interchangeably, differ in significant ways. AI agents, which are currently gaining widespread attention, are designed to perform specific tasks by automating workflows through external tools and predefined prompts. In contrast, Agentic AI operates at a higher level of complexity, emphasizing autonomy: it can make decisions, execute actions, and even learn independently to achieve predefined goals. Key characteristics of Agentic AI include perception, reasoning, action, and learning, enabling it to handle tasks that require problem-solving, adaptability, and advanced reasoning.

While current AI agents are being extensively integrated into RAG systems, leveraging frameworks like CrewAI and AutoGen to deploy specialized agent teams, the evolution toward Agentic AI promises even greater advancements. Its capacity for reasoning, autonomy, and adaptability not only improves the reliability of RAG systems but also holds the potential to transform the broader landscape of Generative AI.
Thanks for reading!