RAG & GraphRAG
Himank Jain
Senior Data Scientist @Bajaj Finserv Health | Translating complex data into simpler solutions for Healthcare | Problem Solver | Learner
Introduction
Large Language Models (LLMs) are trained on fixed datasets, so their knowledge is frozen at the point of their last training update. This can result in outdated or inaccurate responses, and the models may "hallucinate" plausible-sounding but incorrect information.
Updating these models with new information or enhancing their context comprehension can be resource-intensive, requiring significant time and manpower for retraining or fine-tuning.
Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) is a technique designed to enhance LLMs by incorporating information from external, reliable knowledge bases.
The concept is simple: when an LLM is asked a question, it doesn't just rely on its pre-existing knowledge. Instead, it first retrieves relevant information from a specified knowledge source. This ensures that the generated outputs are enriched with the most current and contextually relevant data.
RAG helps LLMs provide up-to-date and accurate information, making them more reliable and effective.
Here are the key stages in a RAG pipeline:
Pre-Retrieval
In the RAG pipeline, pre-retrieval involves determining the data granularity, that is, the level of detail or precision at which the data will be indexed and searched. This step is crucial as it prepares the data for the retrieval process, enhancing the quality of the generated responses.
Chunking
Documents are split into smaller, self-contained pieces ("chunks") that can be individually embedded and indexed.
Retrieval
Given a query, the system embeds it and fetches the most semantically similar chunks from the index.
Post-Retrieval / Content Generation
The retrieved chunks are combined with the user's question into a prompt, from which the LLM generates a grounded answer.
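A minimal sketch of these stages in Python, assuming the sentence-transformers library for embeddings and brute-force cosine similarity in place of a real vector database; the model name, chunk sizes, and prompt format here are illustrative choices, not fixed parts of RAG:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model works

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Pre-retrieval / chunking: split the corpus into overlapping windows
    # so that information at chunk boundaries is not lost.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    # Retrieval: rank chunks by cosine similarity to the query embedding.
    vecs = model.encode(chunks, normalize_embeddings=True)
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = vecs @ q  # vectors are normalized, so dot product == cosine similarity
    return [chunks[i] for i in np.argsort(scores)[::-1][:top_k]]

def build_prompt(query: str, context: list[str]) -> str:
    # Post-retrieval / generation: ground the LLM in the retrieved context.
    return ("Answer using only this context:\n"
            + "\n---\n".join(context)
            + f"\n\nQuestion: {query}")
```

A production system would swap the brute-force scan for a vector database and pass build_prompt's output to the LLM of choice.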
Why RAG Matters
Large Language Models (LLMs) power many natural language applications today, generating human-like text and understanding complex queries.
Despite their power, these models can sometimes produce confident yet incorrect responses, potentially misleading users.
Retrieval-Augmented Generation (RAG) addresses these issues by guiding the LLM to pull information from reliable sources. This enhances the relevance and accuracy of the model's outputs, helping users receive trustworthy and up-to-date information.
Limitations of RAG
As with most things, the conventional RAG approach comes with its own complexities and challenges. While groundbreaking in enhancing the capabilities of LLMs, RAG has limitations that can impact its effectiveness and applicability.
It Starts With Semantic Search
Like many other machine-learning technologies, semantic similarity search is not magic. Embedding models learn to encode the features of input text into dense vectors, which we call embedding vectors. These vectors capture important information from the input text, and vector similarity can be used to compare the closeness of texts. Nevertheless, we do not know exactly what information has been extracted or how it is organised inside the vector, let alone how to make the representation more efficient or how to design a more accurate similarity function.
As a consequence, be prepared for semantic similarity search to miss the mark from time to time. Assuming it will always retrieve reasonable results is unrealistic.
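A small illustration of this opacity, again assuming sentence-transformers; the query, documents, and model are invented for the example, and nothing guarantees which document the model will actually rank higher:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "How do I reset my account password?"
docs = [
    "Steps to recover access when you forget your login credentials.",  # relevant, few shared words
    "Password strength requirements for new accounts.",                 # shares words, less relevant
]

# Normalized embeddings: the dot product is the cosine similarity.
q, d1, d2 = model.encode([query] + docs, normalize_embeddings=True)
print("semantically relevant doc:", float(q @ d1))
print("lexically similar doc:    ", float(q @ d2))
# There is no guarantee the relevant document scores higher; measure retrieval
# quality on your own data rather than assuming embeddings "just work".
```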
Contextual Understanding: The Interplay Between Chunk Size and Top-k
In Retrieval-Augmented Generation (RAG) systems, the chunk size and top_k parameters play a crucial role in performance. The chunk size should be chosen so that each chunk focuses on a single topic; mixed-topic chunks lead to muddled embeddings and poor retrieval quality. Introducing slight overlaps between chunks prevents information loss at chunk boundaries, ensuring important details are retained. The top_k parameter, which determines how many top-scored chunks are passed to the LLM as context, must also be set carefully: if the chunk size is too small or the information within chunks is sparse, a fixed top_k may not capture sufficient context, leading to suboptimal results.
Optimizing these parameters is akin to hyperparameter tuning in machine learning, requiring iterative adjustments to find the optimal balance. Proper tuning of chunk size and top_k can significantly enhance the effectiveness of RAG systems, ensuring they deliver high-quality, contextually accurate outputs.
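One way to make this tuning concrete is a small grid search over chunk size and top_k against a handful of queries with known-relevant passages. This sketch reuses the hypothetical chunk() and retrieve() helpers from the pipeline sketch above; the grid values and the crude hit-rate metric are illustrative assumptions:

```python
from itertools import product

def hit_rate(queries, gold_passages, chunks, top_k):
    # Fraction of queries whose known-relevant passage appears among the
    # top_k retrieved chunks (a crude but useful retrieval metric).
    hits = sum(
        any(gold in c for c in retrieve(q, chunks, top_k=top_k))
        for q, gold in zip(queries, gold_passages)
    )
    return hits / len(queries)

def tune(corpus, queries, gold_passages):
    # Grid-search chunk size and top_k, exactly like hyperparameter tuning.
    best = None
    for size, top_k in product([200, 500, 1000], [1, 3, 5]):
        chunks = chunk(corpus, size=size)
        score = hit_rate(queries, gold_passages, chunks, top_k)
        if best is None or score > best[0]:
            best = (score, size, top_k)
    return best  # (hit rate, best chunk size, best top_k)
```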
Multi-hop Q&A
Let’s consider another scenario: we built a RAG system on top of social-media data and ask it: Who knows Elon Musk? The system iterates through the vector database to extract a list of Elon Musk’s contacts. Because of the limits of chunk size and top_k, we can expect the list to be incomplete; nevertheless, it functionally works.
Now, if we reframe the question and ask: Who can introduce Johnny Depp to Elon Musk, except Amber Heard? A single round of information retrieval cannot answer that kind of question. This type of question is called multi-hop Q&A. One way to solve it is:
1. Retrieve Elon Musk’s list of contacts.
2. Retrieve Johnny Depp’s list of contacts.
3. Intersect the two lists and remove Amber Heard from the result.
Each step depends on the previous one, so the system must retrieve and reason over multiple rounds.
There are several architectures that support this kind of multi-step retrieval: one uses sophisticated prompt engineering such as ReAct, and another uses an external graph database to assist the reasoning. For now, it is enough to know that multi-hop questions expose one of the limits of plain RAG systems.
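A rough sketch of the iterative pattern, assuming a hypothetical llm() completion function and the retrieve() helper from earlier; the ANSWER/SEARCH protocol here is an illustrative simplification of what frameworks like ReAct formalize as thought/action/observation steps:

```python
def multi_hop_answer(question: str, chunks: list[str], llm, max_hops: int = 3) -> str:
    # Iteratively retrieve: each hop's findings become context for the next lookup.
    notes: list[str] = []
    query = question
    for _ in range(max_hops):
        notes.extend(retrieve(query, chunks, top_k=3))  # one hop of retrieval
        step = llm(
            "Question: " + question
            + "\nNotes so far:\n" + "\n".join(notes)
            + "\nIf the notes answer the question, reply 'ANSWER: <answer>'."
            + "\nOtherwise reply 'SEARCH: <next sub-question to look up>'."
        )
        if step.startswith("ANSWER:"):
            return step.removeprefix("ANSWER:").strip()
        query = step.removeprefix("SEARCH:").strip()
    # Fall back to a best-effort answer once the hop budget is exhausted.
    return llm("Answer as best you can. Question: " + question
               + "\nNotes:\n" + "\n".join(notes))
```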
Loss of Information
In a RAG system, several stages lead to potential information loss:
Chunking, which severs context that spans chunk boundaries;
Embedding, which compresses variable-length text into a fixed-length vector;
Retrieval, which discards every chunk outside the top_k results.
Each of these steps is inherently lossy, meaning there is no guarantee that all relevant information will be preserved throughout the process.
GraphRAG to the Rescue!
GraphRAG enhances the traditional RAG system by incorporating knowledge graphs (KGs), an approach pioneered by NebulaGraph. It changes how large language models (LLMs) interpret and respond to queries by integrating structured data from KGs into their processing. KGs are composed of nodes (representing entities) and edges (representing relationships between those entities). By leveraging these structured representations, GraphRAG provides a more nuanced and informed basis for generating responses, improving the depth and accuracy of the information retrieved and used by the LLM.
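In code, a knowledge graph can be as simple as a collection of (subject, relation, object) triples. A minimal sketch with networkx; the triples are invented for illustration, not taken from any real extraction run:

```python
import networkx as nx

# Entities become nodes; each relationship becomes a labeled, directed edge.
triples = [
    ("Elon Musk", "founded", "SpaceX"),        # illustrative triples only
    ("Elon Musk", "leads", "Tesla"),
    ("Tesla", "headquartered_in", "Austin"),
]

kg = nx.DiGraph()
for subj, rel, obj in triples:
    kg.add_edge(subj, obj, relation=rel)

# Traversal queries that are awkward for vector search become trivial here:
print(list(kg.successors("Elon Musk")))          # ['SpaceX', 'Tesla']
print(kg.edges["Tesla", "Austin"]["relation"])   # 'headquartered_in'
```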
Enhancing RAG Systems with Knowledge Graphs
Graph Retrieval and Reasoning
Graph Retrieval enriches the context by fetching relevant entities and relationships from the knowledge graph. Graph Reasoning governs how that information is traversed and expanded within the RAG system, improving the depth and relevance of the results.
Pre-Retrieval Phase
Knowledge Graph Indexing: Before retrieval, documents are semantically indexed based on nodes and edges within the knowledge graph. This allows semantically related documents to be retrieved directly, enhancing the relevance of the fetched information.
Node and Subgraph Extraction: Entities mentioned in the query are linked to their corresponding nodes in the graph, and the neighbourhood around those nodes is expanded into a subgraph that supplies structured context for answer generation.
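A minimal sketch of node and subgraph extraction using networkx, assuming the query entities have already been identified by some upstream entity-recognition and linking step:

```python
import networkx as nx

def extract_subgraph(kg: nx.DiGraph, query_entities: list[str], hops: int = 2) -> nx.DiGraph:
    # Node extraction: keep only the query entities that actually exist in the
    # graph (a real system would use NER + entity linking to produce them).
    seeds = [e for e in query_entities if e in kg]
    # Subgraph extraction: expand each seed to its n-hop neighbourhood and merge.
    sub = nx.DiGraph()
    for seed in seeds:
        sub = nx.compose(sub, nx.ego_graph(kg, seed, radius=hops))
    return sub
```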
Post-Retrieval Phase
In the Post-Retrieval phase, the challenge is to harmonize the data effectively. This stage mainly involves two processes: Re-ranking and Prompt Compression.
Re-Ranking Process: After retrieval, a re-ranking step combines scores from both RAG and GraphRAG. Graph-based relevance scores from GraphRAG are merged with RAG’s vector-similarity scores to decide which context is kept, improving the accuracy of the fetched information.
Prompt Compression: The query result, specifically the graph path, is folded into the Context + Prompt used for answer generation, so the structured result itself becomes part of the prompt.
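A minimal sketch of the score-blending idea behind re-ranking, where each candidate chunk carries a vector-similarity score from RAG and a graph-derived relevance score from GraphRAG; the equal weighting is an arbitrary assumption to tune, not a prescribed value:

```python
def rerank(chunks: list[str],
           vector_scores: dict[str, float],
           graph_scores: dict[str, float],
           alpha: float = 0.5,
           top_k: int = 3) -> list[str]:
    # Blend RAG's similarity score with GraphRAG's graph relevance score;
    # alpha controls how much weight the vector side gets.
    def combined(c: str) -> float:
        return alpha * vector_scores.get(c, 0.0) + (1 - alpha) * graph_scores.get(c, 0.0)
    return sorted(chunks, key=combined, reverse=True)[:top_k]
```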
GraphRAG Limitations
GraphRAG, like RAG, has clear limitations. These include how to construct the graph, how to generate queries against it, and how much information to retrieve per query. The main challenges are query generation, reasoning boundary, and information extraction. The reasoning boundary is particularly significant: expanding the graph to gather more related information can overload the retrieval step with context, which hurts answer generation, the very thing GraphRAG is meant to improve.
Conclusion
In conclusion, the integration of knowledge graphs into Retrieval-Augmented Generation (RAG) systems marks a significant advancement in information retrieval and reasoning capabilities. By leveraging structured representations of entities and their relationships, Graph RAG enhances context relevance and response accuracy. Key considerations for effectively utilizing GraphRAG include information extraction techniques to infer and generate connections between chunked data, knowledge indexing for storage and retrieval, and models for generating graph queries, such as the Cypher Generation Model.
If you want to dive deep into the code, you can refer to my GitHub, where I have played around with different datasets, exploring the power of graphs and how they run in sync with vector search using agents!