RAG & GraphRAG

Introduction

Large Language Models (LLMs) are trained on fixed datasets, so their knowledge is frozen at the point of their last training update. This can result in outdated or inaccurate responses, and the models may "hallucinate" information.

Updating these models with new information or enhancing their context comprehension can be resource-intensive, requiring significant time and manpower for retraining or fine-tuning.


Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is a technique designed to enhance LLMs by incorporating information from external, reliable knowledge bases.

The concept is simple: when an LLM is asked a question, it doesn't just rely on its pre-existing knowledge. Instead, it first retrieves relevant information from a specified knowledge source. This ensures that the generated outputs are enriched with the most current and contextually relevant data.

RAG ensures that LLMs provide up-to-date and accurate information, making them more reliable and effective.

Here are the key stages in a RAG pipeline:

Pre-Retrieval

In the RAG pipeline, pre-retrieval involves determining the data granularity, which is the level of detail or precision of the data to be searched. This step is crucial as it prepares the data for the retrieval process, enhancing the quality of the generated responses.

  • Data granularity can range from sentence-level (e.g., individual facts, sentences, or short paragraphs) to document-level (e.g., entire documents or articles).
  • The choice of data granularity affects the model’s performance and its ability to generate accurate and contextually relevant text.
  • Fine-grained data can provide more specific and detailed information for the generation task, while coarse-grained data can provide broader context or general knowledge.

Chunking

  • Chunking is the process of splitting source data into pieces of a suitable size and form for a large language model. Because the number of tokens an LLM can accept is limited, the information must be segmented and supplied appropriately.
  • To provide ‘good’ information to a large language model, it is crucial to give ‘appropriate’ context. Within the limited length (tokens), the goal is to preserve the organic relationships between pieces of context, which is why the issue of a ‘data length limit’ arises when processing the relevant data. A minimal chunking sketch follows this list.
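
As a rough illustration (not the method of any particular framework), here is a minimal Python sketch of fixed-size chunking with a small overlap; the chunk_size and overlap values are illustrative assumptions you would tune for your own data.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with a small overlap.

    The overlap keeps sentences that straddle a chunk boundary from being
    lost, at the cost of some duplication between neighbouring chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Example: a long document becomes overlapping chunks ready for embedding.
document = "Green tea contains antioxidants. " * 100
print(len(chunk_text(document)))
```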

Retrieval

  • This stage involves searching a document or text segment database to find content related to the user’s query. It includes understanding the intent and context of the query and selecting the most relevant documents or texts from the database based on this understanding.
  • For instance, when processing a query about “the health benefits of green tea,” the model finds documents mentioning the health benefits of green tea and selects them based on similarity metrics.
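
To make this step concrete, here is a hedged sketch of similarity-based retrieval; embed() is a placeholder for whatever embedding model or API you actually use (it returns pseudo-random vectors here only so the example runs standalone), and the toy chunks are purely illustrative.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding function.

    A real system would call an embedding model or API; this stub returns a
    pseudo-random vector so the sketch is self-contained and runnable.
    """
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k chunks most similar to the query by cosine similarity."""
    q = embed(query)
    scores = []
    for chunk in chunks:
        c = embed(chunk)
        scores.append(np.dot(q, c) / (np.linalg.norm(q) * np.linalg.norm(c)))
    ranked = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in ranked]

chunks = ["Green tea is rich in antioxidants.",
          "Coffee contains caffeine.",
          "Green tea may support heart health."]
print(retrieve("health benefits of green tea", chunks, top_k=2))
```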

Post-Retrieval / Content Generation

  • This stage processes the retrieved information to effectively integrate it into the generation process. It may include summarizing the searched text, selecting the most relevant facts, and refining the information to better match the user’s query.
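
As a simple way to picture this stage, the sketch below folds the selected chunks into the final prompt handed to the LLM; the template wording is an assumption for illustration, not a prescribed format.

```python
def build_prompt(query: str, retrieved_chunks: list[str]) -> str:
    """Concatenate the retrieved chunks into a grounded prompt for the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

print(build_prompt("What are the health benefits of green tea?",
                   ["Green tea is rich in antioxidants.",
                    "Green tea may support heart health."]))
```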


Why RAG Matters

Large Language Models (LLMs) power many natural language applications today, generating human-like text and understanding complex queries.

Despite their power, these models can sometimes produce confident yet incorrect responses, potentially misleading users.

Retrieval-Augmented Generation (RAG) addresses these issues by guiding the LLM to pull information from reliable sources. This approach enhances the relevance and accuracy of the model's outputs, ensuring users receive trustworthy and up-to-date information.


Limitations of RAG

As with all things in life, the conventional RAG approach has its complexities and challenges.

While groundbreaking in enhancing the capabilities of LLMs, RAG also has certain limitations that can impact their effectiveness and applicability.

It Starts With Semantic Search

Semantic similarity search, like many other machine learning techniques, is not magic. Embedding models (or autoencoders) learn to compress the features of the input text into what we call embedding vectors. These vectors capture salient information from the input text, and vector similarity can be used to compare how close two texts are. Nevertheless, we do not know exactly what information has been extracted or how it is organised within the vector, let alone how to make the process more efficient or design a more accurate similarity function.

As a consequence, be prepared for semantic similarity search to miss the mark from time to time. Assuming it will always retrieve reasonable results is unrealistic.

Contextual Understanding: The Interplay Between Chunk Size and Top-k

In Retrieval-Augmented Generation (RAG) systems, the parameters of chunk size and top_k play a crucial role in performance. The chunk size should be chosen such that each chunk focuses on a single topic, as mixed-topic chunks lead to ineffective embeddings and poor retrieval quality. Introducing slight overlaps between chunks can prevent information loss at their boundaries, ensuring important details are retained. The top_k parameter, which determines the number of top-scored chunks used as input, must be carefully set. If the chunk size is too small or the information within chunks is sparse, a fixed top_k might not capture sufficient context, leading to suboptimal results.

Optimizing these parameters is akin to hyperparameter tuning in machine learning, requiring iterative adjustments to find the optimal balance. Proper tuning of chunk size and top_k can significantly enhance the effectiveness of RAG systems, ensuring they deliver high-quality, contextually accurate outputs.
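
To make the hyperparameter-tuning analogy concrete, here is a hedged grid-search sketch; build_rag_pipeline and evaluate are hypothetical stand-ins for your own pipeline construction and evaluation (for example, answer accuracy on a held-out question set), with toy bodies included only so the example executes.

```python
from itertools import product

def build_rag_pipeline(chunk_size: int, top_k: int):
    """Placeholder: return a configured RAG pipeline (chunking + retrieval + generation)."""
    return {"chunk_size": chunk_size, "top_k": top_k}

def evaluate(pipeline, eval_set) -> float:
    """Placeholder: score the pipeline on a held-out question set.

    A toy heuristic is used here purely so the sketch runs end to end.
    """
    return -abs(pipeline["chunk_size"] - 512) - abs(pipeline["top_k"] - 4)

def tune_rag(eval_set, chunk_sizes=(256, 512, 1024), top_ks=(2, 4, 8)):
    """Grid-search chunk_size and top_k, keeping the best-scoring combination."""
    best_score, best_params = float("-inf"), None
    for chunk_size, top_k in product(chunk_sizes, top_ks):
        score = evaluate(build_rag_pipeline(chunk_size, top_k), eval_set)
        if score > best_score:
            best_score, best_params = score, (chunk_size, top_k)
    return best_params, best_score

print(tune_rag(eval_set=[]))  # -> ((512, 4), 0) with the toy scorer
```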

Multi-hop Q&A

Let’s consider another scenario: we have built a RAG system on top of social media data. We then ask: who knows Elon Musk? The system will iterate through the vector database to extract a list of Elon Musk’s contacts. Because of the chunk size and top_k limits, we can expect the list to be incomplete; nevertheless, it works functionally.

Now, if we reframe our question and ask: Who can introduce Johnny Depp to Elon Musk, except Amber Heard? A single round of information retrieval cannot answer that kind of question. This type of question is called multi-hop Q&A. One way to solve it is:

  1. Retrieve all contacts of Elon Musk.
  2. Retrieve all contacts of Johnny Depp.
  3. Check whether there is any intersection between the two results, excluding Amber Heard.
  4. Return the result if there is an intersection; otherwise, extend the contacts of Elon Musk and Johnny Depp to their friends’ contacts and check again.

There are several architectures that accommodate this more complicated algorithm; one uses sophisticated prompt engineering such as ReAct, and another uses an external graph database to assist the reasoning. For now, it is enough to recognise this as one of the limits of plain RAG systems.
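
As a hedged sketch of the iterative procedure above, the snippet below expands both contact sets hop by hop over a toy, hard-coded social graph until they intersect; the names and relationships are illustrative data only.

```python
# Toy social graph: person -> set of direct contacts (illustrative data only).
contacts = {
    "Elon Musk": {"Amber Heard", "Grimes", "Sam Altman"},
    "Johnny Depp": {"Amber Heard", "Tim Burton", "Sam Altman"},
    "Grimes": {"Elon Musk"},
    "Tim Burton": {"Johnny Depp", "Helena Bonham Carter"},
}

def mutual_contacts(a: str, b: str, exclude: set[str], max_hops: int = 2) -> set[str]:
    """Expand both contact sets hop by hop until they intersect (minus `exclude`)."""
    frontier_a = set(contacts.get(a, set()))
    frontier_b = set(contacts.get(b, set()))
    for _ in range(max_hops):
        common = (frontier_a & frontier_b) - exclude
        if common:
            return common
        # No intersection yet: extend each side to friends-of-friends and retry.
        frontier_a |= {c for p in frontier_a for c in contacts.get(p, set())}
        frontier_b |= {c for p in frontier_b for c in contacts.get(p, set())}
    return set()

print(mutual_contacts("Elon Musk", "Johnny Depp", exclude={"Amber Heard"}))
# -> {'Sam Altman'} with this toy data
```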

Loss of Information

In a RAG system, several stages lead to potential information loss:

  1. Chunking and Embedding: Text is split into chunks, which are then converted into embeddings. This process can lose information due to chunk size limitations and the effectiveness of the embedding models.
  2. Retrieval: Chunks are retrieved based on semantic similarity. The top_k parameter limits the number of chunks retrieved, and the similarity function may not perfectly capture relevance, leading to further information loss.
  3. Response Generation: The final response is generated using the retrieved chunks. This stage can lose information due to the context length limit and the capabilities of the generative language model.

Each of these steps is inherently lossy, meaning there's no guarantee that all relevant information will be preserved throughout the process.


GraphRAG to the Rescue!

Graph RAG enhances the traditional RAG system by incorporating knowledge graphs (KGs), as pioneered by NebulaGraph. This approach transforms how large language models (LLMs) interpret and respond to queries by integrating structured data from KGs into their processing. KGs are composed of nodes (representing entities) and edges (representing relationships between these entities). By leveraging these structured representations, Graph RAG provides a more nuanced and informed basis for generating responses, improving the depth and accuracy of the information retrieved and used by the LLM.

Enhancing RAG Systems with Knowledge Graphs

Graph Retrieval and Reasoning

Graph Retrieval focuses on enhancing the context by fetching relevant information. Graph Reasoning applies to how this information is traversed and searched within RAG systems, improving the depth and relevance of the results.

Pre-Retrieval Phase

Knowledge Graph Indexing: Before retrieval, documents are semantically indexed based on nodes and edges within the knowledge graph. This helps in directly retrieving semantically related documents, enhancing the relevance of the fetched information.
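
One simple way to picture this indexing step is to link each chunk to the entity nodes it mentions, so retrieval can start from graph nodes rather than raw text. In the sketch below, the entity list and the substring matching are naive stand-ins for a real information extraction or entity linking model.

```python
from collections import defaultdict

# Known entity nodes in the knowledge graph (assumed, for illustration).
entities = {"green tea", "antioxidants", "heart health"}

chunks = [
    "Green tea is rich in antioxidants.",
    "Regular exercise supports heart health.",
    "Antioxidants in green tea may support heart health.",
]

# Index: entity node -> ids of chunks that mention it.
node_index = defaultdict(set)
for chunk_id, chunk in enumerate(chunks):
    for entity in entities:
        if entity in chunk.lower():
            node_index[entity].add(chunk_id)

print(dict(node_index))
# e.g. {'green tea': {0, 2}, 'antioxidants': {0, 2}, 'heart health': {1, 2}}
```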

Node and Subgraph Extraction:

  • Nodes: The system compares the user query with chunked nodes to find the most similar ones and uses their connected paths as query syntax. However, it requires specifying how many nodes within a path to fetch and relies heavily on the information extraction model used for creating the knowledge graph.
  • Variable Length Edges (VLE): VLE can fetch related information by traversing edges of varying lengths, necessitating database optimization for efficient retrieval.
  • Subgraphs: This involves fetching ego-graphs (subgraphs centered around relevant nodes) to compare the overall context with the user’s query. This method requires experimenting with different graph embedding techniques to find the most effective one.
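
To illustrate the variable-length-edge idea, here is a small sketch that builds an openCypher-style query for paths of 1 to max_hops hops around a matched entity node; the node label, property name, and LIMIT are illustrative assumptions rather than a fixed schema, and the exact syntax may differ slightly between graph databases such as Neo4j and NebulaGraph.

```python
def vle_query(entity_name: str, max_hops: int = 3) -> str:
    """Build a Cypher-style variable-length-edge query around a seed entity.

    Matches paths from the seed node across 1..max_hops relationships and
    returns them so the surrounding context can be handed to the LLM.
    """
    return (
        f"MATCH p = (e:Entity {{name: '{entity_name}'}})-[*1..{max_hops}]-(related) "
        "RETURN p LIMIT 25"
    )

print(vle_query("Elon Musk"))
# MATCH p = (e:Entity {name: 'Elon Musk'})-[*1..3]-(related) RETURN p LIMIT 25
```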

Post-Retrieval Phase

In the Post-Retrieval phase, the challenge is to harmonize the data effectively. This stage mainly involves two processes: Re-ranking and Prompt Compression.

Re-Ranking Process: After retrieval, a re-ranking process combines scores from both RAG and GraphRAG. Graph-based relevance scores from GraphRAG are blended with RAG’s vector similarity scores to assemble the context, enhancing the accuracy of the fetched information.

In Prompt Compression, the query result, specifically the graph path, is included as part of the context and prompt used for answer generation.
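
A hedged sketch of the score fusion used in re-ranking: each candidate chunk carries a vector similarity score from RAG and a graph-based relevance score from GraphRAG, and a weighted sum decides the final order. The weights, the score sources, and the assumption that both scores are already normalised to [0, 1] are illustrative choices.

```python
def rerank(candidates, vector_weight: float = 0.6, graph_weight: float = 0.4):
    """Re-rank candidates by a weighted blend of vector and graph scores.

    Each candidate is (chunk_text, vector_score, graph_score), with both
    scores assumed to be normalised to [0, 1].
    """
    scored = [
        (text, vector_weight * v + graph_weight * g)
        for text, v, g in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

candidates = [
    ("Chunk about green tea antioxidants", 0.82, 0.40),
    ("Chunk linked by a strong graph path", 0.55, 0.95),
]
print(rerank(candidates))
```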


GraphRAG limitations

GraphRAG, like RAG, has clear limitations, which include how to construct the graph, how to generate queries against it, and ultimately how much information to retrieve for a given query. The main challenges are ‘query generation’, ‘reasoning boundary’, and ‘information extraction’. In particular, the ‘reasoning boundary’ poses a significant limitation: pulling in too much related information can overload the retrieval step and negatively impact answer generation, which is the core purpose of GraphRAG.


Conclusion

In conclusion, the integration of knowledge graphs into Retrieval-Augmented Generation (RAG) systems marks a significant advancement in information retrieval and reasoning capabilities. By leveraging structured representations of entities and their relationships, Graph RAG enhances context relevance and response accuracy. Key considerations for effectively utilizing GraphRAG include information extraction techniques to infer and generate connections between chunked data, knowledge indexing for storage and retrieval, and models for generating graph queries, such as the Cypher Generation Model.


If you want to take a deep dive into the code, you can refer to my GitHub, where I have played around with different datasets, exploring the power of graphs and how they run in sync with vector search using Agents!

Github - https://github.com/Himank-J/Graph-RAG
