RAG & GraphRAG

Introduction

Large Language Models (LLMs) are trained on fixed datasets, so their knowledge is frozen at the point of their last training update. This can result in outdated or inaccurate responses, and the models may "hallucinate" information.

Updating these models with new information or enhancing their context comprehension can be resource-intensive, requiring significant time and manpower for retraining or fine-tuning.


Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is a technique designed to enhance LLMs by incorporating information from external, reliable knowledge bases.

The concept is simple: when an LLM is asked a question, it doesn't just rely on its pre-existing knowledge. Instead, it first retrieves relevant information from a specified knowledge source. This ensures that the generated outputs are enriched with the most current and contextually relevant data.

RAG ensures that LLMs provide up-to-date and accurate information, making them more reliable and effective.

Here are the key stages in a RAG pipeline:

Pre-Retrieval

In the RAG pipeline, pre-retrieval involves determining the data granularity, which is the level of detail or precision of the data to be searched. This step is crucial as it prepares the data for the retrieval process, enhancing the quality of the generated responses.

  • Data granularity can range from sentence-level (e.g., individual facts, sentences, or short paragraphs) to document-level (e.g., entire documents or articles).
  • The choice of data granularity affects the model’s performance and its ability to generate accurate and contextually relevant text.
  • Fine-grained data can provide more specific and detailed information for the generation task, while coarse-grained data can provide broader context or general knowledge.

Chunking

  • Chunking is the process of splitting source data into pieces of a suitable size and form for a large language model. Because the number of tokens an LLM can accept is limited, the information must be segmented and supplied appropriately.
  • To provide ‘good’ information to a large language model, it is crucial to give ‘appropriate’ context. Within the limited length (tokens), the goal is to preserve the organic relationships between pieces of context, which is why the issue of a ‘data length limit’ arises when processing the relevant data. A minimal chunking sketch follows this list.
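
As a rough illustration (not the method of any particular framework), here is a minimal Python sketch of fixed-size chunking with a small overlap; the chunk_size and overlap values are illustrative assumptions you would tune for your own data.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with a small overlap.

    The overlap keeps sentences that straddle a chunk boundary from being
    lost, at the cost of some duplication between neighbouring chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Example: a long document becomes overlapping chunks ready for embedding.
document = "Green tea contains antioxidants. " * 100
print(len(chunk_text(document)))
```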

Retrieval

  • This stage involves searching a document or text segment database to find content related to the user’s query. It includes understanding the intent and context of the query and selecting the most relevant documents or texts from the database based on this understanding.
  • For instance, when processing a query about “the health benefits of green tea,” the model finds documents mentioning the health benefits of green tea and selects them based on similarity metrics.
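
To make this step concrete, here is a hedged sketch of similarity-based retrieval; embed() is a placeholder for whatever embedding model or API you actually use (it returns pseudo-random vectors here only so the example runs standalone), and the toy chunks are purely illustrative.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding function.

    A real system would call an embedding model or API; this stub returns a
    pseudo-random vector so the sketch is self-contained and runnable.
    """
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k chunks most similar to the query by cosine similarity."""
    q = embed(query)
    scores = []
    for chunk in chunks:
        c = embed(chunk)
        scores.append(np.dot(q, c) / (np.linalg.norm(q) * np.linalg.norm(c)))
    ranked = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in ranked]

chunks = ["Green tea is rich in antioxidants.",
          "Coffee contains caffeine.",
          "Green tea may support heart health."]
print(retrieve("health benefits of green tea", chunks, top_k=2))
```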

Post-Retrieval / Content Generation

  • This stage processes the retrieved information to effectively integrate it into the generation process. It may include summarizing the searched text, selecting the most relevant facts, and refining the information to better match the user’s query.
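
As a simple way to picture this stage, the sketch below folds the selected chunks into the final prompt handed to the LLM; the template wording is an assumption for illustration, not a prescribed format.

```python
def build_prompt(query: str, retrieved_chunks: list[str]) -> str:
    """Concatenate the retrieved chunks into a grounded prompt for the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

print(build_prompt("What are the health benefits of green tea?",
                   ["Green tea is rich in antioxidants.",
                    "Green tea may support heart health."]))
```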


Why RAG Matters

Large Language Models (LLMs) power many natural language applications today, generating human-like text and understanding complex queries.

Despite their power, these models can sometimes produce confident yet incorrect responses, potentially misleading users.

Retrieval-Augmented Generation (RAG) addresses these issues by guiding the LLM to pull information from reliable sources. This approach enhances the relevance and accuracy of the model's outputs, ensuring users receive trustworthy and up-to-date information.


Limitations of RAG

As with all things in life, the conventional RAG approach has its complexities and challenges.

While groundbreaking in enhancing the capabilities of LLMs, RAG also has certain limitations that can impact their effectiveness and applicability.

It Starts With Semantic Search

Semantic similarity search, like many other machine learning techniques, is not magic. Embedding models (or autoencoders) learn to compress the features of the input text into what we call embedding vectors. These vectors capture salient information from the input text, and vector similarity can be used to compare how close two texts are. Nevertheless, we do not know exactly what information has been extracted or how it is organised within the vector, let alone how to make the process more efficient or design a more accurate similarity function.

As a consequence, be prepared for semantic similarity search to miss the mark from time to time. Assuming it will always retrieve reasonable results is unrealistic.

Contextual Understanding: The Interplay Between Chunk Size and Top-k

In Retrieval-Augmented Generation (RAG) systems, the parameters of chunk size and top_k play a crucial role in performance. The chunk size should be chosen such that each chunk focuses on a single topic, as mixed-topic chunks lead to ineffective embeddings and poor retrieval quality. Introducing slight overlaps between chunks can prevent information loss at their boundaries, ensuring important details are retained. The top_k parameter, which determines the number of top-scored chunks used as input, must be carefully set. If the chunk size is too small or the information within chunks is sparse, a fixed top_k might not capture sufficient context, leading to suboptimal results.

Optimizing these parameters is akin to hyperparameter tuning in machine learning, requiring iterative adjustments to find the optimal balance. Proper tuning of chunk size and top_k can significantly enhance the effectiveness of RAG systems, ensuring they deliver high-quality, contextually accurate outputs.
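
To make the hyperparameter-tuning analogy concrete, here is a hedged grid-search sketch; build_rag_pipeline and evaluate are hypothetical stand-ins for your own pipeline construction and evaluation (for example, answer accuracy on a held-out question set), with toy bodies included only so the example executes.

```python
from itertools import product

def build_rag_pipeline(chunk_size: int, top_k: int):
    """Placeholder: return a configured RAG pipeline (chunking + retrieval + generation)."""
    return {"chunk_size": chunk_size, "top_k": top_k}

def evaluate(pipeline, eval_set) -> float:
    """Placeholder: score the pipeline on a held-out question set.

    A toy heuristic is used here purely so the sketch runs end to end.
    """
    return -abs(pipeline["chunk_size"] - 512) - abs(pipeline["top_k"] - 4)

def tune_rag(eval_set, chunk_sizes=(256, 512, 1024), top_ks=(2, 4, 8)):
    """Grid-search chunk_size and top_k, keeping the best-scoring combination."""
    best_score, best_params = float("-inf"), None
    for chunk_size, top_k in product(chunk_sizes, top_ks):
        score = evaluate(build_rag_pipeline(chunk_size, top_k), eval_set)
        if score > best_score:
            best_score, best_params = score, (chunk_size, top_k)
    return best_params, best_score

print(tune_rag(eval_set=[]))  # -> ((512, 4), 0) with the toy scorer
```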

Multi-hop Q&A

Let’s consider another scenario: we have built a RAG system on top of social media data. We then ask: who knows Elon Musk? The system will iterate through the vector database to extract a list of Elon Musk’s contacts. Because of the chunk size and top_k limits, we can expect the list to be incomplete; nevertheless, it works functionally.

Now, if we reframe our question and ask: Who can introduce Johnny Depp to Elon Musk, except Amber Heard? A single round of information retrieval cannot answer that kind of question. This type of question is called multi-hop Q&A. One way to solve it is:

  1. Retrieve all contacts of Elon Musk.
  2. Retrieve all contacts of Johnny Depp.
  3. Check whether there is any intersection between the two results, excluding Amber Heard.
  4. Return the result if there is an intersection; otherwise, extend the contacts of Elon Musk and Johnny Depp to their friends’ contacts and check again.

There are several architectures that accommodate this more complicated algorithm; one uses sophisticated prompt engineering such as ReAct, and another uses an external graph database to assist the reasoning. For now, it is enough to recognise this as one of the limits of plain RAG systems.
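
As a hedged sketch of the iterative procedure above, the snippet below expands both contact sets hop by hop over a toy, hard-coded social graph until they intersect; the names and relationships are illustrative data only.

```python
# Toy social graph: person -> set of direct contacts (illustrative data only).
contacts = {
    "Elon Musk": {"Amber Heard", "Grimes", "Sam Altman"},
    "Johnny Depp": {"Amber Heard", "Tim Burton", "Sam Altman"},
    "Grimes": {"Elon Musk"},
    "Tim Burton": {"Johnny Depp", "Helena Bonham Carter"},
}

def mutual_contacts(a: str, b: str, exclude: set[str], max_hops: int = 2) -> set[str]:
    """Expand both contact sets hop by hop until they intersect (minus `exclude`)."""
    frontier_a = set(contacts.get(a, set()))
    frontier_b = set(contacts.get(b, set()))
    for _ in range(max_hops):
        common = (frontier_a & frontier_b) - exclude
        if common:
            return common
        # No intersection yet: extend each side to friends-of-friends and retry.
        frontier_a |= {c for p in frontier_a for c in contacts.get(p, set())}
        frontier_b |= {c for p in frontier_b for c in contacts.get(p, set())}
    return set()

print(mutual_contacts("Elon Musk", "Johnny Depp", exclude={"Amber Heard"}))
# -> {'Sam Altman'} with this toy data
```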

Loss of Information

In a RAG system, several stages lead to potential information loss:

  1. Chunking and Embedding: Text is split into chunks, which are then converted into embeddings. This process can lose information due to chunk size limitations and the effectiveness of the embedding models.
  2. Retrieval: Chunks are retrieved based on semantic similarity. The top_k parameter limits the number of chunks retrieved, and the similarity function may not perfectly capture relevance, leading to further information loss.
  3. Response Generation: The final response is generated using the retrieved chunks. This stage can lose information due to the context length limit and the capabilities of the generative language model.

Each of these steps is inherently lossy, meaning there's no guarantee that all relevant information will be preserved throughout the process.


GraphRAG to the Rescue!

Graph RAG enhances the traditional RAG system by incorporating knowledge graphs (KGs), as pioneered by NebulaGraph. This approach transforms how large language models (LLMs) interpret and respond to queries by integrating structured data from KGs into their processing. KGs are composed of nodes (representing entities) and edges (representing relationships between these entities). By leveraging these structured representations, Graph RAG provides a more nuanced and informed basis for generating responses, improving the depth and accuracy of the information retrieved and used by the LLM.

Enhancing RAG Systems with Knowledge Graphs

Graph Retrieval and Reasoning

Graph Retrieval focuses on enhancing the context by fetching relevant information. Graph Reasoning applies to how this information is traversed and searched within RAG systems, improving the depth and relevance of the results.

Pre-Retrieval Phase

Knowledge Graph Indexing: Before retrieval, documents are semantically indexed based on nodes and edges within the knowledge graph. This helps in directly retrieving semantically related documents, enhancing the relevance of the fetched information.
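
One simple way to picture this indexing step is to link each chunk to the entity nodes it mentions, so retrieval can start from graph nodes rather than raw text. In the sketch below, the entity list and the substring matching are naive stand-ins for a real information extraction or entity linking model.

```python
from collections import defaultdict

# Known entity nodes in the knowledge graph (assumed, for illustration).
entities = {"green tea", "antioxidants", "heart health"}

chunks = [
    "Green tea is rich in antioxidants.",
    "Regular exercise supports heart health.",
    "Antioxidants in green tea may support heart health.",
]

# Index: entity node -> ids of chunks that mention it.
node_index = defaultdict(set)
for chunk_id, chunk in enumerate(chunks):
    for entity in entities:
        if entity in chunk.lower():
            node_index[entity].add(chunk_id)

print(dict(node_index))
# e.g. {'green tea': {0, 2}, 'antioxidants': {0, 2}, 'heart health': {1, 2}}
```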

Node and Subgraph Extraction:

  • Nodes: The system compares the user query with chunked nodes to find the most similar ones and uses their connected paths as query syntax. However, it requires specifying how many nodes within a path to fetch and relies heavily on the information extraction model used for creating the knowledge graph.
  • Variable Length Edges (VLE): VLE can fetch related information by traversing edges of varying lengths, necessitating database optimization for efficient retrieval.
  • Subgraphs: This involves fetching ego-graphs (subgraphs centered around relevant nodes) to compare the overall context with the user’s query. This method requires experimenting with different graph embedding techniques to find the most effective one.
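
To illustrate the variable-length-edge idea, here is a small sketch that builds an openCypher-style query for paths of 1 to max_hops hops around a matched entity node; the node label, property name, and LIMIT are illustrative assumptions rather than a fixed schema, and the exact syntax may differ slightly between graph databases such as Neo4j and NebulaGraph.

```python
def vle_query(entity_name: str, max_hops: int = 3) -> str:
    """Build a Cypher-style variable-length-edge query around a seed entity.

    Matches paths from the seed node across 1..max_hops relationships and
    returns them so the surrounding context can be handed to the LLM.
    """
    return (
        f"MATCH p = (e:Entity {{name: '{entity_name}'}})-[*1..{max_hops}]-(related) "
        "RETURN p LIMIT 25"
    )

print(vle_query("Elon Musk"))
# MATCH p = (e:Entity {name: 'Elon Musk'})-[*1..3]-(related) RETURN p LIMIT 25
```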

Post-Retrieval Phase

In the Post-Retrieval phase, the challenge is to harmonize the data effectively. This stage mainly involves two processes: Re-ranking and Prompt Compression.

Re-Ranking Process: After retrieval, a re-ranking process combines scores from both RAG and GraphRAG. Graph-based relevance scores from GraphRAG are blended with RAG’s vector similarity scores to assemble the context, enhancing the accuracy of the fetched information.

In Prompt Compression, the query result, specifically the graph path, is included as part of the context and prompt used for answer generation.
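
A hedged sketch of the score fusion used in re-ranking: each candidate chunk carries a vector similarity score from RAG and a graph-based relevance score from GraphRAG, and a weighted sum decides the final order. The weights, the score sources, and the assumption that both scores are already normalised to [0, 1] are illustrative choices.

```python
def rerank(candidates, vector_weight: float = 0.6, graph_weight: float = 0.4):
    """Re-rank candidates by a weighted blend of vector and graph scores.

    Each candidate is (chunk_text, vector_score, graph_score), with both
    scores assumed to be normalised to [0, 1].
    """
    scored = [
        (text, vector_weight * v + graph_weight * g)
        for text, v, g in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

candidates = [
    ("Chunk about green tea antioxidants", 0.82, 0.40),
    ("Chunk linked by a strong graph path", 0.55, 0.95),
]
print(rerank(candidates))
```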


GraphRAG limitations

GraphRAG, like RAG, has clear limitations, which include how to construct the graph, how to generate queries against it, and ultimately how much information to retrieve for a given query. The main challenges are ‘query generation’, ‘reasoning boundary’, and ‘information extraction’. In particular, the ‘reasoning boundary’ poses a significant limitation: pulling in too much related information can overload the retrieval step and negatively impact answer generation, which is the core purpose of GraphRAG.


Conclusion

In conclusion, the integration of knowledge graphs into Retrieval-Augmented Generation (RAG) systems marks a significant advancement in information retrieval and reasoning capabilities. By leveraging structured representations of entities and their relationships, Graph RAG enhances context relevance and response accuracy. Key considerations for effectively utilizing GraphRAG include information extraction techniques to infer and generate connections between chunked data, knowledge indexing for storage and retrieval, and models for generating graph queries, such as the Cypher Generation Model.


If you want to take a deep dive into the code, you can refer to my GitHub, where I have played around with different datasets, exploring the power of graphs and how they run in sync with vector search using Agents!

Github - https://github.com/Himank-J/Graph-RAG
