What to Expect from a Good RAG System

Chatbots are among the most common implementations of LLMs, and one of their biggest problems is hallucination. This issue gave rise to a method called RAG (Retrieval-Augmented Generation). Simply put, instead of responding to inquiries directly, the chatbot queries relevant data or documents from a database and uses them as references for its response. RAG is one way to do "grounding", a term commonly used for ensuring the AI does not spit out nonsense but instead returns responses based on relevant data in a database we define.

So, RAG is becoming increasingly common. But do you know what to expect from a good RAG system? Recently I stumbled upon an open-source RAG system called RAGFlow, and it redefined my standard for what a good RAG system should look like. Let's see what features RAGFlow provides and why they matter.

Vector Processing and Optimization

When preparing documents for a vector database, "chunking" is required to break them into smaller parts so that the context fits within the token limit of the LLM's prompt. For example, in the case of FAQs, each question and answer should ideally be a separate chunk. More reading on chunking here. RAGFlow offers the option to choose the type of document for chunking, which is a great feature: you can align each document with the type you think is most appropriate, then directly inspect the chunking results to see if they are suitable.

Chunking Method
Chunking Results
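As an illustration (not RAGFlow's actual implementation), a minimal sliding-window chunker might look like this; `max_tokens` here counts words as a rough proxy for real model tokens:

```python
def chunk_text(text, max_tokens=200, overlap=20):
    """Split text into overlapping word-window chunks.

    A rough sketch: real chunkers count model tokens and respect
    document structure (e.g. one FAQ entry per chunk)."""
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the last window already covers the tail
    return chunks
```

The overlap keeps a sentence that straddles a boundary partially present in both chunks, which helps retrieval at the cost of some duplication.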

After reviewing the chunking, retrieval testing can be performed. This test checks whether a search query returns relevant results against the chunks that have been created. RAGFlow provides a tunable similarity threshold to determine how similar the results need to be. This makes the process more transparent rather than operating as a black box.

Example Test Query and Result
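Under the hood, the retrieval test boils down to scoring the similarity between the query embedding and each chunk embedding and keeping only the results above the threshold. A minimal sketch using cosine similarity (the vectors here are toy values, not real embedding-model output):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, chunk_vecs, threshold=0.7):
    # Keep only chunks whose similarity clears the tunable threshold,
    # ranked best-first -- the knob RAGFlow exposes in its retrieval test.
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    hits = [(i, s) for i, s in scored if s >= threshold]
    return sorted(hits, key=lambda p: p[1], reverse=True)
```

Raising the threshold trades recall for precision, which is exactly the tuning decision the test screen makes visible.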

Additionally, RAGFlow includes the RAPTOR feature, which enhances the accuracy of the results, based on this paper. RAPTOR is particularly valuable because one of the challenges of chunking is that the pieces may not be ideal; summarizing the chunks improves accuracy significantly.

RAPTOR Prompt
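Conceptually, RAPTOR builds a tree: chunks are grouped (the paper clusters them by embedding), each group is summarized by the LLM, and the process repeats on the summaries; retrieval can then match at any level. A toy sketch with a pluggable `summarize` function and fixed-size groups instead of real clustering:

```python
def build_raptor_tree(chunks, summarize, group_size=2):
    """Recursively summarize groups of chunks, RAPTOR-style.

    Returns every node (leaf chunks plus all summary levels) so that
    retrieval can match at whichever level of abstraction fits the query."""
    levels = [list(chunks)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        groups = [current[i:i + group_size]
                  for i in range(0, len(current), group_size)]
        levels.append([summarize(g) for g in groups])  # one LLM call per group
    return [node for level in levels for node in level]
```

The summary nodes let a broad question match a high-level node even when no single leaf chunk answers it on its own.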


Abstraction: Assistant, Knowledge Base, Agent

After adding documents to provide context for the AI, users can create an "Assistant", i.e. a chatbot. It offers the expected features: customizing the system prompt, tuning model parameters, and even using multiple databases, referred to as "knowledge bases." You can therefore reuse, combine, and mix several knowledge bases across different Assistants.

Assistant Configuration
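The abstraction can be pictured as two small types: knowledge bases own chunks, and an assistant references any number of them. A hypothetical sketch (the names are mine, not RAGFlow's API):

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeBase:
    name: str
    chunks: list

@dataclass
class Assistant:
    system_prompt: str
    knowledge_bases: list = field(default_factory=list)

    def searchable_chunks(self):
        # An assistant retrieves across every knowledge base attached to it,
        # which is what makes knowledge bases reusable and mixable.
        return [c for kb in self.knowledge_bases for c in kb.chunks]
```

Because the knowledge base is a separate object, the same one can be attached to a customer-facing assistant and an internal one without re-ingesting documents.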

The most remarkable feature is the "Agent." With it, prompt chaining can be implemented directly: for instance, if a search is required, the system performs it, following the initial categorization and executing the appropriate plugins according to the flow. However, because the Agent's process is complex, each interaction can be costly, as it may invoke the LLM and external APIs multiple times. As the architect, be mindful of this and carefully weigh the trade-offs for your specific use case.

Agent Workflow
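The core of such an agent is a classify-then-route loop; each hop can be another LLM or API call, which is where the cost multiplies. A minimal sketch with stub functions standing in for those calls:

```python
def run_agent(query, classify, handlers, fallback):
    """Prompt chaining in miniature: categorize the query first,
    then execute the plugin/handler that matches the category.
    In a real agent both steps may be separate LLM or API calls."""
    category = classify(query)           # call 1: categorization
    handler = handlers.get(category, fallback)
    return handler(query)                # call 2: the routed step
```

Even this two-step flow doubles the per-interaction cost versus a plain chat completion; real agent graphs with several tools multiply it further.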

Deployment Modes

Once the Assistant is set up, there are various options for deployment. For embedding, select "Embedded" to generate an iframe. To call it as a backend API, use an API key. For direct web access, click "Preview." An impressive feature is the ability to monitor the number of API calls made with each key, which is highly useful for tracking usage.


Deployment Mode, API (in background) and iframe (floating dialog)
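The per-key call monitoring is, at its core, a counter keyed by API key. A minimal sketch of the idea (RAGFlow implements this server-side; this is not its code):

```python
from collections import Counter

class UsageTracker:
    """Count calls per API key so usage can be monitored and reported."""

    def __init__(self):
        self._calls = Counter()

    def record(self, api_key):
        # Called once per incoming API request, e.g. from auth middleware.
        self._calls[api_key] += 1

    def usage(self, api_key):
        return self._calls[api_key]
```

Keeping the counter per key (rather than global) is what lets you attribute cost to each integration that holds its own key.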

Contextual RAG

A good chatbot needs to understand context. I have not found this in the RAGFlow documentation, but memory is a basic requirement for a chatbot. There are two types of memory:

  1. Short-term memory: the recent turns of the current chat, summarized so the conversation history still fits within the prompt's token limit.
  2. Long-term memory: memory that persists across multiple conversations with the same user.

Read this for more details on how memory works and how it is implemented.

Short and Long Term Memory from Langchain Documentation
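Short-term memory is commonly implemented by keeping the most recent turns verbatim and compressing everything older into a running summary. A sketch with a pluggable `summarize` function (in practice that would be an LLM call):

```python
def fit_history(messages, summarize, max_recent=6):
    """Short-term memory: keep the last `max_recent` messages as-is and
    fold older ones into a summary so the history fits the token budget."""
    if len(messages) <= max_recent:
        return list(messages)
    older, recent = messages[:-max_recent], messages[-max_recent:]
    summary = {"role": "system",
               "content": "Conversation so far: " + summarize(older)}
    return [summary] + recent
```

Long-term memory adds a persistence layer on top: the same kind of summary, stored per user and retrieved at the start of each new conversation.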

Responsible AI on RAG

I believe a good RAG system also needs to be safe, to minimize reputational and legal risk. It needs to:

  1. Disclose that it is an AI (transparency); see PAIR for reference.
  2. Ensure it only does what it is supposed to do and avoids everything else.
  3. Check whether the input is safe from adversarial prompting, e.g. using Prompt Guard.
  4. Ensure the output is aligned; see what DoorDash does.

Note: RAGFlow does not provide the features mentioned above.

People + AI Guidebook
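To make point 3 concrete, input screening can be as simple as a pattern check before the query ever reaches the LLM; a production system should use a trained classifier such as Prompt Guard rather than this crude heuristic:

```python
# Illustrative substring patterns, not an exhaustive or robust list.
INJECTION_PATTERNS = (
    "ignore previous instructions",
    "ignore all previous instructions",
    "disregard your system prompt",
)

def is_input_safe(user_input):
    """Crude pre-filter for adversarial prompting. A real deployment would
    run a classifier (e.g. Prompt Guard) instead of substring matching."""
    lowered = user_input.lower()
    return not any(pattern in lowered for pattern in INJECTION_PATTERNS)
```

Substring matching is trivially bypassed (paraphrases, other languages, encodings), which is precisely why a learned classifier is the recommended approach.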

Conclusion

RAGFlow sets the bar for what a RAG system should look like. It doesn't have to provide all of these features as a UI, but a good RAG system should:

  1. Make trial and error easy: iteratively chunk documents and test queries.
  2. Have good, reusable abstractions, such as knowledge bases and assistants.
  3. Offer deployment options, such as an API or an embedded widget.
  4. Be contextual, with good memory management.
  5. Be responsible and safe.

Notes on RAGFlow

If you are considering using RAGFlow, do so at your own risk. I only use RAGFlow as an example of what a good RAG system looks like. Some drawbacks I found during my experiments:

  1. The default prompt is subpar, leading to poor initial results; it needs improvement to make responses more user-friendly and human-like.
  2. The architecture is complex, so infrastructure costs are relatively high.
  3. There are still bugs, and the error messages could be more helpful.

Check RAGFlow out here. For a more lightweight alternative with fewer features but effective performance, consider this option.

