What to Expect from a Good RAG System

Chatbots are among the most common implementations of LLMs, and one of their biggest problems is hallucination. This issue gave rise to a method called RAG (Retrieval-Augmented Generation). Simply put, instead of responding to inquiries directly, the chatbot queries relevant data or documents from a database and uses them as references for its response. RAG is one way to do "grounding", a term commonly used for ensuring the AI does not spit out nonsense but instead returns responses based on relevant data in a database we define.

So, RAG is becoming increasingly common. But do you know what to expect from a good RAG system? Recently I stumbled upon an open-source RAG system called RAGFlow, and it redefined my standard for what a good RAG system should look like. Let's see what features RAGFlow provides and why they matter.

Vector Processing and Optimization

When preparing documents for a vector database, "chunking" is required to break them into smaller parts so that the context fits within the token limit of the LLM's prompt. For example, in the case of FAQs, each question and answer should ideally be a separate chunk. More reading on chunking here. RAGFlow offers the option to choose the type of document for chunking, which is a great feature: you can align each document with the type you think is most appropriate, then directly inspect the chunking results to see if they are suitable.

Chunking Method
Chunking Results
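As an illustration (not RAGFlow's actual implementation), a minimal sliding-window chunker might look like this; `max_tokens` here counts words as a rough proxy for real model tokens:

```python
def chunk_text(text, max_tokens=200, overlap=20):
    """Split text into overlapping word-window chunks.

    A rough sketch: real chunkers count model tokens and respect
    document structure (e.g. one FAQ entry per chunk)."""
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the last window already covers the tail
    return chunks
```

The overlap keeps a sentence that straddles a boundary partially present in both chunks, which helps retrieval at the cost of some duplication.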

After reviewing the chunking, retrieval testing can be performed. This test checks whether a search query returns relevant results against the chunks that have been created. RAGFlow provides a tunable similarity threshold to determine how similar the results need to be. This makes the process more transparent rather than operating as a black box.

Example Test Query and Result
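Under the hood, the retrieval test boils down to scoring the similarity between the query embedding and each chunk embedding and keeping only the results above the threshold. A minimal sketch using cosine similarity (the vectors here are toy values, not real embedding-model output):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, chunk_vecs, threshold=0.7):
    # Keep only chunks whose similarity clears the tunable threshold,
    # ranked best-first -- the knob RAGFlow exposes in its retrieval test.
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    hits = [(i, s) for i, s in scored if s >= threshold]
    return sorted(hits, key=lambda p: p[1], reverse=True)
```

Raising the threshold trades recall for precision, which is exactly the tuning decision the test screen makes visible.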

Additionally, RAGFlow includes the RAPTOR feature, which enhances the accuracy of the results, based on this paper. RAPTOR is particularly valuable because one of the challenges of chunking is that the pieces may not be ideal; summarizing the chunks improves accuracy significantly.

RAPTOR Prompt
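Conceptually, RAPTOR builds a tree: chunks are grouped (the paper clusters them by embedding), each group is summarized by the LLM, and the process repeats on the summaries; retrieval can then match at any level. A toy sketch with a pluggable `summarize` function and fixed-size groups instead of real clustering:

```python
def build_raptor_tree(chunks, summarize, group_size=2):
    """Recursively summarize groups of chunks, RAPTOR-style.

    Returns every node (leaf chunks plus all summary levels) so that
    retrieval can match at whichever level of abstraction fits the query."""
    levels = [list(chunks)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        groups = [current[i:i + group_size]
                  for i in range(0, len(current), group_size)]
        levels.append([summarize(g) for g in groups])  # one LLM call per group
    return [node for level in levels for node in level]
```

The summary nodes let a broad question match a high-level node even when no single leaf chunk answers it on its own.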


Abstraction: Assistant, Knowledge Base, Agent

After adding documents to provide context for the AI, users can create an "Assistant", i.e. a chatbot. It offers the expected features: customizing the system prompt, tuning model parameters, and even using multiple databases, referred to as "knowledge bases." You can therefore reuse, combine, and mix several knowledge bases across different Assistants.

Assistant Configuration
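The abstraction can be pictured as two small types: knowledge bases own chunks, and an assistant references any number of them. A hypothetical sketch (the names are mine, not RAGFlow's API):

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeBase:
    name: str
    chunks: list

@dataclass
class Assistant:
    system_prompt: str
    knowledge_bases: list = field(default_factory=list)

    def searchable_chunks(self):
        # An assistant retrieves across every knowledge base attached to it,
        # which is what makes knowledge bases reusable and mixable.
        return [c for kb in self.knowledge_bases for c in kb.chunks]
```

Because the knowledge base is a separate object, the same one can be attached to a customer-facing assistant and an internal one without re-ingesting documents.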

The most remarkable feature is the "Agent." With it, prompt chaining can be implemented directly: for instance, if a search is required, the system performs it, following the initial categorization and executing the appropriate plugins according to the flow. However, because the Agent's process is complex, each interaction can be costly, as it may invoke the LLM and external APIs multiple times. As the architect, be mindful of this and carefully weigh the trade-offs for your specific use case.

Agent Workflow
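The core of such an agent is a classify-then-route loop; each hop can be another LLM or API call, which is where the cost multiplies. A minimal sketch with stub functions standing in for those calls:

```python
def run_agent(query, classify, handlers, fallback):
    """Prompt chaining in miniature: categorize the query first,
    then execute the plugin/handler that matches the category.
    In a real agent both steps may be separate LLM or API calls."""
    category = classify(query)           # call 1: categorization
    handler = handlers.get(category, fallback)
    return handler(query)                # call 2: the routed step
```

Even this two-step flow doubles the per-interaction cost versus a plain chat completion; real agent graphs with several tools multiply it further.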

Deployment Modes

Once the Assistant is set up, there are various options for deployment. For embedding, select "Embedded" to generate an iframe. To call it as a backend API, use an API key. For direct web access, click "Preview." An impressive feature is the ability to monitor the number of API calls made with each key, which is highly useful for tracking usage.


Deployment Mode, API (in background) and iframe (floating dialog)
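The per-key call monitoring is, at its core, a counter keyed by API key. A minimal sketch of the idea (RAGFlow implements this server-side; this is not its code):

```python
from collections import Counter

class UsageTracker:
    """Count calls per API key so usage can be monitored and reported."""

    def __init__(self):
        self._calls = Counter()

    def record(self, api_key):
        # Called once per incoming API request, e.g. from auth middleware.
        self._calls[api_key] += 1

    def usage(self, api_key):
        return self._calls[api_key]
```

Keeping the counter per key (rather than global) is what lets you attribute cost to each integration that holds its own key.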

Contextual RAG

A good chatbot needs to understand context. I have not found this in the RAGFlow documentation, but memory is a basic requirement for a chatbot. There are two types of memory:

  1. Short-term memory: the recent turns of the current chat, summarized so the conversation history still fits within the prompt's token limit.
  2. Long-term memory: memory that persists across multiple conversations with the same user.

Read this for more details on how memory works and how it is implemented.

Short and Long Term Memory from Langchain Documentation
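Short-term memory is commonly implemented by keeping the most recent turns verbatim and compressing everything older into a running summary. A sketch with a pluggable `summarize` function (in practice that would be an LLM call):

```python
def fit_history(messages, summarize, max_recent=6):
    """Short-term memory: keep the last `max_recent` messages as-is and
    fold older ones into a summary so the history fits the token budget."""
    if len(messages) <= max_recent:
        return list(messages)
    older, recent = messages[:-max_recent], messages[-max_recent:]
    summary = {"role": "system",
               "content": "Conversation so far: " + summarize(older)}
    return [summary] + recent
```

Long-term memory adds a persistence layer on top: the same kind of summary, stored per user and retrieved at the start of each new conversation.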

Responsible AI on RAG

I believe a good RAG system also needs to be safe, to minimize reputational and legal risk. It needs to:

  1. Disclose that it is an AI (transparency); see PAIR for reference.
  2. Ensure it only does what it is supposed to do and avoids everything else.
  3. Check whether the input is safe from adversarial prompting, e.g. using Prompt Guard.
  4. Ensure the output is aligned; see what DoorDash does.

Note: RAGFlow does not provide the features mentioned above.

People + AI Guidebook
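To make point 3 concrete, input screening can be as simple as a pattern check before the query ever reaches the LLM; a production system should use a trained classifier such as Prompt Guard rather than this crude heuristic:

```python
# Illustrative substring patterns, not an exhaustive or robust list.
INJECTION_PATTERNS = (
    "ignore previous instructions",
    "ignore all previous instructions",
    "disregard your system prompt",
)

def is_input_safe(user_input):
    """Crude pre-filter for adversarial prompting. A real deployment would
    run a classifier (e.g. Prompt Guard) instead of substring matching."""
    lowered = user_input.lower()
    return not any(pattern in lowered for pattern in INJECTION_PATTERNS)
```

Substring matching is trivially bypassed (paraphrases, other languages, encodings), which is precisely why a learned classifier is the recommended approach.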

Conclusion

RAGFlow sets the bar for what a RAG system should look like. It doesn't have to provide all of these features as a UI, but a good RAG system should:

  1. Make trial and error easy: iteratively chunk documents and test queries.
  2. Have good, reusable abstractions, such as knowledge bases and assistants.
  3. Offer deployment options, such as an API or an embedded widget.
  4. Be contextual, with good memory management.
  5. Be responsible and safe.

Notes on RAGFlow

If you are considering using RAGFlow, do so at your own risk. I only use RAGFlow as an example of what a good RAG system looks like. Some drawbacks I found during my experiments:

  1. The default prompt is subpar, leading to poor initial results; it needs improvement to make responses more user-friendly and human-like.
  2. The architecture is complex, so infrastructure costs are relatively high.
  3. There are still bugs, and the error messages could be more helpful.

Check RAGFlow out here. For a more lightweight alternative with fewer features but effective performance, consider this option.

