What to Expect from a Good RAG System
Chatbots are among the most common applications of LLMs. One of their biggest problems is hallucination. That problem gave rise to a method called RAG (Retrieval-Augmented Generation). Simply put, instead of responding to an inquiry directly, the chatbot first queries relevant data or documents from a database and uses them as a reference for its response. RAG is one way to do "grounding", a term commonly used for making sure the AI is not spitting out nonsense but instead returns responses based on relevant data in a database we define.
So, RAG is becoming increasingly common. But do you know what to expect from a good RAG system? Recently I stumbled upon an open-source RAG project called RAGFlow. It redefined my standard for what a good RAG system should look like. Let's see what features RAGFlow provides and why they matter.
Vector Processing and Optimization
When preparing documents for a vector database, "chunking" is required to break the documents into smaller parts so that the retrieved context fits within the token limit of the LLM's prompt. For example, in the case of FAQs, each question and answer should ideally be a separate chunk. More reading on chunking here. RAGFlow lets you choose the document type used for chunking, which is a great feature. You can pick the type that best matches your document. After that, you can inspect the chunking results directly to see if they are suitable.
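To make the FAQ example concrete, here is a minimal sketch of "one question and answer per chunk". It is not how RAGFlow chunks documents; the `chunk_faq` name and the assumption that every entry uses `Q:` and `A:` line prefixes are mine, purely for illustration.

```python
import re


def chunk_faq(text: str) -> list[dict[str, str]]:
    """Split FAQ text into one chunk per question/answer pair.

    Assumption (hypothetical format): each entry starts with a line
    beginning with "Q:" and its answer starts with a line beginning "A:".
    """
    chunks = []
    # Split right before every line that starts with "Q:".
    for block in re.split(r"(?m)^(?=Q:)", text):
        block = block.strip()
        if not block.startswith("Q:"):
            continue  # skip preamble or empty pieces
        question, sep, answer = block.partition("\nA:")
        if not sep:
            continue  # skip entries that have no answer line
        chunks.append({
            "question": question[len("Q:"):].strip(),
            "answer": answer.strip(),
        })
    return chunks


faq_text = (
    "Q: What is RAG?\n"
    "A: Retrieval Augmented Generation.\n"
    "\n"
    "Q: Why chunk documents?\n"
    "A: So the context fits in the prompt.\n"
)
faq_chunks = chunk_faq(faq_text)
```

Keeping the question and its answer together in one chunk means a search that matches the question also brings back the answer, which is the point of choosing a chunking type that fits the document.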
After reviewing the chunking, retrieval testing can be performed. This test checks whether a search query returns relevant results from the chunks that were created. RAGFlow provides a tunable similarity threshold that determines how similar a result must be to be returned. This makes the process more transparent rather than operating as a black box.
Additionally, RAGFlow includes a RAPTOR feature, based on this paper, which is intended to improve the accuracy of the results. RAPTOR is particularly valuable because one of the challenges of chunking is that the pieces may not be ideal. Summarizing the chunks can improve accuracy significantly.
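The idea of a similarity threshold can be sketched in a few lines. This is not RAGFlow's implementation: the `retrieve` function, the toy two-dimensional vectors, and the default values are assumptions for illustration, and a real system would get its vectors from an embedding model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, threshold=0.5, top_k=3):
    """Return up to top_k (score, text) pairs whose score meets the threshold.

    `chunks` is a list of (text, vector) pairs; the vectors here are
    hand-made stand-ins for real embeddings.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]


toy_chunks = [
    ("refund policy", [1.0, 0.0]),
    ("shipping times", [0.0, 1.0]),
    ("returns", [0.9, 0.1]),
]
hits = retrieve([1.0, 0.0], toy_chunks, threshold=0.5)
```

Raising the threshold trades recall for precision: fewer chunks come back, but the ones that do are closer to the query. Being able to see that effect on real queries is what makes the tuning transparent.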
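As a rough illustration of summarizing chunks into higher-level nodes, here is a heavily simplified sketch. It is not the RAPTOR algorithm and not RAGFlow's code: as I understand the paper, chunks are clustered by embedding and summarized by an LLM, whereas this sketch simply groups neighbouring chunks and uses a stand-in `summarize` function. All names here are hypothetical.

```python
from typing import Callable


def build_summary_tree(
    chunks: list[str],
    summarize: Callable[[list[str]], str],
    group_size: int = 2,
) -> list[list[str]]:
    """Build levels of summaries: level 0 is the raw chunks, the last is the root.

    Simplification: neighbouring chunks are grouped by position instead of
    being clustered by embedding similarity.
    """
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    levels = [list(chunks)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        groups = [current[i:i + group_size] for i in range(0, len(current), group_size)]
        levels.append([summarize(group) for group in groups])
    return levels


def join_summary(group: list[str]) -> str:
    """Stand-in summarizer; a real system would call an LLM here."""
    return " | ".join(group)


tree = build_summary_tree(["a", "b", "c"], join_summary)
# Every node from every level would be indexed, so a query can match
# either a detailed chunk or a broader summary.
all_nodes = [node for level in tree for node in level]
```

The useful property is that imperfect chunk boundaries matter less: even if a fact is split across two chunks, a summary node covering both can still be retrieved.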
Abstraction: Assistant, Knowledge Base, Agent
After adding documents to provide context for the AI, users can create an "Assistant", or chatbot. It offers the usual chatbot options: customizing the system prompt, tuning model parameters, and even using multiple databases, referred to as "knowledge bases." Thus you can reuse, combine, and mix several knowledge bases across different Assistants.
The most remarkable feature is the "Agent." With this feature, prompt chaining can be implemented directly. For instance, if a search is required, the system will perform it. It follows the initial categorization and executes the appropriate plugins according to the flow. However, due to the complexity of the Agent's process, each interaction can be costly as it may invoke the LLM and APIs multiple times. It is important to be mindful of this as the architect and carefully weigh the trade-offs based on the specific use case.
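The cost concern is easier to see in code. Below is a minimal sketch of prompt chaining with a categorization step followed by a routed tool call. It does not use RAGFlow's Agent API; `run_agent`, the prompts, and the fake LLM and tool are all hypothetical stand-ins so the example runs without any external service.

```python
from typing import Callable


def run_agent(
    message: str,
    llm: Callable[[str], str],
    tools: dict[str, Callable[[str], str]],
) -> dict:
    """Classify the message, run the matching tool if any, then answer."""
    llm_calls = 0
    tool_calls = 0

    # Step 1: the first LLM call decides which branch of the flow to take.
    category = llm(f"Classify this request in one word: {message}").strip().lower()
    llm_calls += 1

    # Step 2: run the tool (plugin) registered for that category, if any.
    context = ""
    tool = tools.get(category)
    if tool is not None:
        context = tool(message)
        tool_calls += 1

    # Step 3: a second LLM call writes the final answer from the context.
    answer = llm(f"Context: {context}\nQuestion: {message}")
    llm_calls += 1

    return {
        "answer": answer,
        "category": category,
        "llm_calls": llm_calls,
        "tool_calls": tool_calls,
    }


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch is runnable offline."""
    return "search" if prompt.startswith("Classify") else "final answer"


result = run_agent(
    "What is the latest RAG paper?",
    fake_llm,
    {"search": lambda query: f"search results for: {query}"},
)
```

Even this small flow makes two LLM calls and one tool call for a single user message; every extra step in the chain multiplies cost and latency, which is the trade-off to weigh per use case.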
Deployment Modes
Once the Assistant is set up, users have various options for deployment. For embedding, select "Embedded" to generate an iframe. To call it as a backend API, use the API key. For direct web access, click "Preview." An impressive feature is its ability to monitor the number of API calls made with the key, which is highly useful for tracking usage.
Contextual RAG
A good chatbot needs to understand the context. I have not found this in the RAGFlow documentation, but memory is a basic requirement for a chatbot. There are two types of memory:
Read this for more details on how memory works and the implementation details.
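For readers who want a feel for what memory means in practice, here is a minimal sketch of its simplest form: keeping the last few turns and replaying them in the next prompt. It does not try to reproduce the two memory types referred to above, and it is not a RAGFlow feature; the `ConversationMemory` class and the prompt layout are assumptions for illustration.

```python
from collections import deque


class ConversationMemory:
    """Keep only the most recent turns so the prompt stays within limits."""

    def __init__(self, max_turns: int = 3) -> None:
        # A deque with maxlen drops the oldest turn automatically.
        self._turns: deque[tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, user: str, assistant: str) -> None:
        """Record one completed exchange."""
        self._turns.append((user, assistant))

    def build_prompt(self, new_message: str) -> str:
        """Prepend the remembered turns to the new user message."""
        lines = []
        for user, assistant in self._turns:
            lines.append(f"User: {user}")
            lines.append(f"Assistant: {assistant}")
        lines.append(f"User: {new_message}")
        return "\n".join(lines)


memory = ConversationMemory(max_turns=2)
memory.add("What is RAG?", "Retrieval Augmented Generation.")
memory.add("Who uses it?", "Chatbot builders.")
memory.add("Is it costly?", "It depends on the flow.")
prompt = memory.build_prompt("Can you summarize that?")
```

With a window of two turns, the oldest exchange is no longer in the prompt, so a follow-up like "summarize that" only has the recent turns to refer to. Choosing the window size is a balance between context and token cost.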
Responsible AI on RAG
I believe a good RAG system also needs to be safe, to minimize reputational and legal risk. It needs to
Note: RAGFlow has none of the features mentioned above.
Conclusion
RAGFlow sets the bar for what a RAG system should look like. A system doesn't have to offer every one of these features through a UI, but a good RAG system should be:
Notes on RAGFlow
If you are considering using RAGFlow, please do so at your own risk. I only use RAGFlow as an example of what a good RAG system looks like. Some drawbacks of RAGFlow I found during the experiment:
Check RAGFlow out here. For a more lightweight alternative with fewer features but effective performance, consider this option.