RAG-Based Multi-Source Chatbot Using LLM
Semanto Mondal
MS in Data Science@UniNa | Intern @Apple Academy | Machine Learning | Deep Learning | Ex O&M Engineer @Mango | CCNA | JNCIA
INTRODUCTION:
In general, chatbots are used for information retrieval. Traditional chatbots work on predefined rules and keyword matching, essentially a set of if-else rules written by the developer. When a user enters a query, the chatbot searches it for specific keywords or patterns to identify the appropriate response. Such chatbots rely on a fixed knowledge base of predefined responses that are manually inserted by the developer: when a matching rule is found, the chatbot returns the hardcoded answer associated with that question. It cannot paraphrase or generate new text. Nowadays, LLM-based chatbots are the trend, and they come in two types.
1. LLM-Based Chatbots without RAG: These rely only on the knowledge encoded in the trained LLM itself. They can paraphrase and generate fluent answers, but they cannot consult external documents, so their answers are limited to what the model learned during training.
2. LLM-Based Chatbots with RAG: RAG stands for Retrieval-Augmented Generation. It has two main components, retrieval and generation. Unlike LLM-based chatbots without RAG, here external data sources such as PDFs, text files, and databases are used as a knowledge base alongside the trained LLM. When a user asks a query, the system first looks for similar text chunks in the external knowledge base, which is the retrieval step; these text chunks are then supplied as context in the prompt to the LLM. Based on the retrieved context and the user query, the LLM can produce a more precise and creative answer, which is the generation step. This is not possible with the other type of chatbot.
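To make the retrieval and generation steps concrete, here is a minimal sketch of the loop, assuming a LangChain-style vector store and LLM; the function and variable names are illustrative rather than taken from the project code:

```python
# Minimal retrieve-then-generate sketch. All names are illustrative:
# `vector_store` is assumed to be a LangChain-style vector store and
# `llm` a LangChain-style completion model.

def answer_with_rag(query, vector_store, llm, k=4):
    # Retrieval: find the k text chunks most similar to the query.
    docs = vector_store.similarity_search(query, k=k)
    context = "\n\n".join(doc.page_content for doc in docs)

    # Generation: the retrieved chunks become part of the prompt, so the
    # LLM answers grounded in the external knowledge base.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
    return llm.invoke(prompt)
```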
In this project, a multi-source chatbot using RAG has been implemented. Users can upload various types of documents, such as PDF and text files, as an external knowledge base and ask the chatbot questions that refer to it. The chatbot uses both the knowledge base and the pre-trained LLM to generate more reliable, relevant, and organized answers.
WORKING PRINCIPLE:
The working of the RAG-based chatbot can be divided into two main parts. Figure 1 shows the workflow diagram of the RAG-based chatbot.
BUILDING BLOCKS:
The chatbot comprises several components, each with its own function and task. In this section, we will try to understand how these individual elements work and how they are integrated into the final working chatbot.
FLOW CHART OF THE CHATBOT:
Figure 5 shows the entire workflow of the chatbot, from uploading a document to generating the response and closing the chat.
IMPLEMENTATION:
The chatbot has two main interfaces: one for Document Embedding and another, named RAG Chatbot, for the chat itself. The app is hosted on Streamlit, a powerful Python library for building and hosting interactive applications. On the left, a navigation bar switches between the two pages.
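A minimal Streamlit sketch of this two-page layout with a sidebar navigation bar (an assumed structure, not necessarily the project's exact code):

```python
import streamlit as st

# Sidebar navigation between the two pages of the app.
page = st.sidebar.radio("Navigation", ["Document Embedding", "RAG Chatbot"])

if page == "Document Embedding":
    st.title("Document Embedding")
    # ... upload files and build vector embeddings here ...
else:
    st.title("RAG Chatbot")
    # ... select a vector store, set parameters, and chat here ...
```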
The next one is the Document Embedding page. This page is used to upload documents and create vector embeddings for them. Currently, PDF and text files are supported, which can be extended to other sources such as URLs and external databases. A multi-select option is enabled so that multiple files can be uploaded at a time to generate vector embeddings. Here, sentence-transformers/all-MiniLM-L6-v2 is used as the embedding model through HuggingFaceInstructEmbeddings, with CUDA to accelerate the embedding process.
The user must select the following parameters (a code sketch follows the list):
Chunk Size: The number of characters in each chunk while splitting the document using a recursive splitting class.
Chunk Overlap: The number of characters to be overlapped between the adjacent chunks.
Vector Store to merge the knowledge: Select <NEW> if the file is new; to overwrite an existing knowledge base, select that store from the dropdown menu.
New Vector Store Name: If <NEW> is selected, the user must provide a name under which the store is saved in the vector database.
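Under these parameters, the embedding step might look like the following sketch. The article names HuggingFaceInstructEmbeddings, while the sketch uses the plain HuggingFaceEmbeddings wrapper, the usual pairing for all-MiniLM-L6-v2; the file path and store name are illustrative:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Load an uploaded document (path is a placeholder).
raw_documents = PyPDFLoader("example.pdf").load()

# Split with the recursive splitter using the two chunking parameters.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1024,    # "Chunk Size": characters per chunk
    chunk_overlap=250,  # "Chunk Overlap": characters shared by adjacent chunks
)
chunks = splitter.split_documents(raw_documents)

# Embed the chunks with all-MiniLM-L6-v2, accelerated on the GPU.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cuda"},
)

# Build and save a new store ("New Vector Store Name"); an existing FAISS
# store could instead absorb these chunks via merge_from().
store = FAISS.from_documents(chunks, embeddings)
store.save_local("vector_stores/my_store")
```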
The last page is the chatbot itself. Here the tiiuae/falcon-7b-instruct model is used from the Hugging Face Hub via the Hugging Face API key and LangChain. A user needs to select the following parameters (a wiring sketch follows the list):
Hugging Face Token: The API access token from Hugging Face. This field is disabled in the hosted demo but can be enabled.
Vector Store: The file to use as the knowledge base.
Temperature: How creative the chatbot will be.
Maximum character length: The maximum number of characters the generated answer may contain.
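A sketch of how these parameters might be wired together with LangChain; the token, store path, and generation kwargs are placeholders, and the exact keyword names can vary with the library version:

```python
from langchain_community.llms import HuggingFaceHub
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA

# Reload the previously saved knowledge base (path is illustrative; the
# deserialization flag is required by newer LangChain versions).
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
store = FAISS.load_local("vector_stores/my_store", embeddings,
                         allow_dangerous_deserialization=True)

llm = HuggingFaceHub(
    repo_id="tiiuae/falcon-7b-instruct",
    huggingfacehub_api_token="hf_...",  # "Hugging Face Token" parameter
    model_kwargs={
        "temperature": 0.1,     # "Temperature" parameter
        "max_new_tokens": 500,  # roughly the "Maximum character length"
    },
)

# Retrieval + generation chain; also return the supporting chunks.
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=store.as_retriever(),
    return_source_documents=True,
)

result = qa_chain({"query": "What is this document about?"})
print(result["result"])
```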
After setting all these parameters, users can launch the chatbot by clicking Launch Chatbot.
The chatbot stores the chat history for each session along with the knowledge sources: while answering a query, it records the reference text chunks in the history, so one can see which chunks from the knowledge base the model used to answer the question.
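A sketch of how such per-session history could be kept in Streamlit's session state, reusing the qa_chain from the previous sketch; all field names are illustrative:

```python
import streamlit as st

# Per-session chat history that also records the supporting chunks.
if "history" not in st.session_state:
    st.session_state.history = []

user_query = st.text_input("Ask a question")
if user_query:
    result = qa_chain({"query": user_query})  # chain from the sketch above
    st.session_state.history.append({
        "question": user_query,
        "answer": result["result"],
        # The retrieved chunks that grounded this answer:
        "sources": [doc.page_content
                    for doc in result["source_documents"]],
    })
```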
DEMONSTRATION:
For demonstration purposes, we considered two cases. In the first case, we used 31 PDFs about women's rights, empowerment, refugee (women's) rights, women's empowerment in elections, gender equality, and other related agendas in Italy and the European Union. We created several knowledge bases with different chunk and overlap sizes, such as 512-32, 200-10, 300-30, and 1024-250; the chatbot performs best with a chunk size of 1024 and an overlap of 250. The following results were produced with a temperature value of 0.1:
In the second case, we merged two CVs into a single document as the knowledge base and asked the chatbot specific questions about the two people. Most of the time the chatbot answered correctly. In some cases, with a higher temperature value, the chatbot made up answers from external resources instead of relying solely on the provided knowledge base. For specific questions, such as a person's phone number, address, email address, or current education status, the chatbot gives accurate information. For elaborative answers, such as a short introduction or a description of something, it sometimes generates more diverse and creative responses, but at the cost of coherence and relevance. It works better at lower temperatures; higher temperatures sometimes yield irrelevant information that is not present in the knowledge base. Temperature values from 0.1 to 0.3 give answers closer to the knowledge base, whereas values above 0.6 give more creative answers.
APPLICATION:
The chatbot can be used for many real-world applications, such as:
I. Document analysis and summarization.
II. Information retrieval from a company database using text-based instructions.
III. A question-answer bot to handle a company's specific questions (customer support).
IV. URL analysis and information retrieval.
V. Document comparison by selecting multiple documents at a time, such as CV comparison.
VI. Content creation referencing a knowledge base.
VII. An educational assistant that answers questions from a given book or resource.
CONCLUSION:
RAG-based chatbots are undoubtedly one of the most important innovations and tools that companies across many domains can adopt. Here we used open-source LLMs, which are free to use. However, these LLM-based chatbots need high computational power to run smoothly and effectively. Paid LLMs, such as OpenAI's models and Google Gemini, work better than the falcon-7b LLM in most cases. One can reuse the same architecture and workflow with different LLMs and embedding models to build a RAG-based chatbot and check which model works best for their specific requirements.