RAG-Based Multi-Source Chatbot Using LLM
Chatbot (Source: e-Plus Agency)


TRY THE CHATBOT: CLICK HERE

INTRODUCTION:

In general, chatbots are used for information retrieval. Traditional chatbots work on predefined rules and keyword matching, essentially a set of if-else rules defined by the developer. When a user enters a query, the chatbot searches for specific keywords or patterns in it to identify the appropriate response. Such chatbots rely on a fixed knowledge base of predefined responses that the developer has inserted manually: when a matching rule is found, the chatbot returns the hardcoded answer associated with it. It cannot paraphrase or generate any new text. Nowadays, LLM-based chatbots are the trend, and they come in two types.

  1. LLM-Based Chatbots without RAG: Large Language Models (LLMs) such as OpenAI's GPT models or Meta's LLaMA are trained with billions of parameters on huge amounts of textual data. Some are open source, meaning they can be used free of charge, while others are not. We can build a chatbot using the API provided by the organization behind the LLM. The problem is that when a user asks a question, the model answers directly from the data it was trained on, without considering any external knowledge base; it behaves just like ChatGPT.

2. LLM-Based Chatbots with RAG: RAG stands for Retrieval-Augmented Generation. It has two main components: retrieval and generation. Unlike chatbots without RAG, here external data sources such as PDFs, text files, and databases are used as a knowledge base alongside the pre-trained LLM. When a user asks a query, the system first looks for similar text chunks in the external knowledge base (retrieval); these chunks are then used as context in the prompt to the LLM. Based on this context and the user query, the LLM can produce a more precise and creative answer (generation). This is not possible with the other types of chatbots.

In this project, a multi-source chatbot using RAG has been implemented, where users can upload various types of documents, such as PDF and text files, as an external knowledge base and ask the chatbot questions that refer to it. The chatbot combines the knowledge base with the pre-trained LLM to generate more reliable, relevant, and organized answers.

WORKING PRINCIPLE :

The working of the RAG-based chatbot can be divided into three main parts. Figure 1 shows the workflow diagram of the RAG-based chatbot.

  1. Information Storage: A variety of documents can be used as external data sources. When a document is uploaded, it is first split into chunks, overlapping or non-overlapping, using RecursiveCharacterTextSplitter. An embedding model then converts these text chunks into embedding vectors that capture their semantic meaning. The embedding vectors are stored in a vector database with an index for fast and precise information retrieval.
  2. Information Retrieval: When a user enters a textual query, the same embedding model converts it into an embedding vector, which is passed to the vector database. A similarity search is then performed: techniques such as cosine similarity, L1 distance, or L2 distance are used to retrieve the text chunks most similar in context to the user query. The retrieved chunks are then handed over for answer generation.
  3. Answer Generation: This is the final step of the chatbot. The retrieved text chunks, together with the user query, are passed to the LLM as a prompt, so the model understands the context for which it needs to generate the answer. Using this context, the generative LLM produces the final output.
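The three steps above can be sketched in a few lines of plain Python. Everything here is an illustrative stand-in: the bag-of-words `embed` function replaces the sentence-transformers/all-MiniLM-L6-v2 model, the in-memory list replaces the Faiss index, and the assembled prompt would go to the falcon-7b-instruct LLM in the real chatbot.

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def embed(text, vocab):
    # Toy bag-of-words embedding; the real chatbot uses the
    # sentence-transformers/all-MiniLM-L6-v2 model instead.
    counts = Counter(tokens(text))
    return [counts[w] for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# 1. Information storage: split the document into chunks and embed each chunk.
chunks = [
    "The chatbot supports PDF and text files as knowledge sources.",
    "FAISS is used to index embeddings for fast similarity search.",
    "Streamlit hosts the user interface of the application.",
]
vocab = sorted({w for c in chunks for w in tokens(c)})
index = [(c, embed(c, vocab)) for c in chunks]

# 2. Information retrieval: embed the query and rank chunks by cosine similarity.
query = "what indexes embeddings for similarity search"
q_vec = embed(query, vocab)
best_chunk = max(index, key=lambda item: cosine(q_vec, item[1]))[0]

# 3. Answer generation: the retrieved chunk becomes the context of the LLM prompt.
prompt = f"Context: {best_chunk}\nQuestion: {query}\nAnswer:"
print(best_chunk)
```

Even with this toy embedding, the query about indexing correctly retrieves the FAISS chunk, because it shares the most terms with it in vector space.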

Figure 1: High-Level Overview of the RAG-Based Chatbot

BUILDING BLOCKS:

The chatbot comprises different components, each with specific functionality and tasks. In this section, we look at how these individual elements work and how they are integrated into the final working chatbot.

  1. VECTOR DATABASE: Textual data cannot be processed directly by large language models. To feed them unstructured text, we need a numerical representation, ideally one that also captures the semantic meaning of the text. This is what word embeddings provide: an embedding vector generated from the text. These vectors are usually high-dimensional and hard to handle in a normal relational database. We could store them in a relational database, but a semantic search for similar context would then require a linear scan, which is time-consuming and computationally expensive. Vector databases solve this problem by using LSH (locality-sensitive hashing) to put semantically similar data into the same bucket and by indexing the vectors for fast retrieval. Figure 2 shows the high-level working principle of a vector database. Different vector databases are available for working with LLMs, such as Chroma DB and Faiss. In this project, we have used Faiss to create a vector database and index the data locally on the machine for semantic search and fast information retrieval.
  2. LangChain: LangChain is a framework that accelerates and simplifies application development with different large language models. It connects the language model to different external data sources. In a hub-and-spoke network, LangChain would be the hub connecting spokes of different types. Figure 3 shows different use cases of LangChain and how it connects different resources. For example, if we have a company database and want an LLM to generate a query from the user's textual input and return the response, we have to connect the LLM to the database and the user prompt to the LLM; this integration can be done with LangChain. Similar frameworks, such as LlamaIndex, can also be used to accomplish this task. In this project, we have used the LangChain framework to build the connections between the different resources. As Figure 3 shows, LangChain is a multipurpose framework that can integrate different sources for the accomplishment of a specific task.
  3. EMBEDDING MODEL: The purpose of an embedding model is to generate a vector embedding for each input chunk. Several pre-trained embedding models are available; we have used sentence-transformers/all-MiniLM-L6-v2 from Hugging Face. The same model is used both for the user queries and for the external sources. Figure 4 shows the high-level workflow of the embedding model, from raw text to its representation in vector space.
  4. RECURSIVE TEXT SPLITTING: Ordinary text splitting divides the text according to a single splitting criterion. Recursive splitting instead uses an ordered set of separators: ["\n\n", "\n", " ", ""]. This has the effect of keeping paragraphs (and then sentences, and then words) together as long as possible, as those are generally the most semantically related pieces of text. Here we have used RecursiveCharacterTextSplitter from LangChain with two parameters: chunk size and overlap size.
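The separator-priority behavior can be imitated in a short, self-contained sketch. This is a simplification: LangChain's actual RecursiveCharacterTextSplitter also supports chunk overlap and a configurable length function, both omitted here for brevity.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Simplified imitation of recursive character splitting: try the
    highest-priority separator first, fall back to the next one only
    for pieces that are still too long, and re-merge small pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, rest = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, buf = [], ""
    for piece in (p for p in text.split(sep) if p):
        if len(piece) > chunk_size:
            if buf:
                chunks.append(buf)
                buf = ""
            chunks.extend(recursive_split(piece, chunk_size, rest))
        elif buf and len(buf) + len(sep) + len(piece) <= chunk_size:
            buf += sep + piece            # merge small pieces back together
        elif not buf:
            buf = piece
        else:
            chunks.append(buf)
            buf = piece
    if buf:
        chunks.append(buf)
    return chunks

doc = "First paragraph about RAG.\n\nSecond paragraph explaining FAISS indexing in more detail."
result = recursive_split(doc, chunk_size=40)
for chunk in result:
    print(repr(chunk))
```

The first paragraph fits in one chunk; the second is too long, so the splitter falls back from the paragraph separator to spaces while still keeping as many words together as the 40-character budget allows.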

Figure 2: Workflow of vector database
Figure 3: LangChain Use Cases
Figure 4: High-Level Overview of Word Embedding

FLOW CHART OF THE CHATBOT:

Figure 5 shows the entire workflow diagram of the chatbot starting from uploading the document to generating the response as well as closing that chat.

Figure 5: Flow Chart of the Chatbot



IMPLEMENTATION:

The chatbot has two main interfaces: one for Document Embedding and one for the chat itself, named RAG Chatbot. The application is hosted on Streamlit, a powerful Python library for building and hosting interactive applications. On the left there is a navigation bar to switch from one page to the other.

Figure 6: Navigation Bar of the Chatbot

The next one is the Document Embedding page. This page is used to upload documents and create vector embeddings for them. Currently PDF and text files can be uploaded, which could be extended to other sources such as URLs or external databases. A multi-select option is enabled so that multiple files can be uploaded at a time to generate vector embeddings. Here, sentence-transformers/all-MiniLM-L6-v2 is used as the embedding model via HuggingFaceInstructEmbeddings, using CUDA to accelerate the embedding process.

The user must select the following parameters:

Chunk Size: The number of characters in each chunk while splitting the document using a recursive splitting class.

Chunk Overlap: The number of characters to be overlapped between the adjacent chunks.

Vector Store to merge the knowledge: Should be <NEW> if the file is new; otherwise, to add to or overwrite an existing knowledge base, select that store from the dropdown menu.

New Vector Store Name: If the file is new or <NEW> is selected then the user needs to give a name to store the file in the vector database.
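The <NEW>-versus-existing choice boils down to a create-or-merge decision. The sketch below uses a plain dict as a stand-in for the persisted Faiss store; the names `vector_stores` and `save_embeddings` are illustrative, not the app's actual code.

```python
# Store name -> {chunk text: embedding vector}; the real app persists
# a Faiss index on disk instead of keeping a dict in memory.
vector_stores = {}

def save_embeddings(new_embeddings, target_store, new_store_name=None):
    """Create a new knowledge base or merge into an existing one,
    mirroring the <NEW> option on the Document Embedding page."""
    if target_store == "<NEW>":
        if not new_store_name:
            raise ValueError("A name is required when creating a new store")
        vector_stores[new_store_name] = dict(new_embeddings)
    else:
        # Merge the freshly embedded chunks into the selected store.
        vector_stores[target_store].update(new_embeddings)

save_embeddings({"chunk-1": [0.1, 0.2]}, "<NEW>", "women_rights_pdfs")
save_embeddings({"chunk-2": [0.3, 0.4]}, "women_rights_pdfs")
print(sorted(vector_stores["women_rights_pdfs"]))
```

A production version would additionally validate that the selected store exists before merging.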

Figure 7: Document Embedding Interface

The last one is the chatbot itself. Here the tiiuae/falcon-7b-instruct model is used from the Hugging Face Hub via a Hugging Face API key and LangChain. The user needs to select the following parameters:

Hugging Face Token: The API access key from Hugging Face. This field is disabled by default but can be enabled.

Vector Store: The file to use as the knowledge base.

Temperature: How creative the chatbot will be.

Maximum character length: The maximum number of characters the output, i.e. the answer, should contain.

After inserting all these parameters users can launch the chatbot just by clicking on Launch Chatbot.
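The temperature parameter rescales the model's token probabilities before sampling. The softmax sketch below illustrates the effect on a made-up set of logits; real LLMs apply the same idea to their output layer.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Lower temperature sharpens the distribution (more deterministic
    answers); higher temperature flattens it (more creative/random)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.1)   # nearly all mass on the top token
high = softmax_with_temperature(logits, 2.0)  # probabilities much closer together
print(round(low[0], 3), round(high[0], 3))
```

This is why temperatures of 0.1-0.3 keep the chatbot close to the retrieved knowledge base, while values above 0.6 make it more likely to wander into creative but less grounded answers.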

The chatbot stores the chat history of each session along with the knowledge sources: while answering a query, it records the reference text chunks in the history, so that one can see which chunks from the knowledge base the model used to answer the question.
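A minimal sketch of such a session history, assuming a simple list-of-turns structure (the app's actual implementation is not shown here, so class and method names are illustrative):

```python
class ChatSession:
    """Per-session chat history that also records which knowledge-base
    chunks supported each answer."""

    def __init__(self):
        self.history = []

    def add_turn(self, question, answer, source_chunks):
        # Each turn keeps the supporting chunks so answers stay traceable.
        self.history.append(
            {"question": question, "answer": answer, "sources": list(source_chunks)}
        )

    def sources_for(self, turn_index):
        return self.history[turn_index]["sources"]

session = ChatSession()
session.add_turn(
    "What is Faiss used for?",
    "It indexes embeddings for fast similarity search.",
    ["Faiss creates a vector database and indexes the data locally."],
)
print(session.sources_for(0))
```

Keeping the sources per turn is what lets a user audit exactly which parts of the knowledge base produced a given answer.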

Figure 8: Chatbot Interface

DEMONSTRATION:

For demonstration purposes, we considered two different cases. In the first case we used 31 PDFs about women's rights, empowerment, refugee (women) rights, women's empowerment in elections, gender equality, and other related agendas in Italy and the European Union. We created several knowledge bases with different chunk and overlap sizes, such as 512-32, 200-10, 300-30, and 1024-250; the chatbot performs best with a chunk size of 1024 and an overlap of 250. The following results are shown for demonstration purposes, all obtained with a temperature value of 0.1:

DEMO-1
DEMO-2

In the second case, we merged two different CVs into one document as the knowledge base and asked the chatbot specific questions about the two persons. Most of the time the chatbot gave correct answers. With a higher temperature, however, it sometimes made up answers from external knowledge instead of relying solely on the provided knowledge base. For specific questions, such as a person's phone number, address, email address, or current education status, the chatbot gives accurate information, whereas for elaborative answers, such as a short introduction or a description of something, it sometimes generates more diverse and creative responses at the cost of coherence and relevance. It works better at lower temperatures; higher temperatures sometimes yield information that is not present in the knowledge base. Temperature values from 0.1 to 0.3 give answers closer to the knowledge base, whereas values above 0.6 give more creative answers.

DEMO-3
DEMO-4
DEMO-5

APPLICATION:

The chatbot can be used for many real-time applications such as:

I. Document analysis and summarization.

II. Information retrieval from a company database using text-based instructions.

III. Question-answer bot for a company to answer specific questions (customer support).

IV. URL analysis and information retrieval.

V. Document comparison by selecting multiple documents at a time, such as CV comparison.

VI. Content creation referencing a knowledge base.

VII. Educational assistant, answering questions from a given book or resource.

CONCLUSION:

RAG-based chatbots are undoubtedly one of the most important innovations and tools that companies from different domains can utilize. In our case, we have used open-source LLMs, which are free to use. However, these LLM-based chatbots need high computational power to work smoothly and effectively. Paid LLMs such as OpenAI's models and Google Gemini work better than the falcon-7b LLM in most cases. One can reuse the same architecture and workflow with different LLMs and embedding models to build a RAG-based chatbot and check which model works best for their specific requirements.

REFERENCES:

1. https://betterprogramming.pub/building-your-own-devsecops-knowledge-base-with-openai-langchain-and-llamaindex-b28cda15abb7

2. https://towardsdatascience.com/how-to-build-a-local-open-source-llm-chatbot-with-rag-f01f73e2a131

3. https://qdrant.tech/articles/what-are-embeddings/

4. https://datasciencedojo.com/blog/understanding-langchain/

5. https://www.pinecone.io/learn/vector-database/

6. https://python.langchain.com/docs/get_started/introduction

7. https://github.com/SriLaxmi1993/Document-Genie-using-RAG-Framwork/tree/main

8. https://github.com/codebasics/langchain

9. https://huggingface.co/tiiuae/falcon-7b

10. https://www.analyticsvidhya.com/blog/2024/04/rag-and-streamlit-chatbot-chat-with-documents-using-llm/

11. https://huggingface.co/hkunlp/instructor-xl



Thank you Semanto Mondal, for sharing insights into RAG-based chatbots! The combination of parametric and non-parametric models, along with the ability to upload various document types, showcases its potential for industry-specific tasks.

Vincent Granville

AI/LLM Disruptive Leader | GenAI Tech Lab

5 months ago

See also performance comparison to build LLM/RAG apps with vector databases, at https://mltblog.com/3weQ2UP

Alberto Moccardi

National PhD in AI@Unina | Data Scientist

5 months ago

Great work!
