How to build a RAG chatbot using Ollama - Serve LLMs locally
Full tutorial video - https://www.youtube.com/watch?v=kfbTZFAikcE
Welcome to this tutorial, where I will guide you through the process of building a document-based question-answering application using Streamlit for the user interface and components from the LangChain community library. Our goal is to create an app that lets users input URLs, pose questions, and receive answers based on the content of the specified documents. The app leverages Ollama, a tool for running large language models (LLMs) locally, with the open-source Mistral 7B model generating the answers. For embeddings we'll use the nomic-embed-text model, a high-performing open embedding model with a large token context window that is reported to outperform OpenAI's embedding models.
Why Ollama?
Ollama is a streamlined tool for running open-source LLMs locally, including Mistral and Llama 2. Ollama bundles model weights, configuration, and data into a single package defined by a Modelfile.
Ollama supports a variety of LLMs, including Llama 2, uncensored Llama, CodeLlama, Falcon, Mistral, Vicuna, WizardCoder, and Wizard uncensored.
By using Ollama, you can run LLMs entirely on your own machine, which means your data stays safe and secure.
App Overview
On a fundamental level, the workflow of the app is remarkably straightforward:
1. A user submits a list of URLs (one per line) and enters a question.
2. The app processes the input, fetching the content from the provided URLs.
3. The text content is split into manageable chunks.
4. These chunks are converted into embeddings using the nomic-embed-text model and stored in a vector database (Chroma).
5. The user's question is processed through a Retrieval-Augmented Generation (RAG) pipeline, which retrieves relevant document sections and generates an answer using the Mistral 7B model.
Download Ollama & run the open-source LLM
First, set up and run a local Ollama instance: download the installer from the official site (https://ollama.com), install it, and make sure the Ollama service is running.
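You will also need to pull the two models this tutorial relies on: Mistral for answer generation and nomic-embed-text for embeddings. Assuming the standard Ollama CLI, the commands look like this:
ollama pull mistral
ollama pull nomic-embed-text
You can then verify the chat model responds by running ollama run mistral in a terminal.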
Setting Up the Environment
Before diving into the code, ensure you have the necessary libraries installed. Besides Streamlit and langchain_community, the code below also uses langchain (for the text splitter), chromadb (the backend for the Chroma vector store), tiktoken (for token-based chunking), and beautifulsoup4 (used by WebBaseLoader). If anything is missing, you can install it all with pip:
pip install streamlit langchain langchain_community chromadb tiktoken beautifulsoup4
We'll import the required modules from Streamlit and LangChain community libraries. Streamlit is used for creating the web app interface, while LangChain provides tools for text splitting, embeddings, vector storage, and retrieval-based question answering.
import streamlit as st
from langchain_community.document_loaders import WebBaseLoader   # fetches page content from URLs
from langchain_community.vectorstores import Chroma              # vector store for the embeddings
from langchain_community.embeddings import OllamaEmbeddings      # embeddings served by Ollama
from langchain_community.llms import Ollama                      # local LLM served by Ollama
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain.text_splitter import CharacterTextSplitter
The Main Function: process_input
The process_input function is the core of our app. It takes a newline-separated string of URLs and a question as inputs and performs the following steps:
# URL processing
def process_input(urls, question):
    # Load the local Mistral 7B model served by Ollama
    model_local = Ollama(model="mistral")

    # Convert the newline-separated string of URLs into a list and load each page
    urls_list = urls.split("\n")
    docs = [WebBaseLoader(url).load() for url in urls_list]
    docs_list = [item for sublist in docs for item in sublist]

    # Split the text into chunks
    text_splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=7500, chunk_overlap=100)
    doc_splits = text_splitter.split_documents(docs_list)

    # Convert text chunks into embeddings and store them in the vector database
    vectorstore = Chroma.from_documents(
        documents=doc_splits,
        collection_name="rag-chroma",
        embedding=OllamaEmbeddings(model='nomic-embed-text'),
    )
    retriever = vectorstore.as_retriever()

    # Perform the RAG
    after_rag_template = """Answer the question based only on the following context:
{context}
Question: {question}
"""
    after_rag_prompt = ChatPromptTemplate.from_template(after_rag_template)
    after_rag_chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | after_rag_prompt
        | model_local
        | StrOutputParser()
    )
    return after_rag_chain.invoke(question)
1. Load Documents: The function starts by converting the input string of URLs into a list. It then uses the WebBaseLoader from LangChain to fetch the content from each URL and load it into a list of documents.
2. Split Documents into Chunks: The CharacterTextSplitter is used to divide the document text into smaller, manageable chunks for efficient processing. This is crucial for handling large texts.
3. Select Embeddings: The OllamaEmbeddings model from the LangChain community is initialized to generate embeddings for the text chunks.
4. Create a Vector Store: The Chroma vector store is used to store the document embeddings, facilitating efficient retrieval.
5. Create Retriever Interface: The vector store is transformed into a retriever, which can fetch relevant document sections based on queries.
6. Perform the RAG: This step sets up the Retrieval-Augmented Generation (RAG) pipeline. First, a template is defined for formatting the context and question. Then, a ChatPromptTemplate is created from this template, and a chain is assembled using the retriever, prompt, Ollama language model, and a string output parser. This chain is responsible for generating the final answer based on the retrieved context and the user's question.
7. Run Query: Finally, the RAG chain is invoked with the user's question, and the generated answer is returned.
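Before wiring this into Streamlit, it can help to sanity-check the pipeline on its own. The snippet below is a minimal sketch for calling process_input directly; the URL and question are placeholders, not part of the original tutorial:
# Minimal standalone check of process_input (placeholder URL and question)
if __name__ == "__main__":
    test_urls = "https://ollama.com"              # any reachable page will do
    test_question = "What is Ollama used for?"
    print(process_input(test_urls, test_question))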
Streamlit UI Components
With the core functionality in place, let's move on to the Streamlit user interface components:
st.title("Document Query with Ollama")
st.write("Enter URLs (one per line) and a question to query the documents.")

# Input fields
urls = st.text_area("Enter URLs separated by new lines", height=150)
question = st.text_input("Question")

# Button to process input
if st.button('Query Documents'):
    with st.spinner('Processing...'):
        answer = process_input(urls, question)
        st.text_area("Answer", value=answer, height=300, disabled=True)
1. st.title("Document Query with Ollama"): This line sets the title of the Streamlit app.
2. st.write("Enter URLs (one per line) and a question to query the documents."): This provides instructions to the user on how to use the app.
3. urls = st.text_area("Enter URLs separated by new lines", height=150): A text area is created for the user to input URLs, one per line.
4. question = st.text_input("Question"): A text input field is created for the user to enter their question.
5. if st.button('Query Documents'):: This creates a button labeled "Query Documents". When the user clicks this button, the following code block is executed:
- with st.spinner('Processing...'):: A spinner is displayed to indicate that the app is processing the user's input.
- answer = process_input(urls, question): The process_input function is called with the user's input (URLs and question), and the generated answer is stored in the answer variable.
- st.text_area("Answer", value=answer, height=300, disabled=True): A text area is displayed with the generated answer. The height parameter sets the height of the text area, and disabled=True ensures that the user cannot edit the text area.
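That's the whole app. To try it out, save the complete script to a file (app.py is used here purely as an example name) and start it with Streamlit:
streamlit run app.py
Streamlit will open the app in your browser, where you can paste URLs, type a question, and click "Query Documents".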
Conclusion:
In this tutorial, I have walked through all the steps to build a RAG chatbot using Ollama, LangChain, Streamlit, and Mistral 7B (an open-source LLM).
Comments
Technology Evangelist | Cloud Architect | DevOps | Automation | Tech Lead (3 months ago): Thank you for the insightful article, Sri Laxmi! I successfully transformed your boilerplate example into a RAG solution to inquire about interesting facts around the PGA tournament. It works seamlessly and has provided some fascinating insights!
I've been working with your example today. I had to modify a few things, but I got it working on my end. Unfortunately, when I query data in a spreadsheet, the results are not completely accurate. Do you have any tips for improving accuracy? Thanks so much for this fantastic example!
Alternative Medicine Professional (5 months ago): I got the error below while running Mistral. Any suggestions?
C:\Users\>ollama run mistral
pulling manifest
Error: Head "https://dd20bb891979d25aebc8bec07b2b3bbc.r2.cloudflarestorage.com/ollama/docker/registry/v2/blobs/sha256/e8/e8a35b5937a5e6d5c35d1f2a15f161e07eefe5e5bb0a3cdd42998ee79b057730/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&...": dial tcp: lookup dd20bb891979d25aebc8bec07b2b3bbc.r2.cloudflarestorage.com: no such host
AI & Automations (5 months ago): There is one error I found, in the line embedding=embeddings.ollama.OllamaEmbeddings(model='nomic-embed-text'). Just remove ollama and it will work, i.e. embedding=embeddings.OllamaEmbeddings(model='nomic-embed-text'). One can also point to a server hosting the model by passing the base_url parameter, i.e. embedding=embeddings.OllamaEmbeddings(model='nomic-embed-text', base_url='{model_server_url}'). The same can be done when loading the language model with Ollama.
Back-end Developer | Abilitya (5 months ago): Hello Sri Laxmi, thank you for sharing your knowledge. I followed this tutorial and made small changes so it can search with and without links; I pushed it to https://github.com/ayrtonmsa/python-rag and referenced this tutorial there.