How to build a RAG chatbot using Ollama - Serve LLMs locally
Sri Laxmi

Full tutorial video - https://www.youtube.com/watch?v=kfbTZFAikcE

Welcome to this tutorial, where I will guide you through building a document-based question-answering application using Streamlit for the user interface and components from the LangChain community library. Our goal is to create an app that lets users input URLs, pose questions, and receive answers based on the content of the specified documents. The app leverages Ollama, a tool for running large language models (LLMs) locally, together with the open-source Mistral 7B model for retrieval-based question answering. For generating embeddings we'll use the nomic-embed-text model, a high-performing open embedding model with a large token context window that reportedly outperforms OpenAI's ada-002 embeddings.

Why Ollama?

Ollama is a streamlined tool for running open-source LLMs locally, including Mistral and Llama 2. It bundles model weights, configuration, and data into a single package defined by a Modelfile.

Ollama supports a variety of LLMs, including Llama 2, uncensored Llama, Code Llama, Falcon, Mistral, Vicuna, WizardCoder, and Wizard Uncensored.

Because Ollama runs the models entirely on your own machine, your data never leaves it, keeping it safe and secure.

App Overview

On a fundamental level, the workflow of the app is remarkably straightforward:

1. A user submits a list of URLs (one per line) and enters a question.

2. The app processes the input, fetching the content from the provided URLs.

3. The text content is split into manageable chunks.

4. These chunks are converted into embeddings using the nomic-embed-text model and stored in a vector database (Chroma).

5. The user's question is processed through a Retrieval-Augmented Generation (RAG) pipeline, which retrieves relevant document sections and generates an answer using the Mistral 7B model.

Download Ollama & run the open-source LLM

First, follow these instructions to set up and run a local Ollama instance:

  • Download and install Ollama on one of the supported platforms (including Windows Subsystem for Linux).
  • Fetch the models the app uses via ollama pull mistral and ollama pull nomic-embed-text.
  • Test the chat model from the terminal with ollama run mistral (a quick Python sanity check is also sketched below).
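
Once both models are pulled and the Ollama server is running (the desktop app starts it automatically; otherwise run ollama serve), you can verify the setup from Python. This is a minimal sanity-check sketch, assuming the langchain_community package (installed in the next section) is available:

from langchain_community.llms import Ollama

# Talks to the local Ollama server (default: http://localhost:11434)
llm = Ollama(model="mistral")
print(llm.invoke("Reply with the single word: ready"))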

Setting Up the Environment

Before diving into the code, ensure you have the necessary libraries installed. Besides streamlit and langchain_community, the code below also relies on langchain (for the text splitter), chromadb (backing the Chroma vector store), tiktoken (for the token-based splitter), and beautifulsoup4 (used by WebBaseLoader). You can install them all with pip:

pip install streamlit langchain langchain_community chromadb tiktoken beautifulsoup4

We'll import the required modules from Streamlit and LangChain community libraries. Streamlit is used for creating the web app interface, while LangChain provides tools for text splitting, embeddings, vector storage, and retrieval-based question answering.

import streamlit as st
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_community import embeddings
from langchain_community.llms import Ollama
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain.text_splitter import CharacterTextSplitter

The Main Function: process_input

The process_input function is the core of our app. It takes the newline-separated string of URLs and the question as inputs and performs the steps below (explained in detail after the code):


# URL processing
def process_input(urls, question):
    model_local = Ollama(model="mistral")
    
    # Convert string of URLs to list
    urls_list = urls.split("\n")
    docs = [WebBaseLoader(url).load() for url in urls_list]
    docs_list = [item for sublist in docs for item in sublist]
    
    # Split the text into chunks for efficient processing
    text_splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=7500, chunk_overlap=100)
    doc_splits = text_splitter.split_documents(docs_list)
    
    # Convert the text chunks into embeddings and store them in the Chroma vector database
    vectorstore = Chroma.from_documents(
        documents=doc_splits,
        collection_name="rag-chroma",
        embedding=embeddings.OllamaEmbeddings(model='nomic-embed-text'),
    )
    retriever = vectorstore.as_retriever()
    
    # Perform Retrieval-Augmented Generation (RAG)
    after_rag_template = """Answer the question based only on the following context:
    {context}
    Question: {question}
    """
    after_rag_prompt = ChatPromptTemplate.from_template(after_rag_template)
    after_rag_chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | after_rag_prompt
        | model_local
        | StrOutputParser()
    )
    return after_rag_chain.invoke(question)
        


1. Load Documents: The function starts by converting the input string of URLs into a list. It then uses the WebBaseLoader from LangChain to fetch the content from each URL and load it into a list of documents.

2. Split Documents into Chunks: The CharacterTextSplitter divides the document text into smaller, manageable chunks, which is crucial for handling large pages efficiently.

3. Select Embeddings: The OllamaEmbeddings model from the LangChain community is initialized to generate embeddings for the text chunks.

4. Create a Vector Store: The Chroma vector store is used to store the document embeddings, facilitating efficient retrieval.

5. Create Retriever Interface: The vector store is turned into a retriever, which can fetch relevant document sections based on queries (a short inspection sketch follows this list).

6. Perform the RAG: This step sets up the Retrieval-Augmented Generation (RAG) pipeline. First, a template is defined for formatting the context and question. Then, a ChatPromptTemplate is created from this template, and a chain is assembled using the retriever, prompt, Ollama language model, and a string output parser. This chain is responsible for generating the final answer based on the retrieved context and the user's question.

7. Run Query: Finally, the RAG chain is invoked with the user's question, and the generated answer is returned.
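
If you want to poke at the intermediate pieces before wiring up the UI, the sketch below is a minimal illustration rather than part of the original app: it embeds a sample query, builds a tiny throwaway vector store to see what a retriever returns, and then calls process_input end to end. It assumes the models above are pulled, the imports from earlier are in scope, and the URL and question are placeholders:

from langchain_community import embeddings
from langchain_community.vectorstores import Chroma

# Embeddings are plain vectors of floats; print the dimensionality as a sanity check
emb = embeddings.OllamaEmbeddings(model='nomic-embed-text')
vector = emb.embed_query("What is retrieval-augmented generation?")
print(len(vector))  # nomic-embed-text typically returns 768-dimensional vectors

# Build a small in-memory vector store and see which text the retriever pulls back
demo_store = Chroma.from_texts(
    ["Ollama runs LLMs locally.", "Chroma stores embeddings for retrieval."],
    collection_name="demo",
    embedding=emb,
)
demo_retriever = demo_store.as_retriever()
print(demo_retriever.get_relevant_documents("Where are the models executed?"))

# End-to-end call of the function defined above (placeholder URL and question)
print(process_input("https://example.com/some-article", "What is the article about?"))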

Streamlit UI Components

With the core functionality in place, let's move on to the Streamlit user interface components:

st.title("Document Query with Ollama")
st.write("Enter URLs (one per line) and a question to query the documents.")

# Input fields
urls = st.text_area("Enter URLs separated by new lines", height=150)
question = st.text_input("Question")

# Button to process input
if st.button('Query Documents'):
    with st.spinner('Processing...'):
        answer = process_input(urls, question)
        st.text_area("Answer", value=answer, height=300, disabled=True)
        

1. st.title("Document Query with Ollama"): This line sets the title of the Streamlit app.

2. st.write("Enter URLs (one per line) and a question to query the documents."): This provides instructions to the user on how to use the app.

3. urls = st.text_area("Enter URLs separated by new lines", height=150): A text area is created for the user to input URLs, one per line.

4. question = st.text_input("Question"): A text input field is created for the user to enter their question.

5. if st.button('Query Documents'):: This creates a button labeled "Query Documents". When the user clicks this button, the following code block is executed:

- with st.spinner('Processing...'):: A spinner is displayed to indicate that the app is processing the user's input.

- answer = process_input(urls, question): The process_input function is called with the user's input (URLs and question), and the generated answer is stored in the answer variable.

- st.text_area("Answer", value=answer, height=300, disabled=True): A text area is displayed with the generated answer. The height parameter sets the height of the text area, and disabled=True ensures that the user cannot edit the text area.
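
With the UI in place, save the complete script (imports, the process_input function, and the Streamlit components) to a file and launch it with Streamlit. The filename app.py below is just an assumed name; use whatever you called your script:

streamlit run app.py

Streamlit will serve the app locally and open it in your browser, typically at http://localhost:8501.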

Conclusion:

In this tutorial, I have walked through all the steps to build a RAG chatbot using Ollama, LangChain, Streamlit, and Mistral 7B (an open-source LLM).

Gerald Mull

Technology Evangelist | Cloud Architect | DevOps | Automation | Tech Lead

4 months ago

Thank you for the insightful article, Sri Laxmi! I successfully transformed your boilerplate example into a RAG solution to inquire about interesting facts around the PGA tournament. It works seamlessly and has provided some fascinating insights!

I've been working with your example today. I had to modify a few things but got it working on my end. Unfortunately, when I query data in a spreadsheet, the results are not completely accurate. Do you have any tips for improving accuracy? Thanks so much for this fantastic example!

kumar M

Alternative Medicine Professional

6 months ago

got below error while running mistral . Any Suggestions? C:\Users\>ollama run mistral pulling manifest Error: Head "https://dd20bb891979d25aebc8bec07b2b3bbc.r2.cloudflarestorage.com/ollama/docker/registry/v2/blobs/sha256/e8/e8a35b5937a5e6d5c35d1f2a15f161e07eefe5e5bb0a3cdd42998ee79b057730/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=66040c77ac1b787c3af820529859349a%!F(MISSING)20240410%!F(MISSING)auto%!F(MISSING)s3%!F(MISSING)aws4_request&X-Amz-Date=20240410T112233Z&X-Amz-Expires=1200&X-Amz-SignedHeaders=host&X-Amz-Signature=e9a4a77a0b4235a017356d1359ccace517cd1e412313337f6e25cef2f954803d": dial tcp: lookup dd20bb891979d25aebc8bec07b2b3bbc.r2.cloudflarestorage.com: no such host

Pankaj Kumar

AI & Automations

6 months ago

There is one error I found in the line embedding=embeddings.ollama.OllamaEmbeddings(model='nomic-embed-text'). Just remove ollama and it will work, i.e. embedding=embeddings.OllamaEmbeddings(model='nomic-embed-text'). One can also point to a server serving the model by passing the base_url parameter, i.e. embedding=embeddings.OllamaEmbeddings(model='nomic-embed-text', base_url='{model_server_url}'). The same can be done when loading the language model with Ollama.

Ayrton Maradona

Back-end Developer | Abilitya

6 个月

Hello Sri Laxmi, thank you for sharing your knowledge. I followed this tutorial and made some small changes so it can search both with and without links, pushed it to https://github.com/ayrtonmsa/python-rag, and listed this tutorial as a reference.
