Build a Lightning-Fast RAG Chatbot Powered by Groq's LPU, Ollama & LangChain
In this tutorial, we will create an amazingly fast chatbot that leverages the Groq Language Processing Unit (LPU), LangChain, Ollama, ChromaDB, and Gradio.
The chatbot is designed to answer questions based on the content of PDF documents, combining the Retrieval-Augmented Generation (RAG) architecture with the incredible speed of Groq's LPU.
First things first:
GitHub Repo is here: GitHub Repo
Notebook is here: Colab Notebook
Why This Architecture?
The combination of Groq's LPU, LangChain, and Ollama offers unparalleled performance and flexibility. Groq's LPU provides ultra-fast inference for LLMs, LangChain enables seamless integration and orchestration of language models and data, and Ollama gives you easy local access to language and embedding models.
This setup is ideal for building applications that require high-speed text generation and understanding, especially when dealing with large volumes of data, such as PDF documents.
Getting Started
Before we get into the implementation, ensure you have a Python environment ready. You don't need a GPU for this tutorial; I used a free T4 on Colab, but you can just as easily run this locally on your laptop or MacBook -- that's the beauty of using Ollama!
Installation:
First, install the necessary libraries:
!pip install groq langchain langchain-core langchain-community langchain-groq chromadb pypdf gradio
This installs all required Python packages for our project, including langchain-community, which provides the loaders, vector stores, and embeddings we import below. Here's a quick rundown of each package's role:
- groq and langchain-groq: the Groq API client and its LangChain integration
- langchain, langchain-core, langchain-community: the core framework plus the community loaders, vector stores, and embeddings
- chromadb: the vector database that stores our document embeddings
- pypdf: PDF parsing, used under the hood by the document loader
- gradio: the web interface for the chatbot
Installing Ollama
!curl https://ollama.ai/install.sh | sh
This downloads and installs Ollama, the tool we'll use to serve the embedding model locally.
After installation, start Ollama by running ollama serve in the notebook's terminal. The server must be running for the next steps, particularly for pulling the embedding model and populating ChromaDB.
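In Colab, which doesn't give you an interactive terminal by default, one common pattern is to start the server in the background from a cell (a sketch; the log file name is just an example):
!nohup ollama serve > ollama.log 2>&1 &
!sleep 3 && ollama list  # give the server a moment to start, then confirm it responds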
Pulling Ollama Embeddings
!ollama pull nomic-embed-text
Ollama (0.1.26 and later) supports nomic-embed-text, an open-source, long-context embedding model with an 8k-token context window (using RoPE), strong benchmark performance (it surpasses OpenAI's text-embedding-ada-002 and text-embedding-3-small on short- and long-context tasks), and support for running locally as well as via an API.
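If you want to sanity-check the model before wiring it into LangChain, you can call Ollama's local REST endpoint directly (the /api/embeddings route is Ollama's documented embeddings API at the time of writing; the prompt text is arbitrary):
!curl http://localhost:11434/api/embeddings -d '{"model": "nomic-embed-text", "prompt": "Groq makes LLM inference fast"}'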
Setting Up the Environment and Loading Documents
from langchain_groq import ChatGroq
from langchain_community.document_loaders import TextLoader, PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from google.colab import userdata
import os
import time
import textwrap
import gradio as gr
This cell imports all necessary modules from the installed packages. It sets up the foundation for loading documents, splitting text, creating embeddings, and building the chat interface.
Securely store your Groq API key in the notebook's secrets (Google Colab) or your environment variables (local setup). This key is crucial for accessing Groq's LPU.
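If you're running locally instead of on Colab, a minimal sketch for reading the key from an environment variable (the variable name GROQ_API_KEY is the conventional one; set it in your shell before launching the notebook):
import os
groq_api_key = os.environ["GROQ_API_KEY"]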
Loading Documents
from langchain_community.document_loaders import PyPDFDirectoryLoader
loader = PyPDFDirectoryLoader("data")
the_text = loader.load()
This snippet loads text from PDF documents in the data directory. LangChain's PyPDFDirectoryLoader makes it easy to extract text from multiple PDF files for processing.
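It's worth confirming the load before moving on; each PDF page becomes one Document with page-level metadata (a quick check, not part of the original notebook):
print(len(the_text))         # number of pages loaded across all PDFs
print(the_text[0].metadata)  # e.g. {'source': 'data/your.pdf', 'page': 0}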
Splitting Text into Chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(the_text)
Given the large size of the documents, we split the text into manageable chunks. The 200-character overlap preserves context across chunk boundaries, which keeps the generated embeddings focused and relevant.
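A quick look at what the splitter produced (illustrative only):
print(f"{len(the_text)} pages split into {len(chunks)} chunks")
print(chunks[0].page_content[:200])  # preview the first chunk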
Creating a Vector Store with ChromaDB
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
vectorstore = Chroma.from_documents(
    documents=chunks,
    collection_name="ollama_embeds",
    embedding=OllamaEmbeddings(model='nomic-embed-text'),
)
retriever = vectorstore.as_retriever()
ChromaDB stores and manages the embeddings of our document chunks, generated with the nomic-embed-text model. Calling as_retriever() exposes the vector store as the retriever that our RAG chain will query.
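You can test retrieval on its own before building the full chain (a sketch; the query is arbitrary, and on older LangChain versions use retriever.get_relevant_documents() instead of invoke()):
docs = retriever.invoke("What is this document about?")
print(len(docs), "chunks retrieved")
print(docs[0].page_content[:200])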
Setting Up Groq's LPU for Inference
from langchain_groq import ChatGroq
from google.colab import userdata
groq_api_key = userdata.get('groq_api_key')
llm = ChatGroq(
    groq_api_key=groq_api_key,
    model_name='mixtral-8x7b-32768'
)
Initialize the Groq chat model with your API key. This example uses the mixtral-8x7b-32768 model, which Groq serves on its LPU.
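A one-line smoke test confirms the key and model are working (output will vary):
print(llm.invoke("Say hello in five words.").content)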
Building the RAG Chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
rag_template = """..."""  # the full prompt text is elided here; it must contain {context} and {question} placeholders
rag_prompt = ChatPromptTemplate.from_template(rag_template)
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)
This code snippet sets up the RAG architecture, allowing the system to retrieve relevant document chunks based on the user's question, generate a context-aware prompt, and produce a coherent answer using Groq's LPU.
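The prompt text itself is elided above; a minimal template along these lines would work (my wording, not the original notebook's; the only hard requirement is the {context} and {question} placeholders):
rag_template = """Answer the question based only on the following context:
{context}

Question: {question}
"""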
Testing the RAG Architecture
response = rag_chain.invoke("What is this document about")
print(textwrap.fill(response, width=80))
We test the RAG chain with a hardcoded question to ensure everything is working as expected, demonstrating the chatbot's capability to understand and respond based on the document content.
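Since speed is the whole point of using Groq, it's also worth timing the call (a simple sketch using the time and textwrap modules imported earlier):
start = time.time()
response = rag_chain.invoke("Summarize the key points of this document.")
print(textwrap.fill(response, width=80))
print(f"Response time: {time.time() - start:.2f} seconds")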
Launching the Gradio Interface
def process_question(user_question):
    start_time = time.time()
    response = rag_chain.invoke(user_question)
    elapsed = time.time() - start_time
    return f"{response}\n\nResponse time: {elapsed:.2f} seconds"

iface = gr.Interface(fn=process_question,
                     inputs=gr.Textbox(lines=2, placeholder="Type your question here..."),
                     outputs=gr.Textbox(),
                     title="GROQ CHAT",
                     description="Ask any question about your document, and get an answer along with the response time.")
iface.launch()
Finally, we set up a Gradio interface for users to interact with the chatbot. The process_question function handles the input from the user, invokes the RAG chain, and displays the response along with the inference time.
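If you're running in Colab, launching with iface.launch(share=True) generates a temporary public URL, which makes the demo easier to open outside the notebook.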
Wrapping Up
You've now created a powerful chatbot capable of understanding and answering questions based on the content of PDF documents.
This tutorial showcases the synergy between Groq's LPU, LangChain, and Ollama, demonstrating how advanced technologies can come together to create innovative solutions.
As you explore further, consider how you can adapt this architecture to suit other data types or applications. The possibilities are vast, and with tools like Groq's LPU, LangChain, and Ollama, you're well-equipped to tackle them!