Build a Lightning Fast RAG Chatbot Powered by Groq's LPU, Ollama & LangChain
Groq and Groq LPU trademarks are owned by Groq, Inc.


In this tutorial, we will create an amazingly fast chatbot that leverages the Groq Language Processing Unit (LPU), LangChain, Ollama, ChromaDB, and Gradio.

This chatbot is designed to answer questions based on the content of PDF documents, combining the Retrieval-Augmented Generation (RAG) architecture with the incredible speed of Groq's LPU.

First things first:

GitHub Repo is here: GitHub Repo

Notebook is here: Colab Notebook


Why This Architecture?

The combination of Groq's LPU, LangChain, and Ollama offers a rare mix of performance and flexibility. Groq's LPU provides ultra-fast inference for LLMs, LangChain handles the integration and orchestration of language models and data, and Ollama gives easy local access to language and embedding models.

This setup is ideal for building applications that require high-speed text generation and understanding, especially when dealing with large volumes of data, such as PDF documents.

Getting Started

Before we get into the implementation, ensure you have a Python environment ready. You don't need a GPU for this tutorial; I used a free T4 on Colab, but you can just as well run everything locally on your laptop or MacBook -- that's the beauty of using Ollama!

Installation:

First, install the necessary libraries:

!pip install groq langchain langchain-core langchain-groq chromadb pypdf gradio        

This installs all required Python packages for our project. Here's a quick rundown of each package's role:

  • groq: Provides access to Groq's API and LPU functionalities.
  • langchain and its related packages (langchain-core, langchain-groq): LangChain is a framework for chaining together language models, vector stores, and more, to create complex language applications.
  • chromadb: A vector database for storing and retrieving embeddings.
  • pypdf: A utility for reading PDF documents, necessary for extracting text from PDF files.
  • gradio: A library for quickly creating web UIs for Python applications, which we'll use to build our chat interface.


Installing Ollama

!curl https://ollama.ai/install.sh | sh        

This downloads and installs Ollama, the tool we'll use to serve the embedding model locally.

After installation, you must start Ollama by running ollama serve in the notebook's terminal. This service is essential for the next steps, particularly for pulling embeddings and working with ChromaDB.
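
If your environment has no terminal available (for example Colab's free tier), one possible workaround is to start the server as a background process from a notebook cell. This is a minimal sketch of that idea, not part of the original installer:

import subprocess
import time

# Hypothetical workaround: run the Ollama server in the background from the notebook itself
ollama_server = subprocess.Popen(["ollama", "serve"])
time.sleep(5)  # give the server a moment to start before pulling models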


Pulling Ollama Embeddings

!ollama pull nomic-embed-text        

Ollama 0.1.26 and later support nomic-embed-text.

nomic-embed-text is an open-source, long-context embedding model with an 8k-token context window (using RoPE). It performs strongly on several benchmarks, surpassing OpenAI's text-embedding-ada-002 and text-embedding-3-small on both short- and long-context tasks, and it can run locally as well as via an API.
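
To confirm the model was pulled and the server is reachable, here is a quick sanity check (a minimal sketch using the same OllamaEmbeddings class we rely on later):

from langchain_community.embeddings import OllamaEmbeddings

# Embed a short sentence and check the vector dimensionality (768 for nomic-embed-text)
embeddings_test = OllamaEmbeddings(model='nomic-embed-text')
vector = embeddings_test.embed_query("Hello, Groq and Ollama!")
print(len(vector))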


Setting Up the Environment and Loading Documents

from langchain_groq import ChatGroq
from langchain_community.document_loaders import TextLoader, PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community import embeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from google.colab import userdata
import os
import time
import textwrap
import gradio as gr        

This cell imports all necessary modules from the installed packages. It sets up the foundation for loading documents, splitting text, creating embeddings, and building the chat interface.

Securely store your Groq API key in the notebook's secrets (Google Colab) or your environment variables (local setup). This key is crucial for accessing Groq's LPU.
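
For a local setup, a small sketch of the equivalent step: keep the key out of your code and read it from an environment variable instead of Colab's userdata.

import os

# Local alternative to Colab secrets: export GROQ_API_KEY in your shell first
groq_api_key = os.environ.get("GROQ_API_KEY")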


Loading Documents

from langchain_community.document_loaders import PyPDFDirectoryLoader

loader = PyPDFDirectoryLoader("data")
the_text = loader.load()

This snippet loads text from PDF documents in the data directory. LangChain's PyPDFDirectoryLoader makes it easy to extract text from multiple PDF files for processing.
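
A quick check that the PDFs were actually picked up never hurts (a minimal sketch; PyPDFDirectoryLoader returns one Document per page):

# Each Document holds one page of text plus metadata such as the source file and page number
print(f"Loaded {len(the_text)} pages")
print(the_text[0].metadata)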


Splitting Text into Chunks

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(the_text)

Given the large size of documents, we split the text into manageable chunks. This approach ensures that the embeddings generated are focused and relevant.
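
Before embedding anything, it is worth eyeballing the result of the split (a small sketch):

# How many chunks did we get, and what does a typical chunk look like?
print(f"Split {len(the_text)} pages into {len(chunks)} chunks")
print(chunks[0].page_content[:300])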


Creating a Vector Store with ChromaDB

from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings

vectorstore = Chroma.from_documents(
    documents=chunks,
    collection_name="ollama_embeds",
    embedding=OllamaEmbeddings(model='nomic-embed-text'),
)

# Expose the vector store as a retriever for the RAG chain
retriever = vectorstore.as_retriever()

ChromaDB stores and manages the embeddings of our document chunks, generated with the nomic-embed-text model. Calling as_retriever() on the vector store gives us the retriever that the RAG chain below will use.
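
You can also sanity-check retrieval directly against the vector store before wiring up the chain (a sketch; the query and k=3 are just examples):

# Fetch the three chunks most similar to a test query
docs = vectorstore.similarity_search("What is this document about?", k=3)
for doc in docs:
    print(doc.metadata, doc.page_content[:100])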


Setting Up Groq's LPU for Inference

from langchain_groq import ChatGroq
from google.colab import userdata

groq_api_key = userdata.get('groq_api_key')

llm = ChatGroq(
    groq_api_key=groq_api_key,
    model_name='mixtral-8x7b-32768'
)

Initialize the Groq model with your API key. This example uses mixtral-8x7b-32768, one of the models Groq serves on its LPU infrastructure.
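
A quick smoke test confirms the API key and model name are valid before we build the chain (a minimal sketch):

# Send a trivial prompt straight to the Groq-hosted model
print(llm.invoke("Say hello in one short sentence.").content)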


Building the RAG Chain

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

rag_template = """..."""
rag_prompt = ChatPromptTemplate.from_template(rag_template)

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)

This code snippet sets up the RAG architecture, allowing the system to retrieve relevant document chunks based on the user's question, generate a context-aware prompt, and produce a coherent answer using Groq's LPU.
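
The prompt template itself is elided above. Purely as an illustration (not the original author's wording), a typical RAG template grounds the answer in the retrieved context and exposes the two variables the chain fills in:

rag_template = """Answer the question based only on the following context.
If the answer is not in the context, say that you don't know.

Context:
{context}

Question: {question}
"""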


Testing the RAG Architecture

response = rag_chain.invoke("What is this document about")
print(textwrap.fill(response, width=80))        

We test the RAG chain with a hardcoded question to ensure everything is working as expected, demonstrating the chatbot's capability to understand and respond based on the document content.
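
Since speed is the whole point of using the LPU, it is also worth timing the call explicitly (a sketch using the time module imported earlier):

start = time.time()
response = rag_chain.invoke("What is this document about")
print(f"Answer generated in {time.time() - start:.2f} seconds")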


Launching the Gradio Interface

def process_question(user_question):
    # Invoke the RAG chain and measure how long the answer takes
    start_time = time.time()
    response = rag_chain.invoke(user_question)
    elapsed = time.time() - start_time
    return f"{response}\n\nResponse time: {elapsed:.2f} seconds"

iface = gr.Interface(fn=process_question,
                     inputs=gr.Textbox(lines=2, placeholder="Type your question here..."),
                     outputs=gr.Textbox(),
                     title="GROQ CHAT",
                     description="Ask any question about your document, and get an answer along with the response time.")

iface.launch()

Finally, we set up a Gradio interface for users to interact with the chatbot. The process_question function handles the input from the user, invokes the RAG chain, and displays the response along with the inference time.
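
If you are running in Colab, passing share=True to launch() asks Gradio for a temporary public gradio.live link, which makes the interface easy to open from another device (a small optional tweak):

# Optional: create a temporary public link instead of serving only on localhost
iface.launch(share=True)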


Wrapping Up

You've now created a powerful chatbot capable of understanding and answering questions based on the content of PDF documents.

This tutorial showcases the synergy between Groq's LPU, LangChain, and Ollama, demonstrating how advanced technologies can come together to create innovative solutions.

As you explore further, consider how you can adapt this architecture to suit other data types or applications. The possibilities are vast, and with tools like Groq's LPU, LangChain, and Ollama, you're well-equipped to tackle them!


Comments

Osman Bulut
Lecturer in Engineering | Founder and AI Engineer
4 months ago

This chatbot does not remember the conversation history, does it?
meme_ f4rmer
Business owner at Lederer
5 months ago

I tried the notebook in VScode on my mac and it says the script runs only on Linux.
Amit Sharma (PhD)
AI Architect @ YASH Technologies
6 months ago

I'd like to highlight a couple of crucial points regarding Retrieval-Augmented Generation (RAG):

1. Embedding stands out as a cornerstone in RAG implementation, with the LLM primarily dedicated to sentence structuring. Leveraging the most suitable embedding model based on its MTEB score could therefore significantly enhance performance and results.

2. For those seeking comprehensive on-premise LLM solutions, Ollama emerges as a robust choice. Groq LPUs, akin to specialized GPUs optimized for LLMs, offer noteworthy advantages, but cost-conscious users may find local resources with built-in GPUs a viable alternative to mitigate expenses.

These considerations illuminate the nuanced landscape of RAG and LLM utilization, empowering decision-makers to navigate their options effectively.

Piotr Malicki
NSV Mastermind | Enthusiast AI & ML | Architect AI & ML | Architect Solutions AI & ML | AIOps / MLOps / DataOps Dev | Innovator MLOps & DataOps | NLP Aficionado | Unlocking the Power of AI for a Brighter Future
6 months ago

Excited to see how this technology will revolutionize the chatbot landscape!
