Build a Lightning Fast RAG Chatbot Powered by Groq's LPU, Ollama & LangChain
Groq and Groq LPU trademarks are owned by Groq, Inc.


In this tutorial, we will create an amazingly fast chatbot that leverages the Groq Language Processing Unit (LPU), LangChain, Ollama, ChromaDB, and Gradio.

This chatbot is designed to answer questions based on the content of PDF documents, combining the Retrieval-Augmented Generation (RAG) architecture with the incredible speed of Groq's LPU.

First things first:

GitHub Repo is here: GitHub Repo

Notebook is here: Colab Notebook


Why This Architecture?

The combination of Groq's LPU, LangChain, and Ollama offers a rare mix of performance and flexibility. Groq's LPU provides ultra-fast inference for LLMs, LangChain handles the integration and orchestration of language models and data, and Ollama gives easy local access to language and embedding models.

This setup is ideal for building applications that require high-speed text generation and understanding, especially when dealing with large volumes of data, such as PDF documents.

Getting Started

Before we get into the implementation, ensure you have a Python environment ready. You don't need a GPU for this tutorial; I used a free T4 on Colab, but you can just as well run everything locally on your laptop or MacBook -- that's the beauty of using Ollama!

Installation:

First, install the necessary libraries:

!pip install groq langchain langchain-core langchain-groq chromadb pypdf gradio        

This installs all required Python packages for our project. Here's a quick rundown of each package's role:

  • groq: Provides access to Groq's API and LPU functionalities.
  • langchain and its related packages (langchain-core, langchain-groq): LangChain is a framework for chaining together language models, vector stores, and more, to create complex language applications.
  • chromadb: A vector database for storing and retrieving embeddings.
  • pypdf: A utility for reading PDF documents, necessary for extracting text from PDF files.
  • gradio: A library for quickly creating web UIs for Python applications, which we'll use to build our chat interface.


Installing Ollama

!curl https://ollama.ai/install.sh | sh        

This downloads and installs Ollama, the tool we'll use to serve the embedding model locally.

After installation, you must start Ollama by running ollama serve in the notebook's terminal. This service is essential for the next steps, particularly for pulling embeddings and working with ChromaDB.
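
If your environment has no terminal available (for example Colab's free tier), one possible workaround is to start the server as a background process from a notebook cell. This is a minimal sketch of that idea, not part of the original installer:

import subprocess
import time

# Hypothetical workaround: run the Ollama server in the background from the notebook itself
ollama_server = subprocess.Popen(["ollama", "serve"])
time.sleep(5)  # give the server a moment to start before pulling models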


Pulling Ollama Embeddings

!ollama pull nomic-embed-text        

Ollama 0.1.26 and later support nomic-embed-text.

nomic-embed-text is an open-source, long-context embedding model with an 8k-token context window (using RoPE). It performs strongly on several benchmarks, surpassing OpenAI's text-embedding-ada-002 and text-embedding-3-small on both short- and long-context tasks, and it can run locally as well as via an API.
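
To confirm the model was pulled and the server is reachable, here is a quick sanity check (a minimal sketch using the same OllamaEmbeddings class we rely on later):

from langchain_community.embeddings import OllamaEmbeddings

# Embed a short sentence and check the vector dimensionality (768 for nomic-embed-text)
embeddings_test = OllamaEmbeddings(model='nomic-embed-text')
vector = embeddings_test.embed_query("Hello, Groq and Ollama!")
print(len(vector))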


Setting Up the Environment and Loading Documents

from langchain_groq import ChatGroq
from langchain_community.document_loaders import TextLoader, PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community import embeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from google.colab import userdata
import os
import time
import textwrap
import gradio as gr        

This cell imports all necessary modules from the installed packages. It sets up the foundation for loading documents, splitting text, creating embeddings, and building the chat interface.

Securely store your Groq API key in the notebook's secrets (Google Colab) or your environment variables (local setup). This key is crucial for accessing Groq's LPU.
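
For a local setup, a small sketch of the equivalent step: keep the key out of your code and read it from an environment variable instead of Colab's userdata.

import os

# Local alternative to Colab secrets: export GROQ_API_KEY in your shell first
groq_api_key = os.environ.get("GROQ_API_KEY")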


Loading Documents

from langchain_community.document_loaders import PyPDFDirectoryLoader

loader = PyPDFDirectoryLoader("data")
the_text = loader.load()

This snippet loads text from PDF documents in the data directory. LangChain's PyPDFDirectoryLoader makes it easy to extract text from multiple PDF files for processing.
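
A quick check that the PDFs were actually picked up never hurts (a minimal sketch; PyPDFDirectoryLoader returns one Document per page):

# Each Document holds one page of text plus metadata such as the source file and page number
print(f"Loaded {len(the_text)} pages")
print(the_text[0].metadata)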


Splitting Text into Chunks

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(the_text)

Given the large size of documents, we split the text into manageable chunks. This approach ensures that the embeddings generated are focused and relevant.
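
Before embedding anything, it is worth eyeballing the result of the split (a small sketch):

# How many chunks did we get, and what does a typical chunk look like?
print(f"Split {len(the_text)} pages into {len(chunks)} chunks")
print(chunks[0].page_content[:300])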


Creating a Vector Store with ChromaDB

from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings

vectorstore = Chroma.from_documents(
    documents=chunks,
    collection_name="ollama_embeds",
    embedding=OllamaEmbeddings(model='nomic-embed-text'),
)

# Expose the vector store as a retriever for the RAG chain
retriever = vectorstore.as_retriever()

ChromaDB stores and manages the embeddings of our document chunks, generated with the nomic-embed-text model. Calling as_retriever() on the vector store gives us the retriever that the RAG chain below will use.
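
You can also sanity-check retrieval directly against the vector store before wiring up the chain (a sketch; the query and k=3 are just examples):

# Fetch the three chunks most similar to a test query
docs = vectorstore.similarity_search("What is this document about?", k=3)
for doc in docs:
    print(doc.metadata, doc.page_content[:100])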


Setting Up Groq's LPU for Inference

from langchain_groq import ChatGroq
from google.colab import userdata

groq_api_key = userdata.get('groq_api_key')

llm = ChatGroq(
    groq_api_key=groq_api_key,
    model_name='mixtral-8x7b-32768'
)

Initialize the Groq model with your API key. This example uses mixtral-8x7b-32768, one of the models Groq serves on its LPU infrastructure.
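
A quick smoke test confirms the API key and model name are valid before we build the chain (a minimal sketch):

# Send a trivial prompt straight to the Groq-hosted model
print(llm.invoke("Say hello in one short sentence.").content)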


Building the RAG Chain

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

rag_template = """..."""
rag_prompt = ChatPromptTemplate.from_template(rag_template)

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)

This code snippet sets up the RAG architecture, allowing the system to retrieve relevant document chunks based on the user's question, generate a context-aware prompt, and produce a coherent answer using Groq's LPU.
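
The prompt template itself is elided above. Purely as an illustration (not the original author's wording), a typical RAG template grounds the answer in the retrieved context and exposes the two variables the chain fills in:

rag_template = """Answer the question based only on the following context.
If the answer is not in the context, say that you don't know.

Context:
{context}

Question: {question}
"""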


Testing the RAG Architecture

response = rag_chain.invoke("What is this document about")
print(textwrap.fill(response, width=80))        

We test the RAG chain with a hardcoded question to ensure everything is working as expected, demonstrating the chatbot's capability to understand and respond based on the document content.
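
Since speed is the whole point of using the LPU, it is also worth timing the call explicitly (a sketch using the time module imported earlier):

start = time.time()
response = rag_chain.invoke("What is this document about")
print(f"Answer generated in {time.time() - start:.2f} seconds")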


Launching the Gradio Interface

def process_question(user_question):
    # Invoke the RAG chain and measure how long the answer takes
    start_time = time.time()
    response = rag_chain.invoke(user_question)
    elapsed = time.time() - start_time
    return f"{response}\n\nResponse time: {elapsed:.2f} seconds"

iface = gr.Interface(fn=process_question,
                     inputs=gr.Textbox(lines=2, placeholder="Type your question here..."),
                     outputs=gr.Textbox(),
                     title="GROQ CHAT",
                     description="Ask any question about your document, and get an answer along with the response time.")

iface.launch()

Finally, we set up a Gradio interface for users to interact with the chatbot. The process_question function handles the input from the user, invokes the RAG chain, and displays the response along with the inference time.
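
If you are running in Colab, passing share=True to launch() asks Gradio for a temporary public gradio.live link, which makes the interface easy to open from another device (a small optional tweak):

# Optional: create a temporary public link instead of serving only on localhost
iface.launch(share=True)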


Wrapping Up

You've now created a powerful chatbot capable of understanding and answering questions based on the content of PDF documents.

This tutorial showcases the synergy between Groq's LPU, LangChain, and Ollama, demonstrating how advanced technologies can come together to create innovative solutions.

As you explore further, consider how you can adapt this architecture to suit other data types or applications. The possibilities are vast, and with tools like Groq's LPU, LangChain, and Ollama, you're well-equipped to tackle them!


Comments

Osman Bulut
Lecturer in Engineering | Founder and AI Engineer
4 months ago

This chatbot does not remember the conversation history, does it?
meme_ f4rmer
Business owner at Lederer
5 months ago

I tried the notebook in VScode on my mac and it says the script runs only on Linux.
Amit Sharma (PhD)
AI Architect @ YASH Technologies
6 months ago

I'd like to highlight a couple of crucial points regarding Retrieval-Augmented Generation (RAG):

1. Embedding stands out as a cornerstone in RAG implementation, with the LLM primarily dedicated to sentence structuring. Leveraging the most suitable embedding model based on its MTEB score could therefore significantly enhance performance and results.

2. For those seeking comprehensive on-premise LLM solutions, Ollama emerges as a robust choice. Groq LPUs, akin to specialized GPUs optimized for LLMs, offer noteworthy advantages, but cost-conscious users may find local resources with built-in GPUs a viable alternative to mitigate expenses.

These considerations illuminate the nuanced landscape of RAG and LLM utilization, empowering decision-makers to navigate their options effectively.

Piotr Malicki
NSV Mastermind | Enthusiast AI & ML | Architect AI & ML | Architect Solutions AI & ML | AIOps / MLOps / DataOps Dev | Innovator MLOps & DataOps | NLP Aficionado | Unlocking the Power of AI for a Brighter Future
6 months ago

Excited to see how this technology will revolutionize the chatbot landscape!
