Q&A Chatbot with Multi-document RAG: LangChain, OpenAI & Streamlit Tutorial

This tutorial will guide you through the process of creating an interactive document-based question-answering application using Streamlit and several components from the langchain library. Our goal is to build an app that allows users to upload documents and ask questions based on these documents. The app leverages OpenAI's models for text embeddings and retrieval-based question answering.

Full video tutorial - https://www.youtube.com/watch?v=uVxmUzc5TeE

App Overview:

On a basic level, the workflow of the app is remarkably straightforward:

  1. A user submits a text document, poses a question, enters their OpenAI API key, and presses "Submit."
  2. LangChain then takes over, handling the two main inputs. First, it breaks the document into smaller segments, generates embedding vectors for these segments, and saves them in an embedding database (also known as the vector store). It then passes the user's question through the question-answering chain, allowing the LLM (large language model) to generate an answer.

Workflow diagram (image credit: Sri Laxmi)

Setting Up the Environment

Before diving into the code, ensure you have Streamlit and LangChain libraries installed. If not, you can install them using pip:

pip install streamlit langchain        
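Depending on your LangChain version, the embeddings and vector store used later in this tutorial also rely on the openai, chromadb, and tiktoken packages. If the app complains about missing modules, installing these as well usually resolves it:

pip install openai chromadb tiktoken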

We import the necessary modules from Streamlit and LangChain. Streamlit is used for creating the web app interface, while LangChain provides tools for text splitting, embeddings, vector storage, and retrieval-based question answering.

import streamlit as st
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
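These import paths match the classic langchain package used throughout this tutorial. If you are on LangChain 0.1 or later, the same classes have moved into the langchain-openai and langchain-community packages, so the equivalent imports would look roughly like this (adjust to whatever version you have installed):

# Equivalent imports on newer LangChain releases (0.1+)
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import RetrievalQA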
def generate_response(uploaded_file, openai_api_key, query_text):
    # Load document if file is uploaded
    if uploaded_file is not None:
        documents = [uploaded_file.read().decode()]
        # Split documents into chunks
        text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
        texts = text_splitter.create_documents(documents)
        # Select embeddings
        embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
        # Create a vectorstore from documents
        db = Chroma.from_documents(texts, embeddings)
        # Create retriever interface
        retriever = db.as_retriever()
        # Create QA chain
        qa = RetrievalQA.from_chain_type(llm=OpenAI(openai_api_key=openai_api_key), chain_type='stuff', retriever=retriever)
        return qa.run(query_text)

This function is the core of our app. It takes an uploaded file, an OpenAI API key, and a query text as inputs. Here's what it does step by step (a quick standalone test follows the list):

1. Load Document: If a file is uploaded, it reads and decodes the document.

2. Split Documents into Chunks: Uses CharacterTextSplitter to divide the document into manageable pieces for processing. This is crucial for handling large texts.

3. Select Embeddings: Initializes OpenAIEmbeddings with the API key to generate embeddings for the text chunks.

4. Create a Vector Store: Utilizes Chroma to store the document's embeddings, facilitating efficient retrieval.

5. Create Retriever Interface: Transforms the vector store into a retriever for fetching relevant document sections based on queries.

6. Create QA Chain: Assembles a question-answering pipeline with RetrievalQA, combining the retriever with OpenAI's language models to generate answers.

7. Run Query: Executes the QA chain on the user's query and returns the response.
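Because the function only calls .read().decode() on the upload, you can also smoke-test it outside Streamlit with any plain-text file opened in binary mode. The file name, key, and question below are placeholders:

# Quick standalone check outside the Streamlit app (placeholder file, key, and question)
with open('article.txt', 'rb') as f:
    answer = generate_response(f, 'sk-your-key-here', 'What is the main topic of this article?')
print(answer)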

Streamlit UI Components

st.set_page_config(page_title='Ask the Document App')
st.title('Ask the Document App')

These lines configure the Streamlit page and set its title.

st.header('About the App')        

This creates a header for the section explaining the app's functionality.

st.write(...)        

Here, we use multiple st.write() calls to provide detailed instructions and information about the app's workflow.
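The exact copy is up to you; something along these lines would match the workflow described above (the wording here is illustrative, not the original app's text):

st.write('1. Upload a plain-text document.')
st.write('2. Type a question about the document.')
st.write('3. Enter your OpenAI API key and press Submit to get an answer grounded in the document.')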


uploaded_file = st.file_uploader('Upload an article', type='txt')

Creates a file uploader for the user to upload documents. Because generate_response simply decodes the raw bytes as text, uploads are restricted to plain-text (.txt) files here; supporting PDFs would require a dedicated PDF loader before splitting.

query_text = st.text_input('Enter your question:', ...)        

Generates a text input field for users to type their questions.
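The remaining arguments are omitted above. A plausible completion adds a placeholder prompt and disables the field until a document has been uploaded; both parameters are assumptions rather than part of the original snippet:

query_text = st.text_input('Enter your question:', placeholder='Please provide a short summary.', disabled=not uploaded_file)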

with st.form('myform', clear_on_submit=True):        

This block creates a form containing an input for the OpenAI API key and a submit button. The form keeps the API key handling contained, and clear_on_submit=True wipes the inputs once the request is sent.
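The body of the form is not shown above. A minimal sketch of what it can contain looks like this; the widget labels, spinner text, and the result list are assumptions made to keep the example self-contained:

result = []
with st.form('myform', clear_on_submit=True):
    # Ask for the key only once a document and question are ready, and mask it
    openai_api_key = st.text_input('OpenAI API Key', type='password', disabled=not (uploaded_file and query_text))
    submitted = st.form_submit_button('Submit', disabled=not (uploaded_file and query_text))
    if submitted and openai_api_key.startswith('sk-'):
        with st.spinner('Thinking...'):
            response = generate_response(uploaded_file, openai_api_key, query_text)
            result.append(response)
        del openai_api_key

Collecting the answer in result is what the check below relies on.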

if len(result):
    st.info(result[0])

Displays the answer returned by generate_response, if one has been produced.

Instructions for OpenAI API Key

Lastly, the script includes instructions for obtaining an OpenAI API key, which is crucial for interacting with OpenAI's models used in the app.

Conclusion

This tutorial walked you through building a document-based question-answering application using Streamlit for the interface and LangChain for the retrieval and question-answering pipeline. By following these steps, you can create an app that processes uploaded documents, lets users query them, and returns answers grounded in their content.

Harsha Srivatsa

Generative AI Product Manager & Founder @ MentisBoostAI | Ex-Apple, Accenture, Cognizant, Verizon, AT&T | Building Next-Gen AI Solutions to solve Complex Business Challenges

1 yr

When I run the code as-is in the Replit Streamlit template, I get a UTF-8 decode error, which makes it unable to read the content of the PDF file. It also needs a Streamlit config file to make the generated URL accessible.

Harsha Srivatsa

Generative AI Product Manager & Founder @ MentisBoostAI | Ex-Apple, Accenture, Cognizant, Verizon, AT&T | Building Next-Gen AI Solutions to solve Complex Business Challenges

1 yr

When I run this code in the Replit Streamlit template, I get a UTF-8 encoding error. I found the fix for this.

Brendan Sheridan

Creative Solutions Engineer | Blending Design Thinking & Technical Expertise to Drive Innovation

1 yr

I know what I'll be doing tomorrow!

Godwin Josh

Co-Founder of Altrosyn and Director at CDTECH | Inventor | Manufacturer

1 yr

Impressive initiative with the LLM tutorial! You mentioned building LLM applications. In your experience, what key aspects should beginners prioritize to ensure the effectiveness and ethical use of their LLM applications, especially considering the diverse applications like chatbots and generative AI in the tutorial? Also, how can the LangChain community contribute or collaborate to address challenges that beginners might face in this journey? Your insights could provide valuable guidance to those venturing into this domain.
