Q&A Chatbot with Multi-Document RAG: LangChain, OpenAI & Streamlit Tutorial
This tutorial will guide you through creating an interactive document-based question-answering application using Streamlit and several components from the LangChain library. Our goal is to build an app that lets users upload documents and ask questions about them. The app leverages OpenAI's models for text embeddings and retrieval-based question answering.
Full video tutorial - https://www.youtube.com/watch?v=uVxmUzc5TeE
App Overview:
On a basic level, the workflow of the app is straightforward: the user uploads a document and types a question, the app splits the document into chunks, embeds those chunks, stores them in a vector store, retrieves the chunks most relevant to the question, and passes them to an OpenAI model to generate the answer.
Setting Up the Environment
Before diving into the code, ensure you have the Streamlit and LangChain libraries installed. If not, you can install them using pip:
pip install streamlit langchain
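Depending on your LangChain version, the OpenAI embeddings/LLM wrappers and the Chroma vector store may also require additional packages; a typical (assumed) extra install looks like:
pip install openai chromadb tiktoken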
We import the necessary modules from Streamlit and LangChain. Streamlit is used for creating the web app interface, while LangChain provides tools for text splitting, embeddings, vector storage, and retrieval-based question answering.
import streamlit as st
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
def generate_response(uploaded_file, openai_api_key, query_text):
    # Load document if file is uploaded
    if uploaded_file is not None:
        documents = [uploaded_file.read().decode()]
        # Split documents into chunks
        text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
        texts = text_splitter.create_documents(documents)
        # Select embeddings
        embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
        # Create a vectorstore from documents
        db = Chroma.from_documents(texts, embeddings)
        # Create retriever interface
        retriever = db.as_retriever()
        # Create QA chain
        qa = RetrievalQA.from_chain_type(llm=OpenAI(openai_api_key=openai_api_key), chain_type='stuff', retriever=retriever)
        return qa.run(query_text)
This function is the core of our app. It takes an uploaded file, an OpenAI API key, and a query text as inputs. Here's what it does step by step:
1. Load Document: If a file is uploaded, it reads and decodes the document.
2. Split Documents into Chunks: Uses CharacterTextSplitter to divide the document into manageable pieces for processing. This is crucial for handling large texts.
3. Select Embeddings: Initializes OpenAIEmbeddings with the API key to generate embeddings for the text chunks.
4. Create a Vector Store: Utilizes Chroma to store the document's embeddings, facilitating efficient retrieval.
5. Create Retriever Interface: Transforms the vector store into a retriever for fetching relevant document sections based on queries.
6. Create QA Chain: Assembles a question-answering pipeline with RetrievalQA, combining the retriever with OpenAI's language model to generate answers; the 'stuff' chain type simply stuffs the retrieved chunks into a single prompt.
7. Run Query: Executes the QA chain on the user's query and returns the response.
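To make the flow concrete, here is a minimal, hypothetical way to call generate_response outside of Streamlit. The file name, API key, and question are placeholders, and the file is assumed to be plain text:
# hypothetical standalone call; 'article.txt', the key, and the question are placeholders
with open('article.txt', 'rb') as f:
    answer = generate_response(f, openai_api_key='sk-...', query_text='What is the main conclusion?')
print(answer)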
Streamlit UI Components
st.set_page_config(page_title='Ask the Document App')
st.title('Ask the Document App')
These lines configure the Streamlit page and set its title.
st.header('About the App')
This creates a header for the section explaining the app's functionality.
st.write(...)
Here, we use multiple st.write() calls to provide detailed instructions and information about the app's workflow.
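For illustration, the descriptive text might look something like the following (the exact wording is an assumption, not the app's actual copy):
st.write('Upload a document, enter a question about it, and provide your OpenAI API key.')
st.write('Behind the scenes, the app splits the document into chunks, embeds them, stores them in a vector store, and answers your question with a retrieval-augmented call to an OpenAI model.')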
uploaded_file = st.file_uploader('Upload an article', type='pdf')
Creates a file uploader for the user to upload documents, restricting file types to PDFs.
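Note that generate_response decodes the uploaded file's raw bytes as text, which will typically fail for a binary PDF. One option, sketched here under the assumption that the pypdf package is installed, is to extract the text with a PDF parser first and pass that string to the splitter instead of uploaded_file.read().decode():
from pypdf import PdfReader  # assumption: pypdf is installed (pip install pypdf)

def extract_pdf_text(uploaded_file):
    # Pull the plain text out of each page of the uploaded PDF
    reader = PdfReader(uploaded_file)
    return "\n".join(page.extract_text() or "" for page in reader.pages)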
query_text = st.text_input('Enter your question:', ...)
Generates a text input field for users to type their questions.
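The elided arguments can be used, for example, to add a placeholder and to keep the field disabled until a file is uploaded; both are illustrative choices, not requirements:
query_text = st.text_input(
    'Enter your question:',
    placeholder='Please provide a short summary.',  # illustrative placeholder text
    disabled=not uploaded_file,  # only enable once a document has been uploaded
)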
with st.form('myform', clear_on_submit=True):
This block creates a form containing an input for the OpenAI API key and a submit button. Wrapping these in a form keeps the key handling contained and clears the input once the question is submitted, as sketched below.
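As a rough sketch (the widget labels, key-prefix check, and spinner are assumptions, not the app's exact code), the form might look like this:
result = []
with st.form('myform', clear_on_submit=True):
    # Only ask for the API key once a file and a question are available
    openai_api_key = st.text_input('OpenAI API Key', type='password',
                                   disabled=not (uploaded_file and query_text))
    submitted = st.form_submit_button('Submit',
                                      disabled=not (uploaded_file and query_text))
    if submitted and openai_api_key.startswith('sk-'):
        with st.spinner('Generating answer...'):
            response = generate_response(uploaded_file, openai_api_key, query_text)
            result.append(response)
        del openai_api_key  # remove the key from memory once the answer is generated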
if len(result):
st.info(response)
Displays the response from the generate_response function once one has been generated.
Instructions for OpenAI API Key
Lastly, the script includes instructions for obtaining an OpenAI API key, which is crucial for interacting with OpenAI's models used in the app.
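For example (illustrative wording only), such instructions could be rendered with a simple call like:
st.info('You need an OpenAI API key to use this app. You can create one in your OpenAI account settings at https://platform.openai.com.')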
Conclusion
This tutorial walked you through building a document-based question-answering application using Streamlit for the interface and LangChain for building the AI application. By following these steps, you can create an app that processes uploaded documents, allows users to query these documents, and provides insightful answers based on the content.