Q&A Chatbot with Multi-document RAG: LangChain, OpenAI & Streamlit Tutorial

This tutorial will guide you through the process of creating an interactive document-based question-answering application using Streamlit and several components from the langchain library. Our goal is to build an app that allows users to upload documents and ask questions based on these documents. The app leverages OpenAI's models for text embeddings and retrieval-based question answering.

Full video tutorial - https://www.youtube.com/watch?v=uVxmUzc5TeE

App Overview:

On a basic level, the workflow of the app is remarkably straightforward:

  1. A user submits a text document, poses a question, enters their OpenAI API key, and presses "Submit."
  2. LangChain then takes over, handling the two main inputs. First, it breaks the document into smaller segments, generates embedding vectors for these segments, and saves them in an embedding database (also known as the vector store). It then passes the user's question through the question-answering chain, allowing the LLM (large language model) to generate an answer.

Workflow diagram (image credit: Sri Laxmi)

Setting Up the Environment

Before diving into the code, ensure you have Streamlit and LangChain libraries installed. If not, you can install them using pip:

pip install streamlit langchain        
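Depending on your LangChain version, the embeddings and vector store used later in this tutorial also rely on the openai, chromadb, and tiktoken packages. If the app complains about missing modules, installing these as well usually resolves it:

pip install openai chromadb tiktoken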

We import the necessary modules from Streamlit and LangChain. Streamlit is used for creating the web app interface, while LangChain provides tools for text splitting, embeddings, vector storage, and retrieval-based question answering.

import streamlit as st
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
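These import paths match the classic langchain package used throughout this tutorial. If you are on LangChain 0.1 or later, the same classes have moved into the langchain-openai and langchain-community packages, so the equivalent imports would look roughly like this (adjust to whatever version you have installed):

# Equivalent imports on newer LangChain releases (0.1+)
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import RetrievalQA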
def generate_response(uploaded_file, openai_api_key, query_text):
    # Load document if file is uploaded
    if uploaded_file is not None:
        documents = [uploaded_file.read().decode()]
        # Split documents into chunks
        text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
        texts = text_splitter.create_documents(documents)
        # Select embeddings
        embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
        # Create a vectorstore from documents
        db = Chroma.from_documents(texts, embeddings)
        # Create retriever interface
        retriever = db.as_retriever()
        # Create QA chain
        qa = RetrievalQA.from_chain_type(llm=OpenAI(openai_api_key=openai_api_key), chain_type='stuff', retriever=retriever)
        return qa.run(query_text)

This function is the core of our app. It takes an uploaded file, an OpenAI API key, and a query text as inputs. Here's what it does step by step (a quick standalone test follows the list):

1. Load Document: If a file is uploaded, it reads and decodes the document.

2. Split Documents into Chunks: Uses CharacterTextSplitter to divide the document into manageable pieces for processing. This is crucial for handling large texts.

3. Select Embeddings: Initializes OpenAIEmbeddings with the API key to generate embeddings for the text chunks.

4. Create a Vector Store: Utilizes Chroma to store the document's embeddings, facilitating efficient retrieval.

5. Create Retriever Interface: Transforms the vector store into a retriever for fetching relevant document sections based on queries.

6. Create QA Chain: Assembles a question-answering pipeline with RetrievalQA, combining the retriever with OpenAI's language models to generate answers.

7. Run Query: Executes the QA chain on the user's query and returns the response.
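Because the function only calls .read().decode() on the upload, you can also smoke-test it outside Streamlit with any plain-text file opened in binary mode. The file name, key, and question below are placeholders:

# Quick standalone check outside the Streamlit app (placeholder file, key, and question)
with open('article.txt', 'rb') as f:
    answer = generate_response(f, 'sk-your-key-here', 'What is the main topic of this article?')
print(answer)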

Streamlit UI Components

st.set_page_config(page_title='Ask the Document App')
st.title('Ask the Document App')

These lines configure the Streamlit page and set its title.

st.header('About the App')        

This creates a header for the section explaining the app's functionality.

st.write(...)        

Here, we use multiple st.write() calls to provide detailed instructions and information about the app's workflow.
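The exact copy is up to you; something along these lines would match the workflow described above (the wording here is illustrative, not the original app's text):

st.write('1. Upload a plain-text document.')
st.write('2. Type a question about the document.')
st.write('3. Enter your OpenAI API key and press Submit to get an answer grounded in the document.')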


uploaded_file = st.file_uploader('Upload an article', type='txt')

Creates a file uploader for the user to upload documents. Because generate_response simply decodes the raw bytes as text, uploads are restricted to plain-text (.txt) files here; supporting PDFs would require a dedicated PDF loader before splitting.

query_text = st.text_input('Enter your question:', ...)        

Generates a text input field for users to type their questions.
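The remaining arguments are omitted above. A plausible completion adds a placeholder prompt and disables the field until a document has been uploaded; both parameters are assumptions rather than part of the original snippet:

query_text = st.text_input('Enter your question:', placeholder='Please provide a short summary.', disabled=not uploaded_file)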

with st.form('myform', clear_on_submit=True):        

This block creates a form containing an input for the OpenAI API key and a submit button. The form keeps the API key handling contained, and clear_on_submit=True wipes the inputs once the request is sent.
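The body of the form is not shown above. A minimal sketch of what it can contain looks like this; the widget labels, spinner text, and the result list are assumptions made to keep the example self-contained:

result = []
with st.form('myform', clear_on_submit=True):
    # Ask for the key only once a document and question are ready, and mask it
    openai_api_key = st.text_input('OpenAI API Key', type='password', disabled=not (uploaded_file and query_text))
    submitted = st.form_submit_button('Submit', disabled=not (uploaded_file and query_text))
    if submitted and openai_api_key.startswith('sk-'):
        with st.spinner('Thinking...'):
            response = generate_response(uploaded_file, openai_api_key, query_text)
            result.append(response)
        del openai_api_key

Collecting the answer in result is what the check below relies on.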

if len(result):
    st.info(result[0])

Displays the answer returned by generate_response, if one has been produced.

Instructions for OpenAI API Key

Lastly, the script includes instructions for obtaining an OpenAI API key, which is crucial for interacting with OpenAI's models used in the app.

Conclusion

This tutorial walked you through building a document-based question-answering application using Streamlit for the interface and LangChain for the retrieval and question-answering pipeline. By following these steps, you can create an app that processes uploaded documents, lets users query them, and returns answers grounded in their content.

Harsha Srivatsa

Generative AI Product Manager & Founder @ MentisBoostAI | Ex-Apple, Accenture, Cognizant, Verizon, AT&T | Building Next-Gen AI Solutions to solve Complex Business Challenges

1 yr

When I run the code as-is in the Replit Streamlit template, I get a UTF-8 decode error, which makes it unable to read the content of the PDF file. It also needs a Streamlit config file to make the generated URL accessible.

Harsha Srivatsa

Generative AI Product Manager & Founder @ MentisBoostAI | Ex-Apple, Accenture, Cognizant, Verizon, AT&T | Building Next-Gen AI Solutions to solve Complex Business Challenges

1 yr

When I run this code in the Replit Streamlit template, I get a UTF-8 encoding error. I found the fix for this.

Brendan Sheridan

Creative Solutions Engineer | Blending Design Thinking & Technical Expertise to Drive Innovation

1 yr

I know what I'll be doing tomorrow!

Godwin Josh

Co-Founder of Altrosyn and Director at CDTECH | Inventor | Manufacturer

1 yr

Impressive initiative with the LLM tutorial! You mentioned building LLM applications. In your experience, what key aspects should beginners prioritize to ensure the effectiveness and ethical use of their LLM applications, especially considering the diverse applications like chatbots and generative AI in the tutorial? Also, how can the LangChain community contribute or collaborate to address challenges that beginners might face in this journey? Your insights could provide valuable guidance to those venturing into this domain.
