Building a YouTube AI Q&A Bot with Langchain, Llama, and Python
Asim Hafeez
Senior Software Engineer | Lead | AI | LLMs | System Design | Blockchain | AWS
Asking questions about specific parts of a YouTube video and getting quick, precise answers can save time and enhance our interaction with video content. In this tutorial, we’ll build a YouTube Q&A Bot that retrieves answers from video transcripts.
We’ll use Langchain for AI-driven queries, Llama 3.1 by Meta for language processing, FAISS as an in-memory vector store, and Streamlit for a simple, interactive interface.
Introduction
Finding specific information within YouTube videos can be time-consuming. The YouTube Q&A Bot solves this by letting users ask questions and receive answers based on the video’s transcript.
Using Langchain for AI-powered retrieval and FAISS for storing the transcript embeddings, this guide will show you how to:
- Extract YouTube video transcripts.
- Build a system to answer questions based on video content.
- Create an interactive UI with Streamlit.
Here is the architecture of our application:
Let’s get started!
1. Prerequisites
Before we dive into the code, ensure you have these prerequisites:
- Python 3.x
- The Python libraries listed in the pip install command below (Langchain, FAISS, Streamlit, youtube-transcript-api, and friends)
Additionally, since we’re using Llama 3.1 by Meta, you need to install Ollama to run Llama locally on your machine. Follow these steps to set it up:
1. Install Ollama:
Download and install Ollama from their official website.
2. Run Llama 3.1 locally:
ollama run llama3.1
Then, to install the Python dependencies, run:
pip install langchain langchain-community langchain-core langchain-ollama faiss-cpu streamlit youtube-transcript-api python-dotenv
2. Setting Up the Environment
We start by loading environment variables using dotenv. This helps securely manage sensitive data like API keys.
from dotenv import load_dotenv
load_dotenv()
This setup ensures your environment is properly configured, allowing the bot to access any necessary keys or variables from a .env file.
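Since we run Llama locally through Ollama, no API key is strictly required, but if you do keep settings in a .env file, reading them back looks like this (the OLLAMA_BASE_URL variable below is just a hypothetical example):

import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from a local .env file into the environment

# Hypothetical example: running Llama through Ollama needs no API key, but you could
# keep optional settings such as the local Ollama server URL in .env and read them here
ollama_base_url = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")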
3. Initializing the Language Model and Embedding Model
We will use Llama 3.1 by Meta, a powerful language model, to process user questions and generate responses based on the video content. Additionally, we’ll use OllamaEmbeddings to convert the video transcript into vectors (embeddings), which will be stored and retrieved when users ask questions.
from langchain_ollama import OllamaLLM, OllamaEmbeddings
llm = OllamaLLM(model="llama3.1")
embedding_model = OllamaEmbeddings(model="llama3.1")
where llm is the Llama 3.1 model used to generate answers, and embedding_model converts the transcript text (and later the user's questions) into embeddings for the vector store.
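As a quick sanity check (not part of the final app), you can call both objects directly, assuming ollama run llama3.1 is already serving the model locally:

# Quick sanity check, using the llm and embedding_model objects created above
answer = llm.invoke("Say hello in one short sentence.")
print(answer)  # plain-text completion from Llama 3.1

vector = embedding_model.embed_query("What is this video about?")
print(len(vector))  # dimensionality of the embedding for a single query string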
4. Loading and Processing the YouTube Video Transcript
Now that the model is set up, we need to load the YouTube video transcript. We use YoutubeLoader to extract the transcript and RecursiveCharacterTextSplitter to break it into smaller, manageable chunks.
from langchain_community.document_loaders import YoutubeLoader
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
def create_vector_store():
    youtube_url = "https://www.youtube.com/watch?v=Mu-eK72ioDk&t=258s&ab_channel=CNET"
    youtube_loader = YoutubeLoader.from_youtube_url(youtube_url)
    video_transcript = youtube_loader.load()

    # Split the transcript into smaller, overlapping chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    data_docs = text_splitter.split_documents(video_transcript)

    # Create a FAISS vector store from the split transcript and return it as a retriever
    store = FAISS.from_documents(data_docs, embedding_model)
    return store.as_retriever()
where YoutubeLoader fetches the video's transcript, RecursiveCharacterTextSplitter breaks it into 1,000-character chunks with a 200-character overlap, and FAISS indexes those chunks as embeddings so the relevant ones can be retrieved later.
5. Storing the Transcript in a Vector?Store
To allow efficient searching through the transcript, we use FAISS to create a vector store. This allows the bot to quickly find the relevant chunk of the video when a user asks a question.
store = FAISS.from_documents(data_docs, embedding_model)
What's Happening: each transcript chunk is converted into an embedding with embedding_model and indexed by FAISS. When a user asks a question, the question is embedded the same way, and FAISS returns the chunks whose embeddings are most similar; those chunks become the context for the answer.
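To see the retrieval step in isolation, you can query the FAISS store directly before wrapping it in a retriever (a small sketch; the question string is just an example):

# Assumes `store` is the FAISS vector store built from the transcript chunks above
relevant_chunks = store.similarity_search("What does the video say about pricing?", k=3)

for doc in relevant_chunks:
    # Each result is a Document whose page_content is one transcript chunk
    print(doc.page_content[:200])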
6. Creating the Question-and-Answer Chain
Next, we need to set up a chain that allows the bot to process user questions and retrieve the relevant information from the transcript. The ChatPromptTemplate ensures that the bot answers based on the context (the video transcript) only.
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_template(
    """
    Answer the following question based only on the provided context.
    Think step by step before providing a detailed answer.
    Just answer the exact question, don't explain.
    <context> {context} </context>
    Question: {input}"""
)
The ChatPromptTemplate guides the AI in answering questions based solely on the context provided (the video transcript). It reduces the chance of the model hallucinating and keeps the answers tied to the video content.
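To see what the model actually receives, you can render the template with example values (the strings below are purely illustrative):

# Fill the two placeholders with example values to inspect the final prompt text
messages = prompt.format_messages(
    context="The presenter walks through the robotaxi reveal event.",
    input="What event is the video about?",
)
print(messages[0].content)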
Now, we combine the prompt and retrieval chain:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
stuff_documents_chain = create_stuff_documents_chain(llm, prompt)
retrieval_chain = create_retrieval_chain(st.session_state['retrieval'], stuff_documents_chain)
This chain retrieves the relevant chunks of the transcript and passes them to the model to answer user queries. (The st.session_state['retrieval'] retriever referenced here is created in the Streamlit section below.)
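Before adding the UI, you can exercise the chain from a plain script (a sketch; here the retriever is created directly instead of being read from Streamlit's session state, and the question is just an example):

# Minimal sketch: build the chain against the retriever directly (no Streamlit yet)
retriever = create_vector_store()
chain = create_retrieval_chain(retriever, stuff_documents_chain)

result = chain.invoke({"input": "What is the main topic of the video?"})
print(result["answer"])    # the model's answer
# result["context"] also holds the transcript chunks retrieved for this question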
7. Creating the User Interface with Streamlit
To interact with the bot, we need a simple user interface. Streamlit provides an easy way to build a web-based UI where users can type questions and view the bot’s responses.
import streamlit as st
st.title("Chat with Youtube Video")
user_input = st.text_input("You: ")
if 'retrieval' not in st.session_state:
    st.session_state['retrieval'] = create_vector_store()
    st.session_state['qa_history'] = []

if user_input:
    if user_input.lower() == "exit":
        st.write("Chat ended.")
    else:
        st.session_state['qa_history'].append(f"You: {user_input}")
        response = retrieval_chain.invoke({'input': user_input})
        st.session_state['qa_history'].append(f"Bot: {response['answer']}")

for message in st.session_state['qa_history']:
    st.write(message)
Code Explanation: the app keeps the retriever and the question-and-answer history in st.session_state so they survive Streamlit reruns. Each question the user types is appended to the history, sent through retrieval_chain.invoke, and the returned answer is appended as well. Finally, the full history is rendered with st.write so the conversation stays visible.
8. Running the Application
To run the bot, you need to execute the Streamlit app:
streamlit run app.py
Once the app runs, you can open it in your browser, input questions about the YouTube video, and get answers based on the transcript.
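For reference, here is one way the pieces above can be assembled into a single app.py (a sketch under the same assumptions as the rest of the tutorial, with the Ollama server running locally):

# app.py - assembled sketch of the full bot
from dotenv import load_dotenv
import streamlit as st
from langchain_ollama import OllamaLLM, OllamaEmbeddings
from langchain_community.document_loaders import YoutubeLoader
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

load_dotenv()

llm = OllamaLLM(model="llama3.1")
embedding_model = OllamaEmbeddings(model="llama3.1")

def create_vector_store():
    youtube_url = "https://www.youtube.com/watch?v=Mu-eK72ioDk&t=258s&ab_channel=CNET"
    video_transcript = YoutubeLoader.from_youtube_url(youtube_url).load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    data_docs = text_splitter.split_documents(video_transcript)
    return FAISS.from_documents(data_docs, embedding_model).as_retriever()

prompt = ChatPromptTemplate.from_template(
    """
    Answer the following question based only on the provided context.
    Think step by step before providing a detailed answer.
    Just answer the exact question, don't explain.
    <context> {context} </context>
    Question: {input}"""
)

st.title("Chat with Youtube Video")
user_input = st.text_input("You: ")

# Build the retriever once and keep it (plus the chat history) across Streamlit reruns
if 'retrieval' not in st.session_state:
    st.session_state['retrieval'] = create_vector_store()
    st.session_state['qa_history'] = []

stuff_documents_chain = create_stuff_documents_chain(llm, prompt)
retrieval_chain = create_retrieval_chain(st.session_state['retrieval'], stuff_documents_chain)

if user_input:
    if user_input.lower() == "exit":
        st.write("Chat ended.")
    else:
        st.session_state['qa_history'].append(f"You: {user_input}")
        response = retrieval_chain.invoke({'input': user_input})
        st.session_state['qa_history'].append(f"Bot: {response['answer']}")

for message in st.session_state['qa_history']:
    st.write(message)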
App Demo:
We are using the "Tesla Robotaxi Is Confusing" YouTube video from CNET for the demo.
9. Conclusion
In this article, we’ve built a YouTube Q&A Bot using Langchain, FAISS, and Streamlit. This bot extracts a YouTube video’s transcript, stores it in a vector store, and allows users to query the video's content by asking questions. The bot retrieves relevant chunks of the transcript and provides accurate answers based on the video's context.
You now have a working YouTube Q&A Bot that makes interacting with video content easier and more intuitive. Feel free to expand on this project and make it your own!
If you found the article helpful, don't forget to share the knowledge with more people!