RAG System with Video
Kiruthika Subramani
Innovating AI for a Better Tomorrow | AI Engineer | Google Developer Expert | Author | IBM Dual Champion | 200+ Global AI Talks | Master's Student at MILA
Hello everyone, it’s Friday, and guess who’s back? Hope you all had a fantastic week! This week, let’s dive into building a RAG system with a YouTube video.
No more procrastination this time, I promise! I’ll bring it to your devices on time.
If the article is delivered on time, don’t forget to leave a tip and a 5-star rating. I mean, likes and comments! Back in India, I used to hear this whenever I ordered food.
Now, I’m hungry just thinking about it. Let me finish this and grab my breakfast! We all love YouTube, right? Swiping through videos, each one not more than a minute long. And the most common phrase we hear is, “Like, share, subscribe!” I heard someone got a BMW with their YouTube income. Impressive! I spend 15 minutes every day contributing to someone’s income. Proud of myself!
I watch countless cooking videos, but I still end up making curd rice every day. I love that. Even in the freezing winter, haha.
I’ve noticed something: we’re not watching YouTube videos, just the Shorts. The moral of the story is, when you feel sleepy, sleep. Don’t strain yourself swiping for leisure entertainment. Not more than 15 minutes a day!
We have tons of resources on YouTube. How about building a RAG that takes the video, extracts the transcript, and answers my questions?
Let’s say some YouTubers say, “Watch until the end to know the truth.” Just take the video, don’t waste your time. Give it to this RAG, and it will give you the answer instantly. Come on, let’s start building it!
Disclaimer: no YouTuber’s income is harmed here.
Officially, welcome to the second episode of AI Weekly with Krithi!
Example of RAG with a YouTube Video
We download the transcript of a YouTube video and use an LLM to extract information from it. That’s what we are going to build.
Install the Dependencies
!pip3 install langchain
!pip3 install langchain_pinecone
!pip3 install langchain[docarray]
!pip3 install docarray
!pip3 install pypdf
!pip3 install youtube_transcript_api
Why do we need to install these? langchain provides the orchestration framework, docarray (together with langchain[docarray]) powers the in-memory vector store we’ll use, and youtube_transcript_api fetches the video transcript. langchain_pinecone and pypdf are optional extras here, handy if you later want to swap in a Pinecone index or load PDFs.
Download an Example Transcript from a YouTube Video
You can change the video ID to download other transcripts. We save the content to a file.
import os
from youtube_transcript_api import YouTubeTranscriptApi

srt = YouTubeTranscriptApi.get_transcript("SWm86rBsECw")  # CHANGE THE ID OF THE VIDEO

os.makedirs("./files", exist_ok=True)  # make sure the output folder exists
with open("./files/youtube_transcription.txt", "w") as file:
    for i in srt:
        file.write(i['text'] + " ")  # add a space so segments don't run together
This code gets the transcript from a YouTube video with the ID "SWm86rBsECw" and saves it as a text file named "youtube_transcription.txt".
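If you only have the full YouTube URL rather than the ID, a small helper like this can pull the ID out for you. This is my own sketch, not part of the original notebook, and it assumes a standard watch URL.

from urllib.parse import urlparse, parse_qs

def video_id_from_url(url):
    # Works for URLs like https://www.youtube.com/watch?v=SWm86rBsECw
    query = parse_qs(urlparse(url).query)
    return query["v"][0]

video_id_from_url("https://www.youtube.com/watch?v=SWm86rBsECw")  # -> 'SWm86rBsECw'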
What’s the video about?
Let’s ask our RAG, but if you’re really curious, here’s a hint: it’s Mr. Bean’s birthday party, celebrated alone. It evokes different emotions depending on your mindset.
Don’t you believe me? If you watch it from an audience perspective, you will laugh. If you imagine yourself as Mr. Bean in the situation, it hurts.
Beloved Birthday Wishes from Kiruthika to whoever is reading this article.
I heard your mind’s voice. Today is not my birthday. I forgot to add these terms - belated or advance, whichever applies to you.
But the wishes from my heart are heartfelt.
And the next part is
Select the LLM model to use
The model must be downloaded locally to be used, so if you want to run llama3, you should run:
ollama pull llama3
Check the list of models available for Ollama here: https://ollama.com/library
You need to choose which LLM you want to use. Ollama offers a variety of models, including Llama 3, Phi 3, Mistral, Gemma 2, and more. Once you’ve selected the model, you need to download it locally to use it.
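If you’re not sure which models you have already pulled to your machine, Ollama’s CLI can list them. This is just a quick check, not a required step:

ollama list   # show the models already downloaded on this machine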
We instantiate the model and the embeddings
That is, we create the model object and the embeddings object that turns text into numerical vectors, which we’ll use for similarity search.
#MODEL = "gpt-3.5-turbo"
#MODEL = "mixtral:8x7b"
#MODEL = "gemma:7b"
#MODEL = "llama2"
MODEL = "llama3" # https://ollama.com/library/llama3
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings
model = Ollama(model=MODEL)
embeddings = OllamaEmbeddings(model=MODEL)
This code sets up and initializes the Llama 3 model (or whichever model you uncomment above) and its embeddings. It imports the necessary classes from the langchain_community library and creates instances of the model and the embeddings.
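Before moving on, a quick sanity check is worth running. This snippet is my own addition and assumes the Ollama server is running locally with the chosen model already pulled:

# Smoke test: confirm the model responds and the embeddings produce a vector
print(model.invoke("Reply with the single word: ready"))
print(len(embeddings.embed_query("hello")))  # length of one embedding vector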
Now, let us load the transcription previously saved using TextLoader
from langchain_community.document_loaders import TextLoader
loader = TextLoader("./files/youtube_transcription.txt")
text_documents = loader.load()
text_documents
Let us split the document into chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
text_documents = text_splitter.split_documents(text_documents)[:5]
text_documents
This code imports the RecursiveCharacterTextSplitter class from the langchain.text_splitter module and uses it to split the text into smaller chunks. It then keeps only the first five chunks of the split documents.
Why first 5 chunks?
The choice to take only the first five chunks is for demonstration and testing purposes. By selecting a manageable number of chunks, the code can easily showcase how the text is split without overwhelming the user with too much output, and it makes the splitting step quick to inspect. For a real use case you would keep all the chunks so the whole video is searchable, as shown in the variation below.
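Here is a minimal variation (my own sketch, reusing the same loader and splitter from above) that indexes the entire transcript instead of only the first five chunks:

# Index the whole transcript instead of only the first five chunks
all_chunks = text_splitter.split_documents(loader.load())
print(f"Total chunks: {len(all_chunks)}")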
Store the transcript chunks in a vector store.
DocArrayInMemorySearch is a tool that stores documents in your computer’s memory, making it easy to quickly search through small sets of documents without needing a full database. It’s great for simple, small-scale projects where you want fast and straightforward document searching.
from langchain_community.vectorstores import DocArrayInMemorySearch
vectorstore = DocArrayInMemorySearch.from_documents(text_documents, embedding=embeddings)
retriever = vectorstore.as_retriever()
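To see what the vector store actually returns, you can run a quick similarity search. This check is my own addition, and the query string is just an example:

# Peek at the chunks most similar to a sample query
for doc in vectorstore.similarity_search("birthday", k=2):
    print(doc.page_content)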
We instantiate the parser
from langchain_core.output_parsers import StrOutputParser
parser = StrOutputParser()
But why?
StrOutputParser is likely used to process and parse output data into a string format. This can be useful when you need to convert various types of output into a consistent string format for further processing or display.
Generate the conversation template
from langchain.prompts import PromptTemplate
template = """
Answer the question based on the context below. If you can't
answer the question, answer with "Your video doesn't talk about this, I don't know".
Context: {context}
Question: {question}
"""
prompt = PromptTemplate.from_template(template)
prompt.format(context="Here is some context", question="Here is a question")
This code creates a prompt template and formats it with given context and question.
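As an aside, the same pieces can also be wired together with LangChain's pipe (LCEL) syntax. This is a sketch of that alternative, not the code used in the next step:

# Optional: compose retriever, prompt, model, and parser into one chain
from langchain_core.runnables import RunnablePassthrough

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)
# chain.invoke("What is the video about?")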
Let us extract the information from the video!
questions = [
    "What did Mr. Bean order at his lonely birthday party that he didn't like?"
]

for question in questions:
    # Retrieve the chunks most relevant to this question, then ask the model
    retrieved_context = retriever.invoke(question)
    formatted_prompt = prompt.format(context=retrieved_context, question=question)
    response_from_model = model.invoke(formatted_prompt)
    parsed_response = parser.parse(response_from_model)
    print(f"Question: {question}")
    print(f"Answer: {parsed_response}")
    print()
What dish is it? Do you have any guesses? Run the code and let me know in the comment section!
View the full code Here
We successfully built a Retrieval-Augmented Generation (RAG) system using a YouTube video.
Thank you for joining me on the second episode of AI Weekly with Krithi!
I hope you found it informative and engaging.
See you next week for more exciting AI topics and practical demos.
Have a great week ahead and stay tuned!
Cheers,
Kiruthika Subramani.