RAG System with Video

Hello Everyone, it’s Friday, and guess who’s back? Hope you all had a fantastic week! This week, let’s dive into building a RAG system with a YouTube video.

No more procrastination this time, I promise! I’ll bring it to your devices on time.

If the article is delivered on time, don’t forget to leave a tip and a 5-star rating. I mean, likes and comments! Back in India, I used to hear this whenever I ordered food.


Now, I’m hungry just thinking about it. Let me finish this and grab my breakfast!

We all love YouTube, right? Swiping through videos, each one not more than a minute long. And the most common phrase we hear is, “Like, share, subscribe!” I heard someone got a BMW with their YouTube income. Impressive! I spend 15 minutes every day contributing to someone’s income. Proud of myself!


I watch countless cooking videos, but I still end up making curd rice every day. I love that. Even in the freezing winter, haha.


I’ve noticed something: we’re not watching YouTube videos, just the Shorts.

The moral of the story is, when you feel sleepy, sleep. Don’t strain yourself swiping for leisure entertainment. Not more than 15 minutes a day!


We have tons of resources on YouTube. How about building a RAG that takes the video, extracts the transcript, and answers my questions?

Let’s say some YouTubers say, “Watch until the end to know the truth.” Just take the video, don’t waste your time. Give it to this RAG, and it will give you the answer instantly. Come on, let’s start building it!

Disclaimer: no YouTuber’s income is harmed here.


Officially, welcome to the second episode of AI Weekly with Krithi!

Example of RAG with a YouTube Video

Here is what we are going to do: download the transcript of a YouTube video and use an LLM to extract information from it.

Install the Dependencies

!pip3 install langchain
!pip3 install langchain_pinecone
!pip3 install langchain[docarray]
!pip3 install docarray
!pip3 install pypdf
!pip3 install youtube_transcript_api
        

Why do we need to install these?

  • langchain is the main framework we use to build the RAG pipeline.
  • langchain_pinecone connects LangChain to a Pinecone vector database, if you decide to use one (here we stick with an in-memory store).
  • docarray and langchain[docarray] provide the in-memory vector store (DocArrayInMemorySearch) we use to organize and search the transcript.
  • pypdf lets LangChain load PDF files, in case you want to add documents alongside your videos; it isn’t strictly needed for this transcript-only example.
  • youtube_transcript_api downloads the transcript text of a YouTube video.
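
If you want a quick sanity check that everything installed correctly, a tiny cell like this (it only imports the packages, nothing from the pipeline yet) should run without errors:

# Optional sanity check: these imports should all succeed after installation
import langchain
import docarray
import pypdf
from youtube_transcript_api import YouTubeTranscriptApi

print("All dependencies are ready!")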

Download an Example Transcript from a YouTube Video

You can change the ID of the video to download other video transcriptions. We save the content to a file.

from youtube_transcript_api import YouTubeTranscriptApi

srt = YouTubeTranscriptApi.get_transcript("SWm86rBsECw")  # CHANGE THE ID OF THE VIDEO

# Write ("w") instead of append so re-running the cell doesn't duplicate text,
# and add a space after each segment so the words don't run together.
with open("./files/youtube_transcription.txt", "w") as file:
    for entry in srt:
        file.write(entry['text'] + " ")

This code gets the transcript from a YouTube video with the ID "SWm86rBsECw" and saves it as a text file named "youtube_transcription.txt".
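
By the way, each entry returned by get_transcript is a small dictionary holding the text plus its timing, so you can peek at the raw data before saving it. The values in the comment below are only illustrative:

# Each entry looks roughly like {'text': '...', 'start': 12.3, 'duration': 2.5}
print(srt[0])
print(len(srt), "segments downloaded")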

What’s the video about?


Let’s ask our RAG, but if you are very curious, here’s a hint - it’s Mr. Bean’s birthday party, celebrated alone. It evokes different emotions depending on our mindset.

Don’t you believe me? If you watch it from an audience perspective, you will laugh. If you imagine yourself as Mr. Bean in the situation, it hurts.

Beloved Birthday Wishes from Kiruthika to whoever is reading this article.


I heard your mind’s voice: “Today is not my birthday.” I forgot to add the right word - belated or advance, whichever applies to you.


But the wishes from my heart are heartfelt.

And the next part is:

Select the LLM model to use

The model must be downloaded locally to be used, so if you want to run llama3, you should run:

ollama pull llama3        

Check the list of models available for Ollama here: https://ollama.com/library

You need to choose which LLM you want to use. Ollama offers a variety of models, including Llama 3, Phi 3, Mistral, Gemma 2, and more. Once you’ve selected the model, you need to download it locally to use it.


We instantiate the model and the embeddings

It means we create and initialize the LLM and its embeddings, the numerical vector representations of text that the system later uses to compare and search the transcript chunks.

from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings

# Choose which model to use (make sure you've pulled it with Ollama first)
#MODEL = "gpt-3.5-turbo"
#MODEL = "mixtral:8x7b"
#MODEL = "gemma:7b"
#MODEL = "llama2"
MODEL = "llama3"  # https://ollama.com/library/llama3

model = Ollama(model=MODEL)
embeddings = OllamaEmbeddings(model=MODEL)

This code sets up and initializes the Llama 3 model (or whichever model you choose from the commented lines) and its embeddings. It imports the necessary classes from the langchain_community library and creates instances of the model and the embeddings.
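
Before going further, it doesn’t hurt to give both objects a quick spin. Here is a small smoke test (the sentences are made up, only meant to confirm that Ollama is running):

# Quick smoke test: the embedding should be a long list of floats
vector = embeddings.embed_query("Mr. Bean celebrates his birthday alone")
print(len(vector))

# And the model should answer a trivial prompt
print(model.invoke("Say hello in one short sentence"))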

Now, let us load the transcription we saved earlier using TextLoader

from langchain_community.document_loaders import TextLoader

loader = TextLoader("./files/youtube_transcription.txt")
text_documents = loader.load()
text_documents        

Let us split the document into chunks

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
text_documents = text_splitter.split_documents(text_documents)[:5]
text_documents        

This code imports the RecursiveCharacterTextSplitter class from the langchain.text_splitter module and uses it to split the text into smaller chunks. It then keeps only the first five chunks of the split documents.

Why first 5 chunks?


The choice to take the first five chunks from the split text documents is likely for demonstration or testing purposes. By selecting a manageable number of chunks, the code can easily showcase how the text is split without overwhelming the user with too much information. This approach helps in verifying that the text splitting process works as intended and allows for quick inspection of the results.
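
If you want to see what those chunks actually look like, a quick loop is enough. This is just an inspection sketch:

# Inspect the chunks: index, length, and the first characters of each
for i, doc in enumerate(text_documents):
    print(i, len(doc.page_content), repr(doc.page_content[:60]))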

Store the transcript chunks in a vector store.

DocArrayInMemorySearch is a tool that stores documents in your computer’s memory, making it easy to quickly search through small sets of documents without needing a full database. It’s great for simple, small-scale projects where you want fast and straightforward document searching.

from langchain_community.vectorstores import DocArrayInMemorySearch

vectorstore = DocArrayInMemorySearch.from_documents(text_documents, embedding=embeddings)
retriever = vectorstore.as_retriever()        
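
You can also try the retriever on its own before wiring it into the prompt; it simply returns the chunks most similar to a query. The query string below is only an example:

# The retriever returns the transcript chunks most similar to the query
docs = retriever.invoke("birthday cake")
for doc in docs:
    print(doc.page_content)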

We instantiate the parser

from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()        

But why?

StrOutputParser parses the model’s raw output into a plain string. This is useful when you need the responses in a consistent string format for further processing or display.

Generate the conversation template

from langchain.prompts import PromptTemplate

template = """
Answer the question based on the context below. If you can't
answer the question, reply with "Your video doesn't talk about this, I don't know".

Context: {context}

Question: {question}
"""

prompt = PromptTemplate.from_template(template)
prompt.format(context="Here is some context", question="Here is a question")        

This code creates a prompt template and formats it with given context and question.
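
If you prefer LangChain’s pipe syntax, the prompt, the model, and the parser can also be wired into a single chain. This is only an optional sketch with made-up example inputs:

# Optional: combine prompt -> model -> parser into one runnable chain
chain = prompt | model | parser
print(chain.invoke({
    "context": "Mr. Bean prepares a birthday dinner for himself.",
    "question": "Who is the birthday party for?",
}))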

Let us extract the information from the video!

questions = [
    "What did Mr. Bean order at his lonely birthday party, and which food did he not like?"
]

for question in questions:
    # Retrieve the transcript chunks most relevant to each question
    retrieved_context = retriever.invoke(question)
    formatted_prompt = prompt.format(context=retrieved_context, question=question)
    response_from_model = model.invoke(formatted_prompt)
    parsed_response = parser.parse(response_from_model)

    print(f"Question: {question}")
    print(f"Answer: {parsed_response}")
    print()

What dish is it? Do you have any guesses? Run this code and let me know in the comment section!!

View the full code here.

We successfully built a Retrieval-Augmented Generation (RAG) system using a YouTube video.

Thank you for joining me on the second episode of AI Weekly with Krithi!

I hope you found it informative and engaging.

See you next week for more exciting AI topics and practical demos.

Have a great week ahead and stay tuned!


Cheers,

Kiruthika Subramani.

