GenAI Chatbot Augmentation with LLM and Video Insights

- Sreedhar Sambamoorthi, Prahlad Agnihotri, Nilanjana Dev Nath, AB InBev


NO! EVEN THIS IS NOT GENERATED BY CHATGPT!

ChatGPT, OpenAI, LangChain, and LLMs are the buzzwords in the ML and AI space currently. The extent to which they have spread to all corners of business is unprecedented, with many use-cases generating a lot of buzz: chatbots, document summarisers, insight generation, and so on. Most of these require document files like PDF, DOC, Excel, PPT, or your plain and simple .TXT files.

This article is meant to give you insight into how you can build a GenAI use-case using Video Input Data.

Large Language Models (LLMs) are essentially a branch of AI trained on huge data sets (in the case of ChatGPT, a vast swathe of the internet) and used to produce human-like responses/dialogues to given queries.

“Though LLMs are very powerful, they do not possess any subjective experience, emotions or consciousness – they work purely based on the data they are trained on.”

LLMs can be decoded as: the data they work on + transformer models + continuous training of those transformer models.

“What are Transformers?”

Transformers and attention mechanisms – transformer models learn the context and meaning of a sentence by tracking the relationships within it (via so-called attention mechanisms), which lets them identify dependencies even between distant elements of the input.
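
To make the idea concrete, here is a minimal, illustrative sketch of scaled dot-product attention in plain NumPy; the matrices are toy stand-ins for the learned query/key/value projections inside a real transformer layer.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    # similarity of every query with every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # softmax turns the scores into attention weights that sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # each output position is a weighted mix of all value vectors
    return weights @ V

# three tokens with 4-dimensional embeddings (toy numbers)
X = np.random.rand(3, 4)
print(scaled_dot_product_attention(X, X, X))

Every output row blends information from all three tokens, near and far alike – which is exactly the "distant dependency" property described above.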

Azure Video Indexer

The main player in this particular game is Azure Video Indexer, which gives us insights into many aspects of a video, along with timestamps for each:

  1. People
  2. Keywords
  3. OCR
  4. Transcript
  5. Emotion
  6. Speakers, named entities, etc.

This particular service can be accessed using the video_indexer package in Python:

pip install video_indexer

Post this, the step-by-step process is as follows:

  1. Place the relevant videos in your input folder
  2. Run the indexer and load the videos onto your account
  3. Read the indexed JSON and generate the relevant data

In order for this to work, there are a few prerequisites:

  1. Go to the Video Indexer website and register for a trial version to get started. The trial gives you the ability to index and process 4 hours of video, after which you must purchase a paid plan
  2. After registering for the trial, extract the following details from the website to proceed with the coding in Python — the Subscription Key and the Account ID, both of which you can find in the Settings section

Architecture

· Embeddings – Before everything else, the queries the user feeds into the system must be given a shape the model can understand. Word embeddings give us the power to do so: they capture the meaning of our query in a numerical form, which is then fed to the model.

In our case we have used HuggingFace embeddings, which offer a wide assortment of pre-trained models and embeddings derived from LLMs like BERT, GPT, etc. These are useful for capturing semantic information.
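
As a quick, hedged illustration of what that looks like in code (the default model HuggingFaceEmbeddings downloads is a sentence-transformers model and may vary with your langchain version):

from langchain.embeddings import HuggingFaceEmbeddings

# turn a query into its numerical (vector) form
embeddings = HuggingFaceEmbeddings()
vector = embeddings.embed_query('What does the speaker say about discipline?')

print(len(vector))  # dimensionality of the embedding space
print(vector[:5])   # first few components of the numeric representation

The companion embed_documents call works on whole transcript chunks, which is what the vector store does under the hood later in this article.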

· Vector DBs – Once we have the embeddings ready, we need a way to retrieve the information asked for in a query. The documents (or pieces of information) with the highest similarity score (under some metric) are then sent back as output.

But the question here is - what is a similarity search?

Similarity searches are the underlying methods/algorithms that help in quickly finding homogeneous items (in our case, words and passages) by searching through the semantic representations of our data.

There are different similarity search algorithms like TF-IDF, BM25 and Sentence-BERT.
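
Whatever the algorithm, the core operation is comparing vectors with some similarity metric. Here is a toy sketch using cosine similarity, one common choice; the document names and numbers are made up for illustration.

import numpy as np

def cosine_similarity(a, b):
    # 1.0 means identical direction, 0.0 means unrelated
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# toy 3-dimensional 'embeddings' for two documents and a query
docs = {
    'doc_a': np.array([0.9, 0.1, 0.0]),
    'doc_b': np.array([0.1, 0.8, 0.3]),
}
query = np.array([0.8, 0.2, 0.1])

# rank the documents by similarity to the query
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b'] - doc_a points the same way as the query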

For our use-case, we have used FAISS.

FAISS, aka Facebook AI Similarity Search, facilitates quick and optimised search, using efficient algorithms to index and cluster the embedding vectors; it is especially suited to multimedia documents. Its search operation is defined below –

Given a dimension d and a set of vectors x_i in R^d, FAISS builds a data structure (the index) in RAM; the search operation for a query vector x is the argmin:

i = argmin_i || x - x_i ||

where ||.|| is the Euclidean distance.
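
A minimal FAISS sketch of exactly this operation, with random placeholder vectors standing in for real embeddings:

import numpy as np
import faiss  # pip install faiss-cpu

d = 64                                              # dimension of each vector
xb = np.random.random((1000, d)).astype('float32')  # the set of vectors x_i
xq = np.random.random((1, d)).astype('float32')     # a query vector x

index = faiss.IndexFlatL2(d)   # exact search under Euclidean distance
index.add(xb)                  # build the in-RAM data structure
distances, ids = index.search(xq, 5)  # the 5 nearest x_i to the query
print(ids, distances)

IndexFlatL2 performs the exact argmin above; FAISS's speed-ups come from its approximate index types (IVF, HNSW, etc.), which trade a little accuracy for much faster search over clustered vectors.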

· Fine-tuning and Prompt Engineering – Fine-tuning can be performed for a specific task, keeping in mind that the dataset should be small and relevant, which aids adaptability for LLMs. Prompt engineering is the skill of designing optimised input prompts so that the outputs are more relevant and structured.
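
Prompt engineering is easiest to see with a template. A hedged sketch using LangChain's PromptTemplate (the wording of the template is illustrative, not prescriptive):

from langchain.prompts import PromptTemplate

template = """You are an assistant answering questions about video transcripts.
Use only the context below. If the answer is not in the context, say so.

Context: {context}

Question: {question}

Answer with the file name and the timestamp of the supporting transcript line."""

prompt = PromptTemplate(template=template, input_variables=['context', 'question'])
print(prompt.format(context='<retrieved transcript chunks>',
                    question='What does the speaker say about failure?'))

Small structural choices like pinning the model to the supplied context, or demanding the file name and timestamp, are what turn a vague answer into a structured one.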

Coding In Python

Now that we have the information handy, let’s start the exercise:

  1. First, place any sample videos in your input folder. I placed a couple of motivational videos — one of Amitabh Bachchan and one of Virat Kohli (sidenote: do check out this channel for daily motivation :D)

  2. The first part of the code will be initialising the relevant parameters

import json
from video_indexer import VideoIndexer

CONFIG = {
    'SUBSCRIPTION_KEY': '<your key>',
    'LOCATION': 'trial',
    'ACCOUNT_ID': '<your account ID>'
}

vi = VideoIndexer(
    vi_subscription_key=CONFIG['SUBSCRIPTION_KEY'],
    vi_location=CONFIG['LOCATION'],
    vi_account_id=CONFIG['ACCOUNT_ID']
)

  3. For the loop across the files in the folder to work, you will have to add a sleep call. Once the first file goes through the upload_to_video_indexer method, the loop immediately moves to the next iteration, which then gets interrupted because the first video is still being indexed in the cloud. Uploading one video and indexing the previous one CANNOT happen at the same time. I have therefore kept a 2-minute sleep timer, within which I believe a 4–5 minute video can get indexed. You can keep it even higher if your videos are longer

import os
import time

_path = '<your input folder>/'  # placeholder: folder holding the sample videos

video_list = []
for file in os.listdir(_path):
    video_id = vi.upload_to_video_indexer(
        input_filename=_path + file,
        video_name=file + 'treated',
        video_language='English'
    )
    video_list.append(video_id)
    time.sleep(120)  # wait for indexing before uploading the next video

video_list now holds the video_ids of the indexed videos. This is an important step, as the video_ids are going to be used for information extraction. Write video_list out to a CSV if needed, to keep it handy later
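
If you would rather not guess the sleep duration, an alternative is to poll the indexing state before moving on. This sketch assumes the JSON returned by get_video_info carries a top-level 'state' field that becomes 'Processed' when indexing finishes, as the Video Indexer API documents; adjust the key if your response differs.

import time

def wait_until_indexed(vi, video_id, poll_seconds=30, timeout_seconds=1800):
    # poll the indexer until the video is fully processed (or we give up)
    waited = 0
    while waited < timeout_seconds:
        info = vi.get_video_info(video_id, video_language='English')
        if info.get('state') == 'Processed':
            return info
        time.sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f'Video {video_id} not indexed within {timeout_seconds}s')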

  4. The next step is extracting the information from the video IDs in the list.

info_list = []
for i in video_list:
    info = vi.get_video_info(i, video_language='English')
    info_list.append(info)

info_list contains the JSON extract of all the parameters present in the indexed videos. We are currently going to focus just on extracting the transcript from the JSON files

import pandas as pd

df_transcript_final = pd.DataFrame()
for i in info_list:
    r = json.dumps(i)
    loaded_r = json.loads(r)
    r_refined = loaded_r['videos'][0]['insights']['transcript']
    df_cleaned = pd.DataFrame.from_dict(r_refined)
    # expand the 'instances' column (which holds the timestamp details)
    df_cleaned_final = pd.concat(
        [df_cleaned.drop(['instances'], axis=1),
         df_cleaned['instances'].apply(pd.Series)],
        axis=1
    )
    df_cleaned_final.columns = ['id', 'text', 'confidence', 'speakerId',
                                'language', 'json_extract']
    # expand the per-instance dict into its own columns (start/end times)
    df_cleaned_final = pd.concat(
        [df_cleaned_final.drop(['json_extract'], axis=1),
         df_cleaned_final['json_extract'].apply(pd.Series)],
        axis=1
    )
    df_cleaned_final['file_name'] = loaded_r['name']
    # DataFrame.append was removed in pandas 2.x, so concat is used instead
    df_transcript_final = pd.concat([df_transcript_final, df_cleaned_final])
    del df_cleaned, df_cleaned_final

The final dataframe df_transcript_final will have one row per transcript line: the text, its confidence, speakerId, language, start/end timestamps, and the file_name of the source video

The full JSON remains in info_list. Do give it a read to understand the other parameters in there
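
For instance, a quick peek at the first video's insight blocks (the exact keys depend on what the indexer detected in that video):

# list the other insight sections alongside 'transcript',
# e.g. keywords, sentiments, faces, ocr
print(info_list[0]['videos'][0]['insights'].keys())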

This can literally be passed as a data input to your LLM framework. You would want to use the CSVLoader class from LangChain for it

from langchain.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(file_path='./df_transcript_final.csv')
data = loader.load()

Now the game gets interesting. Below are the code snippets for building the chatbot

  1. Splitting the data and indexing into FAISS DB
  2. Building conversational chain
  3. Leveraging other LangChain capabilities

[Image: sample description of the flow from document to chat response (from the link)]

a) Import the necessary libraries

import os
import sys
import openai
import chromadb
from urllib.parse import quote

from langchain.llms import OpenAI, AzureOpenAI
from langchain.embeddings import OpenAIEmbeddings, HuggingFaceEmbeddings
from langchain.document_loaders import CSVLoader
from langchain.vectorstores import Chroma
from langchain.chains import (
    ConversationalRetrievalChain,
    RetrievalQAWithSourcesChain,
    LLMChain,
)
from langchain.memory import ConversationBufferMemory
from langchain.text_splitter import CharacterTextSplitter
from langchain.prompts import PromptTemplate

b) Writing the transcript into CSV and then processing

openai.api_key = '<your key>'

# write the transcript dataframe to CSV, then load it as before
df_transcript_final.to_csv('final_text.csv')
loader = CSVLoader('./final_text.csv')
documents = loader.load()

# split into overlapping chunks before embedding
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(documents)

c) We use the FAISS DB to store the embeddings, from which the relevant information will be retrieved; the vectors themselves come from HuggingFaceEmbeddings

from langchain.vectorstores import FAISS

vectordb = FAISS.from_documents(
    documents,
    embedding=HuggingFaceEmbeddings()
)

d) Finally, we initialise the LLM and the retrieval Q&A chain

from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=AzureOpenAI(deployment_name='<deployment>',
                    model_name='<model>',
                    temperature=0,
                    verbose=True),
    retriever=vectordb.as_retriever(search_kwargs={'k': 50}),
    return_source_documents=True
)

# we can now execute queries against our Q&A chain

prompt = 'Revert with the summary of the question asked along with the name of the file in question and the timestamp'
result = qa_chain({'query': 'Summarise the entire transcript? ' + prompt})
print(result['result'])
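
The RetrievalQA chain above is stateless: each query stands alone. For true multi-turn chat, one option is the ConversationalRetrievalChain with a buffer memory (both already imported earlier); the deployment and model names below are placeholders, as before.

memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

chat_chain = ConversationalRetrievalChain.from_llm(
    llm=AzureOpenAI(deployment_name='<deployment>',
                    model_name='<model>',
                    temperature=0),
    retriever=vectordb.as_retriever(search_kwargs={'k': 50}),
    memory=memory,
)

result = chat_chain({'question': 'Which video talks about failure?'})
print(result['answer'])

# the follow-up question is resolved against the stored chat history
result = chat_chain({'question': 'And at what timestamp?'})
print(result['answer'])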

Conclusion

There are several ways in which Generative AI can be leveraged to bring about transformation. The key is to be on the lookout within your team and understand HOW!
