
Step-by-Step Guide to Build an AI News Reader

Isha Singh

Cloud Architecture Consultant @ E2E Networks | MBA, Marketing | B2B Expertise

Published: June 13, 2024

In this blog, we will build a virtual AI news reader that reads out the news with accurate lip syncing.

Introduction - The Rise of AI News Anchors

Odisha TV, a private news channel based in Odisha, recently launched the first regional AI news anchor called ‘Lisa’. She is an AI-generated avatar clad in traditional Odia attire, and presents news in both Odia and English across the network's television and digital platforms. This development follows the debut of 'Fedha', an AI-generated news presenter introduced by Kuwait News, affiliated with the Kuwait Times.

The rise of AI news anchors like Lisa and Fedha highlights the growing importance and potential of artificial intelligence in the media industry. These AI presenters offer several advantages, such as the ability to deliver news consistently and efficiently without the need for breaks or time off. Moreover, they can be programmed to present news in multiple languages, making information more accessible to a wider audience.

Workflow

The workflow of building this application, in sequential order, is as follows:

1. We fetch the latest top news using thenewsapi.com.
2. We chunk the news articles into smaller documents and store them in a vector database.
3. The user enters a query.
4. Information relevant to the query, along with the original query, is sent to the LLM (Llama 3 on Ollama) to generate a response.
5. The response is converted to audio using TTS.
6. This audio is then lip-synced onto a standard video of a news reader using Wav2Lip.
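
As a rough sketch of how the per-query part of this workflow chains together (retrieval, generation, TTS, lip sync), the function below wires the stages in order. Every stage is passed in as a callable, and the parameter names (retrieve, generate, synthesize, lip_sync) are placeholders chosen for illustration, not names from the code later in this post.

```python
from typing import Callable


def answer_as_video(
    query: str,
    retrieve: Callable[[str], str],       # query -> context text from the vector store
    generate: Callable[[str, str], str],  # (query, context) -> LLM answer
    synthesize: Callable[[str], str],     # answer text -> path of an audio file
    lip_sync: Callable[[str], str],       # audio path -> path of the final video
) -> str:
    """Run the per-query stages of the workflow in order and return the video path."""
    context = retrieve(query)
    answer = generate(query, context)
    audio_path = synthesize(answer)
    return lip_sync(audio_path)


# Stand-in stages so the sketch runs without a GPU, an LLM, or Wav2Lip
video = answer_as_video(
    "Who won the match?",
    retrieve=lambda q: "Team A beat Team B.",
    generate=lambda q, ctx: f"According to the news: {ctx}",
    synthesize=lambda text: "output.wav",
    lip_sync=lambda audio: audio.replace(".wav", ".mp4"),
)
print(video)  # output.mp4
```

Fetching and indexing the news is left out of the sketch because it happens once, before any query is answered.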

The Code

Since we’ll be using many different types of AI technologies, we need a high-performance GPU for our task. E2E Networks provides a fleet of such GPUs geared for building our AI application. You can check out the offerings at https://myaccount.e2enetworks.com/.

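Once you have spun up a GPU node, the first step is to install the required libraries. The listings below also import requests, bs4 (BeautifulSoup), gradio, and the FAISS vector store from langchain_community, so those packages need to be installed as well.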


TTS
langchain
ollama
torch

First, set up the text splitter, embeddings model, and prompt template for the RAG pipeline.

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.prompts import PromptTemplate

# Split articles into chunks of up to 512 characters with a 20-character overlap
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=20,
    length_function=len,
    is_separator_regex=False,
)

# Embedding model used to index the chunks in the vector store
embeddings = HuggingFaceEmbeddings(
    model_name="BAAI/bge-large-en"
)

# Prompt template that combines the retrieved context with the user's question
template = """Answer the following question from the context
    context = {context}
    question = {question}
"""
prompt = PromptTemplate(input_variables=["context", "question"], template=template)
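
To make chunk_size and chunk_overlap concrete, here is a small pure-Python illustration of fixed-size chunking with overlap. It is not the algorithm RecursiveCharacterTextSplitter uses (that splitter also tries to break on separators such as paragraphs and sentences); the function name chunk_with_overlap is made up for this sketch, which only shows what the two parameters mean.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    # Each new window starts (chunk_size - chunk_overlap) characters after the previous one
    step = chunk_size - chunk_overlap
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


chunks = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, so a sentence that straddles a boundary is not lost entirely to either side.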

Now, we’ll write a function that sends an API request to thenewsapi.com to receive the latest trending stories from India. Make sure to get your free API key by registering at thenewsapi.com.

import os
import requests
from datetime import datetime, timedelta

# API token from thenewsapi.com. Reading it from an environment variable is an
# assumption; the variable name THENEWSAPI_TOKEN is a placeholder.
NEWS_API = os.environ["THENEWSAPI_TOKEN"]


def get_top_news():
    # Get the current time and subtract one day
    one_day_ago = datetime.now() - timedelta(days=1)
    # Format the date and time
    published_after_time = one_day_ago.strftime("%Y-%m-%dT%H:%M:%S")

    # Define the endpoint URL
    url = "https://api.thenewsapi.com/v1/news/top"

    # Set the query parameters
    params = {
        'api_token': NEWS_API,
        'locale': 'in',
        'limit': 3,
        'category': 'sports',
        'published_after': published_after_time
    }

    # Send the GET request
    response = requests.get(url, params=params, timeout=30)

    # Raise an exception on failure so callers never receive an error string
    response.raise_for_status()

    # Return the JSON response
    return response.json()

The response looks something like this:

{'meta': {'found': 1183, 'returned': 3, 'limit': 3, 'page': 1},
 'data': [{'uuid': '40fc3f6e-4671-435f-9f3c-933500c61bb7',
   'title': 'Al Nassr vs Al Ittihad Live Streaming: How To Watch Cristiano Ronaldo Play Live',
   'description': 'Al Nassr vs Al Ittihad Saudi Pro League 2023-24 will be played today, Monday, 27 May. Know how to watch the live streaming of the football match in India. Check...',
   'keywords': 'Al Nassr, Al Ittihad, Al Nassr vs Al Ittihad date, Al Nassr vs Al Ittihad time, Al Nassr vs Al Ittihad live streaming, Al Nassr vs Al Ittihad live telecast in India, Al Nassr vs Al Ittihad Saudi Pro League 2023-24, Al Nassr vs Al Ittihad Saudi Pro League, Al Nassr vs Al Ittihad Saudi Pro League 2024, Saudi Pro League 2023-24',
   'snippet': 'Al Nassr is gearing up to face Al Ittihad in the final Saudi Pro League 2023-24 match on Monday, 27 May. The Al Nassr vs Al Ittihad match will be conducted at t...',
   'url': 'https://www.thequint.com/sports/football/al-nassr-vs-al-ittihad-live-streaming-how-to-watch-cristiano-ronaldo-play-live',
   'image_url': 'https://images.thequint.com/thequint%2F2024-05%2Fe4d46606-a954-43f5-bf42-c7619c56fc3c%2F7e480a46f76c54b8a07de537b1b1121a.jpg',
   'language': 'en',
   'published_at': '2024-05-27T11:06:24.000000Z',
   'source': 'thequint.com',
   'categories': ['general'],
   'relevance_score': None,
   'locale': 'in'},
  {'uuid': '8d73ddd9-f40a-409c-850a-86a53fd88cbd',
   'title': "Iran's acting President addresses new Parliament after helicopter crash killing President, others",
   'description': 'Iran’s acting President Mohammad Mokhber addressed the country’s new parliament in his first public speech since last week’s helicopter crash that killed ...',
   'keywords': 'Iran, Iran parliament, Iran Raisi, Iran President, Iran acting President, Mohammad Mokhber',
   'snippet': "Iran's acting President Mohammad Mokhber addressed the country's new parliament on May 27 in his first public speech since last week's helicopter crash that kil...",
   'url': 'https://www.thehindu.com/news/international/irans-acting-president-addresses-new-parliament-after-helicopter-crash-killing-president-others/article68221184.ece',
   'image_url': 'https://th-i.thgim.com/public/incoming/12j8jk/article68221259.ece/alternates/LANDSCAPE_1200/APTOPIX_Iran_Politcis_37563.jpg',
   'language': 'en',
   'published_at': '2024-05-27T11:03:49.000000Z',
   'source': 'thehindu.com',
   'categories': ['general', 'politics'],
...
   'published_at': '2024-05-27T11:03:05.000000Z',
   'source': 'thehindu.com',
   'categories': ['general', 'politics'],
   'relevance_score': None,
   'locale': 'in'}]}
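
Given a response shaped like the one above, the fields the next step needs can be pulled out with a few lines of Python. The helper below is a sketch, not part of the original code: the name extract_article_urls is hypothetical, and the sample dictionary is a trimmed-down copy of the response with one deliberately incomplete entry added to show how missing URLs are skipped.

```python
def extract_article_urls(news_data):
    """Return (title, url) pairs from a thenewsapi-style response dict."""
    # Anything that is not a dict (for example an error message) yields no articles
    if not isinstance(news_data, dict):
        return []
    pairs = []
    for article in news_data.get('data', []):
        url = article.get('url')
        if url:
            pairs.append((article.get('title', ''), url))
    return pairs


# Trimmed-down sample with the same shape as the response above
sample = {
    'meta': {'found': 1183, 'returned': 3, 'limit': 3, 'page': 1},
    'data': [
        {'title': 'Al Nassr vs Al Ittihad Live Streaming: How To Watch Cristiano Ronaldo Play Live',
         'url': 'https://www.thequint.com/sports/football/al-nassr-vs-al-ittihad-live-streaming-how-to-watch-cristiano-ronaldo-play-live'},
        {'title': 'An entry without a URL is skipped'},
    ],
}
for title, url in extract_article_urls(sample):
    print(title, '->', url)
```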

The above response contains URLs to the news articles. In order to get the complete news, we have to scrape the articles. We can do so using the function below:


import requests
from bs4 import BeautifulSoup
from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS


def scrape_news_data():
    # The retriever is shared with respond_to_query through a module-level global
    global retriever

    news_data = get_top_news()
    scraped_data = []

    for article in news_data['data']:
        url = article['url']
        response = requests.get(url, timeout=30)
        soup = BeautifulSoup(response.content, 'html.parser')

        # Extract the title of the article, falling back to the title from the API
        title_tag = soup.find('title')
        title = title_tag.text if title_tag else article['title']

        # Extract the text of the article
        article_text = ''
        for paragraph in soup.find_all('p'):
            article_text += paragraph.text + '\n'

        # Create a document to store the scraped data
        scraped_article = Document(page_content=article_text, metadata={'title': title})
        scraped_data.append(scraped_article)

    # Chunk the articles, embed the chunks, and index them in FAISS
    vectorstore = FAISS.from_documents(text_splitter.split_documents(scraped_data), embeddings)

    # Retrieve the 10 most similar chunks for each query
    retriever = vectorstore.as_retriever(search_kwargs={'k': 10})

    return 'News Collected Successfully'

Next, we write a helper function for the RAG application. This function, which the later code calls as get_context, generates the context from the vector store given a query.

After this, we write a function that uses Llama 3 from Ollama to generate a response to the user query.


from langchain_community.llms import Ollama


def respond_to_query(query):
    context = get_context(query, retriever)
    llm = Ollama(model="llama3")


    return llm.invoke(prompt.format(question=query, context= context))
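
To see what the model actually receives, the template can be filled in by hand. The sketch below uses plain str.format on the same template string, which for a simple template like this one should produce the same text as prompt.format; the question and context values are made up for illustration.

```python
template = """Answer the following question from the context
    context = {context}
    question = {question}
"""

# Substitute illustrative values for the two placeholders
filled = template.format(
    context='Al Nassr faces Al Ittihad on Monday, 27 May.',
    question='When does Al Nassr play Al Ittihad?',
)
print(filled)
```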

Make sure you’ve installed Ollama on your system, launched an Ollama server, and pulled Llama 3. You can follow the instructions here.

Then, we’ll create a function that takes text as input and uses TTS to generate the corresponding audio clip.


import subprocess

# Path where the generated audio is written; Wav2Lip reads it from here
OUTPUT_WAV = '/home/vardh/ai-news-avatar/Wav2Lip/output.wav'


def run_tts_command(text):
    # Define the command as a list of arguments
    command = [
        'tts',  # Command-line tool installed with the TTS package
        '--text', text,  # Text for TTS
        '--model_name', 'tts_models/multilingual/multi-dataset/xtts_v2',  # Model name
        '--vocoder_name', 'vocoder_models/universal/libri-tts/wavegrad',  # Vocoder name
        '--out_path', OUTPUT_WAV,  # Output path
        '--speaker_idx', 'Brenda Stern',  # Speaker name
        '--language_idx', 'en'  # Language code
    ]

    # Run the command
    try:
        subprocess.run(command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        print(f"TTS generation successful, output saved to '{OUTPUT_WAV}'")
        return OUTPUT_WAV  # Path of the generated audio file
    except subprocess.CalledProcessError as e:
        print(f"An error occurred: {e.stderr.decode()}")
        return None

Finally, we come to the lip syncing part. First, clone the Wav2Lip repository:

git clone https://github.com/Rudrabha/Wav2Lip.git

Then download the model weights as shown in the README, and place them in the checkpoints folder.

Another weight file (missing from the README) can be downloaded from here: https://drive.google.com/drive/folders/1oZRSG0ZegbVkVwUd8wUIQx8W7yfZ_ki1. Name it mobilenet.pth and place it in the checkpoints directory.

Then we’ll write a function that generates a lip-synced video from the previous audio clip. It returns the path of the generated video. The --face parameter specifies the input video.


import os
import subprocess


def run_wav2lip_command():
    # Directory containing the cloned Wav2Lip repository
    wav2lip_dir = '/home/vardh/ai-news-avatar/Wav2Lip'

    # Construct the command
    command = [
        'python', 'inference.py',
        '--checkpoint_path', 'checkpoints/wav2lip.pth',
        '--face', 'face.mp4',  # Input video of the news reader
        '--audio', 'output.wav'  # Audio clip generated by the TTS step
    ]

    # Run the command inside the Wav2Lip directory without changing the
    # working directory of the whole application
    subprocess.run(command, cwd=wav2lip_dir, check=True)
    return os.path.join(wav2lip_dir, 'results', 'result_voice.mp4')

Gradio code for the UI:


import gradio as gr


with gr.Blocks() as demo:
    with gr.Row():
        btn = gr.Button("Fetch Latest News")
        response = gr.Text()
    with gr.Row():
        query = gr.Textbox(label= "Ask me about the news")
        news_text = gr.Textbox(label= "Response")
    with gr.Row():
        news_audio = gr.Audio(label= 'Audio Response', type= 'filepath')
        news_video = gr.Video(label= 'Lip Synced Video')




    btn.click(fn= scrape_news_data, inputs= None, outputs= response)
    news_query = query.submit(fn= respond_to_query, inputs= query, outputs= news_text)
    audio_query = news_query.then(fn= run_tts_command, inputs= news_text , outputs= news_audio)
    audio_query.then(fn= run_wav2lip_command, inputs= None, outputs= news_video)


demo.launch(server_name='0.0.0.0')

Results

Here’s a short video demonstrating the quality of the lip-syncing.

Final Words

By leveraging advanced technologies like Llama 3 for text generation, TTS for voice synthesis, and Wav2Lip for lip syncing, one can easily generate a news reader avatar for custom use cases.
