Enterprise Search with ChatGPT & Speech Synthesis with Azure Text to Speech Avatar

Enterprise Search with ChatGPT & Speech Synthesis with Azure Text to Speech Avatar

Artificial Intelligence takes centre stage of any technical conversations in the recent times. More so, after the global excitement generated with the advent of Generative Artificial Intelligence (GenAI) applications powered by Large Language Models (LLMs). GenAI use cases span across domains, such as financial services, information technology, education, entertainment, engineering, and design.

Throughout the course of my professional work, I get opportunity to interact with customers and hear their views on GenAI. Every conversation brings out few very interesting use cases that gets you thinking of ways you can build a solution around.

Before we end the year 2023, I thought to work on a solution that can make conversation AI more engaging. I remembered, the public preview announcement of Azure Text-To-Speech (TTS) Avatar and wanted to try this out. So, off I went.

What is TTS Avatar on Azure AI Speech?

Text to speech avatar converts text into a digital video of a photorealistic human (either a prebuilt avatar or a custom text to speech avatar) speaking with a natural-sounding voice. Please refer the link below for more information:

Text to speech avatar overview - Speech service - Azure AI services | Microsoft Learn

The easiest way to get started is to use the Speech Studio. Launch the Speech Studio from the deployed Speech service. Choose the Text to speech Avatar (Preview) feature from the 'Text to Speech' capabilities

The Text to Speech (TTS) Avatar playground offers a range of capabilities. We can choose from a range of avatars, different languages and the out-of-the-box voice samples that we want to try. Once we type the text, click on the 'Preview Video' to hear the avatar utter the words in the voice and style selected. Whilst, this is a good, quick way to check the feature out, real-life use cases requires us to use the in-built REST APIs and SDKs that we can use in your applications. The studio provides python samples that exactly does that for you and this is what I've used.

So, what is my use case then?

Throughout the year 2023, one of the most prevalent use cases or requirements I've worked with my customers on, is Enterprise Knowledge Search powered by ChatGPT (Azure OpenAI gpt 3.5 turbo model) using Retrieval Augmented Generation (RAG) pattern.

Here, I have simulated an HR application using which employees at Contoso can search and retrieve information on HR policies using ChatGPT.

Document ingestion & indexing on Azure AI Search

Fig: 1 Document ingestion and indexing flow

Before we can enable knowledge search, we need to ingest, index and prepare the data ready for search. This is what is depicted in Fig 1 above. The various steps have been described below:

  1. Ingestion of data into Azure storage account using Azure Data Factory (ADF). I have used blob storage to consolidate all heterogenous documents (PPT, PDF, etc.) in one place
  2. In this step, I have broken down the documents into smaller chunks using custom Web API skills on Azure AI Search and stored in a separate Azure storage container. Please refer the link to know more about this property Custom Web API skill in skillsets - Azure AI Search | Microsoft Learn
  3. In this step, we index individual chunks (documents) along with their vector embeddings in an Azure AI Search index

For steps 2 and 3, you can refer the code available in the sample notebook from GitHub: azure-search-vector-samples/demo-python/code/azure-search-custom-vectorization-sample.ipynb at main · Azure/azure-search-vector-samples (github.com)

Enterprise search with ChatGPT + TTS Avatar on Streamlit App

Fig 2: Enterprise search using ChatGPT & TTS Avatar using Streamlit application

Fig:2 represents the front end application that I've built using Streamlit. The application has the embedded logic that powers the enterprise search with ChatGPT and TTS avatar. Following is the sequence of activities that gets initiated when a user submits his/her search query:

  1. The search query is converted into a vector query and directed to the chunk index. The search returns top N relevant documents. I have used Top 5 in my case
  2. The search result is then augmented with the search query and passed as a prompt to GPT 3.5 Turbo model. The result is passed is used in two ways
  3. The text output is first shown in the front end of the Streamlit application. The other section of the front end performs the speech synthesis and generates the video output showing the TTS Avatar reading out the same text. This makes the whole search experience more engaging from accessibility standpoint and multi-modal

You can adjust the voice quality and other characteristics using Speech Synthesis Markup Language (SSML). Refer the link for more information about SSML - Speech Synthesis Markup Language (SSML) overview - Speech service - Azure AI services | Microsoft Learn

A quick look at the demo app that I built using Streamlit. You can go innovative and craft it the way you want.

Fig 3: Custom application using Streamlit


Subhasish G.

Senior Technical Program Manager - Azure OpenAI Service | Customer eXperience Engineering (CxE) ?? ?? @ Microsoft | 39x Azure Certified | GenAI Speaker

1 年

Love the use-case and explanation, Sankha Chakraborty

Mrunali B

Business Development Manger

1 年

A Strategic Guide to Product Modernizing with GenAI Get Your Copy: https://bit.ly/3NhxAjp, #genai #generativeai #generative #artificialintelligence #ai #aitechnology #generativeaitools #generativeartificialintelligence #generativemodels #technologysolutions #productdesign #productdevelopment #productinnovation

Ajay Kumar Barun

Senior Technical Specialist – Data & AI at Microsoft | Expert in Cloud-Native Architecture, Presales, Hybrid Solutions, Generative AI, Data & Database Technologies

1 年

Great , thanks for sharing

要查看或添加评论,请登录

Sankha Chakraborty的更多文章

社区洞察

其他会员也浏览了