Azure OpenAI Realtime API - VoiceRAG

Pablo Piovano

Director AI @OZ | Microsoft MVP | AI Cloud Advocate | Gen AI Specialist | Cloud Engineer | Power Platform Enthusiast | .NET & Tech Lover | Copilot

Published: January 22, 2025

The Azure OpenAI GPT-4o Real-Time API is setting a new standard for interacting with AI through voice. Built on models optimized for "speech in, speech out", it lets you create low-latency conversational experiences without chaining separate models for speech recognition, natural language processing, and voice synthesis.

This approach is well suited to virtual assistants, real-time translators, and other use cases that require immediate, natural responses. As of January 22, 2025, the model is available in the East US 2 (eastus2) and Sweden Central (swedencentral) regions, in two versions:

gpt-4o-realtime-preview (2024-12-17)
gpt-4o-realtime-preview (2024-10-01)

It is essential to create or reuse a resource in one of these regions before implementing the gpt-4o-realtime-preview model.
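As a minimal sketch of what targeting one of these deployments could look like, the helper below checks the region and assembles the WebSocket URL for a realtime session. The resource name, the deployment name, the URL layout, and the `2024-10-01-preview` API version are assumptions for illustration; confirm them against the Realtime API reference for your own resource.

```python
from urllib.parse import urlencode

# Regions listed in this article as of January 22, 2025.
SUPPORTED_REGIONS = {"eastus2", "swedencentral"}


def check_region(region: str) -> None:
    """Raise early if the resource is in a region this article does not list."""
    if region.lower() not in SUPPORTED_REGIONS:
        raise ValueError(
            f"{region!r} is not listed for gpt-4o-realtime-preview in this article"
        )


def build_realtime_url(
    resource_name: str,
    deployment: str,
    api_version: str = "2024-10-01-preview",  # assumed preview API version
) -> str:
    """Return the wss:// URL for a realtime session (assumed URL layout)."""
    query = urlencode({"api-version": api_version, "deployment": deployment})
    return f"wss://{resource_name}.openai.azure.com/openai/realtime?{query}"


if __name__ == "__main__":
    check_region("eastus2")
    # "my-resource" is a hypothetical Azure OpenAI resource name.
    print(build_realtime_url("my-resource", "gpt-4o-realtime-preview"))
```

Failing fast on the region keeps a misconfigured resource from surfacing later as an opaque connection error.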

To explore this technology, you can test it on Azure AI Foundry, specifically in the real-time audio playground.

Additionally, you can find detailed information about the API and its architecture by exploring the Azure OpenAI Real-Time GPT-4o Audio repository on GitHub: Azure OpenAI Real-Time Audio SDK

What Makes It Different?

Traditionally, developing a voice assistant required chaining multiple systems:

Automatic Speech Recognition (ASR): To transcribe audio into text.
Language Model: To process the text and generate responses.
Text-to-Speech (TTS): To convert responses into audio.

Each of these steps could introduce significant delays and risk losing important nuances of the conversation, such as intonation or expressiveness.

With the GPT-4o-Realtime approach, all these functions are integrated into a single service that simultaneously handles both voice input and spoken response generation. This not only drastically reduces response times but also enhances the naturalness and fluidity of the interaction.

Key Advantages

More Natural and Human-Like Conversations: The integration of voice-to-voice capabilities and latency reduction ensures fluid dialogues that closely mimic the experience of speaking with another person, significantly enhancing interaction quality.
Multimodal Interaction: The service supports both text and audio, offering users the flexibility to communicate in the way that best suits their needs.
High-Quality Predefined Voices: The API includes a collection of consistent, high-quality voices, eliminating the need to train custom voices from scratch and speeding up deployment times.
Built-In Security and Privacy Protections: Microsoft incorporates advanced automatic monitoring and compliance with privacy policies, critical for scenarios involving the handling of sensitive or confidential information.
Dynamic Function Execution: The model can perform additional actions or queries during the conversation, such as accessing external data or managing real-time reservations, all without interrupting the natural flow of the dialogue.
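To make the multimodal, predefined-voice, and function-execution points concrete, here is a hedged sketch of the kind of `session.update` event a client might send over the WebSocket to configure a session. The `book_table` tool and its parameters are hypothetical, and the field layout reflects my understanding of the preview event schema rather than a definitive reference.

```python
import json


def build_session_update(voice: str, tools: list[dict]) -> str:
    """Serialize a session.update event enabling text + audio and tool calls."""
    event = {
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],  # the service supports both
            "voice": voice,                   # one of the predefined voices
            "tools": tools,                   # functions the model may call
            "tool_choice": "auto",            # let the model decide when to call
        },
    }
    return json.dumps(event)


# Hypothetical tool: the name and parameters are made up for illustration.
RESERVATION_TOOL = {
    "type": "function",
    "name": "book_table",
    "description": "Reserve a restaurant table for the caller.",
    "parameters": {
        "type": "object",
        "properties": {
            "party_size": {"type": "integer"},
            "time": {"type": "string", "description": "ISO 8601 start time"},
        },
        "required": ["party_size", "time"],
    },
}

if __name__ == "__main__":
    print(build_session_update("alloy", [RESERVATION_TOOL]))
```

The function only builds the JSON payload; sending it is left to whichever WebSocket client the application already uses.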

This functionality paves the way for the next topic: VoiceRAG, a powerful combination of RAG and voice capabilities that further expands the possibilities for interaction.

VoiceRAG is an advanced example that combines Retrieval-Augmented Generation (RAG) with the Azure OpenAI GPT-4o API for audio, creating more robust and functional applications. This approach leverages:

Azure AI Search to retrieve relevant information (such as documentation, articles, or business data).
GPT-4o-Realtime-Preview to generate and narrate personalized responses in real time.

The result is an enhanced experience for virtual assistants that not only deliver natural conversations but also have the ability to query and return accurate information on the spot. This capability allows them to adapt to more complex and demanding use cases.
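The retrieval step can be sketched as follows, under stated assumptions: when the model finishes emitting a tool call, the application runs a search and returns the results as a `function_call_output` item, then asks for a new response. The in-memory index stands in for Azure AI Search, its contents are made up, and the event handling reflects my reading of the preview event names; it is not necessarily how the sample repository implements the pattern.

```python
import json

# Stand-in for an Azure AI Search index; contents are made up for illustration.
FAKE_INDEX = {
    "doc-1": "Contoso support hours are 9:00 to 17:00 on weekdays.",
    "doc-2": "Contoso offers refunds within 30 days of purchase.",
}


def tokens(text: str) -> set[str]:
    """Lowercase, whitespace-split tokens with surrounding punctuation removed."""
    return {word.strip("?.,!:") for word in text.lower().split()} - {""}


def search(query: str) -> list[dict]:
    """Naive keyword overlap standing in for a real retrieval call."""
    wanted = tokens(query)
    return [
        {"id": doc_id, "text": text}
        for doc_id, text in FAKE_INDEX.items()
        if wanted & tokens(text)
    ]


def handle_event(event: dict) -> list[dict]:
    """Return the client events to send back for one server event."""
    if event.get("type") != "response.function_call_arguments.done":
        return []  # not a completed tool call; nothing to send
    args = json.loads(event["arguments"])
    results = search(args["query"])
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": event["call_id"],  # ties the output to the call
                "output": json.dumps(results),
            },
        },
        {"type": "response.create"},  # ask the model to answer using the results
    ]


if __name__ == "__main__":
    sample = {
        "type": "response.function_call_arguments.done",
        "call_id": "call_1",
        "arguments": json.dumps({"query": "refunds"}),
    }
    for outgoing in handle_event(sample):
        print(json.dumps(outgoing))
```

Keeping the handler a pure function of the incoming event makes it easy to test without a live connection.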

The accompanying video demonstrates the capabilities of VoiceRAG, using the Microsoft 2024 Annual Report as the data indexed in Azure AI Search.

Resources You Can't Afford to Miss

Here is a post from the Azure AI Services Blog, where Pablo Castro explains the architecture and workflow of this technology in detail: VoiceRAG: An App Pattern for RAG + Voice Using Azure AI Search and the GPT-4o Realtime API for Audio
Additionally, you can explore this practical example in the following GitHub repository: Azure-Samples/aisearch-openai-rag-audio
GPT-4o Realtime API for speech and audio (Preview)
Realtime API (Preview) reference
Introducing the GPT-4o-Audio-Preview: A New Era of Audio-Enhanced AI Interaction

I hope this explanation has been helpful. Feel free to leave your comments and questions.

Until next time, community!

Hassan Bin Zaheer

1 month ago

Useful tips...

Robert Kutz

Director, Analytics & AI at OZ

2 months ago

Nice job Pablo, very cool!

Karen Hausheer

2 months ago

Thanks Pablo - love the language switch - it handles it seamlessly.

Aijaz Ahmed

SQL Database Administrator | SQL Developer | Data Engineer | BI Specialist

2 months ago

Nice!

Bruno Capuano

Principal Cloud Advocate | Empowering Teams to Build AI Solutions with Azure | Innovation Leader | Simplifying Complex Problems | Speaker & Lifelong Learner

2 months ago

Awesome! Bonus: A full .NET implementation in Blazor: https://aka.ms/netaieshopliterealtimechat
