Your Personal Voice GPT Assistant with Eleven Labs

Not long ago we built an audio ChatGPT bot using the OpenAI Whisper API and Google's Text-to-Speech module. The quality of the synthetic voice wasn't great. Today I am responding to a viewer's request and rebuilding it with the Eleven Labs API.

Eleven Labs API

Eleven Labs has so far produced the best-sounding synthetic voices on the market. You can sign up for free to use up to 10,000 characters.

Once you have signed in, click the profile picture and select "Profile".

You will find the API key there.
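Rather than pasting the key into your script, you can read it from an environment variable so it never lands in source control. This is just a sketch: the variable name ELEVEN_API_KEY matches what the later code expects from config, but the helper function name is my own.

```python
import os

def load_eleven_api_key():
    # Read the Eleven Labs key from the environment so it stays out of git
    key = os.environ.get("ELEVEN_API_KEY")
    if not key:
        raise RuntimeError("Set the ELEVEN_API_KEY environment variable first")
    return key
```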

Select your voice

1. Select the menu and go to Voice Library.


2. Sample the voices you like and add them to VoiceLab.


3. Rename the voice. Go back to VoiceLab and you will see it there.

Finding the Voice ID

To use the voice in our code, you need to find the Voice ID.

I used Postman with a GET request.


In the response, you can identify the Voice ID by the name you imported it as.

"voice_id": "bqGlZCw25vVCZyrtYzMnSx",
"name": "My Therapist"
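If you prefer to stay in Python instead of Postman, the same lookup works with requests. This is a sketch: the GET /v1/voices endpoint and the xi-api-key header come from the Eleven Labs API, while find_voice_id, list_voices, and the API_KEY placeholder are names I made up for illustration.

```python
import requests

API_KEY = "YOUR_XI_API_KEY"  # the key from your Profile page

def find_voice_id(voices_json, name):
    # Scan the /v1/voices response for the voice you renamed in VoiceLab
    for voice in voices_json.get("voices", []):
        if voice.get("name") == name:
            return voice.get("voice_id")
    return None

def list_voices(api_key=API_KEY):
    # Same call as the Postman GET, returned as parsed JSON
    resp = requests.get("https://api.elevenlabs.io/v1/voices",
                        headers={"xi-api-key": api_key})
    resp.raise_for_status()
    return resp.json()
```

For example, `find_voice_id(list_voices(), "My Therapist")` would return the ID shown above.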

With the API key and Voice ID, you are now ready to use them in the code.

Replace Google TTS Code

In the previous blog post, we already built an audio ChatGPT with the Google TTS module. Now we are just going to replace that part.

Eleven Labs has two TTS endpoints. We are going to send a POST to the stream endpoint.

https://api.elevenlabs.io/v1/text-to-speech/<voice-id>
https://api.elevenlabs.io/v1/text-to-speech/<voice-id>/stream        

The standard endpoint converts the text into speech as an mp3 file, while the stream endpoint returns an audio stream.

For a better chatbot experience, we do not want to wait for the whole mp3 file before the response can be played back, so we are going to use the streaming method.
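The only difference between the two URLs is the /stream suffix, which a small helper (my own, purely for illustration) makes explicit:

```python
def tts_url(voice_id, stream=False):
    # Build the Eleven Labs text-to-speech endpoint for a given voice
    base = "https://api.elevenlabs.io/v1/text-to-speech/" + voice_id
    return base + "/stream" if stream else base
```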

From the previous code, you can remove the Google TTS part.

Below is the code to replace it with. Here is the reference.

CHUNK_SIZE = 1024
url = "https://api.elevenlabs.io/v1/text-to-speech/bqGlZCw25vVCZyrtYzMnSx/stream"

headers = {
    "Accept": "audio/mpeg",
    "Content-Type": "application/json",
    "xi-api-key": config.ELEVEN_API_KEY
}

data = {
    "text": system_message,
    "model_id": "eleven_monolingual_v1",
    "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.5
    }
}

response = requests.post(url, json=data, headers=headers, stream=True)

You can also add the API key directly as a double-quoted string instead of reading it from config. Adjusting the stability and similarity boost will alter the voice, but I haven't tried it.

Streaming the Audio

Since we want to stream the AI voice, we can use ffplay (part of FFmpeg) on Windows instead of relying on the Audio output from Gradio.

import subprocess

cmd = ['ffplay', '-autoexit', '-']
proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
    proc.stdin.write(chunk)

proc.stdin.close()
proc.wait()
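One failure mode worth guarding against: if FFmpeg is not installed, Popen raises FileNotFoundError with a message that can be cryptic. A quick up-front check (the helper name is mine) gives a clearer error:

```python
import shutil

def require_ffplay():
    # Fail with a clear message if ffplay (part of FFmpeg) is not on PATH
    path = shutil.which("ffplay")
    if path is None:
        raise RuntimeError("ffplay not found: install FFmpeg and add it to PATH")
    return path
```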

Change the display output

Lastly, we want to change the output from audio to displaying our chat transcript.

Since the conversation global variable already holds the full transcript, we just need to make it more readable with the following code.

# Format the conversation for display
formatted_conversation = ""
for message in conversation:
    if message["role"] == "user":
        formatted_conversation += "Me: " + message["content"] + "\n"
    elif message["role"] == "assistant":
        formatted_conversation += "You: " + message["content"] + "\n"
return formatted_conversation.strip()
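As a quick sanity check, here is the same formatter wrapped in a function (the function name and sample history are mine) showing how it turns the raw message list into alternating Me:/You: lines and skips the system prompt:

```python
def format_conversation(conversation):
    # Render the OpenAI-style message list as a readable transcript
    formatted_conversation = ""
    for message in conversation:
        if message["role"] == "user":
            formatted_conversation += "Me: " + message["content"] + "\n"
        elif message["role"] == "assistant":
            formatted_conversation += "You: " + message["content"] + "\n"
    return formatted_conversation.strip()

history = [
    {"role": "system", "content": "You are a helpful therapist."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi, how are you feeling today?"},
]
print(format_conversation(history))
# Me: Hello
# You: Hi, how are you feeling today?
```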

Now that we are returning text, we can change the Gradio output from audio to text.

bot = gr.Interface(fn=transcribe,
                   inputs=gr.Audio(source="microphone", type="filepath"),
                   outputs="text")

This is what the final demo looks like. The audio plays back automatically without you clicking anything. It's not perfect, but the output displays the full transcript.


Voice Design

The Eleven Labs documentation mentions you can add pauses by inserting "-" into the text. So I crafted the system role prompt to get the model to respond with "-" from time to time. It did make the GPT model respond with "-", but when Eleven Labs plays it back, the result is not as good as I expected. If you have worked out a better way to make the voice sound more natural, please let me know.
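For reference, here is the kind of system prompt I mean, plus a rough way to count how many pause markers a reply contains. Both the prompt wording and the helper are my own sketch, not a recipe that worked well for me.

```python
# Hypothetical system prompt nudging GPT to emit "-" pause markers
PAUSE_PROMPT = ("You are a caring therapist. Speak naturally, and insert '-' "
                "where a person would briefly pause for breath.")

def count_pauses(reply):
    # Rough count of standalone "-" pause markers in a reply
    return sum(1 for token in reply.split() if token == "-")
```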
