Build a talking ChatGPT Bot in any language with Google Text-to-speech (TTS) and OpenAI

Leo Wang

AI and Automation, Business Intelligence, Enterprise Mobility and always in Web3.

Published: March 16, 2023

In this follow-up article, we will improve the chatbot from the last episode by replacing the audio module with Google TTS.

We will also make the bot understand multiple languages and respond in the language it hears.

See my last post if you would like to understand the context.

Create Google Service Account and download JSON key

1. Log on to the Google Cloud portal and select Credentials - CREATE CREDENTIALS - Service account.

2. Provide the service account details; you can call it anything. Select CREATE AND CONTINUE.

3. Assign the Cloud Speech Administrator role and select CONTINUE.

4. Skip the third step of the wizard and click DONE.

5. Back at the Credentials view, select Manage service accounts.

6. Click the three dots and choose Manage keys.

7. Create a new key and select JSON as the format. It will be saved locally.

8. In the code below, replace the path with the location of your JSON file.

from google.oauth2 import service_account


credentials = service_account.Credentials.from_service_account_file('/path/to/key.json')

9. Then, you can pass the credentials to the TextToSpeechClient constructor.

from google.cloud import texttospeech


client = texttospeech.TextToSpeechClient(credentials=credentials)

10. You also need to install the google-cloud-texttospeech package in your Python environment. Include the --upgrade flag to make sure you have the latest version of the package.

pip install google-cloud-texttospeech
pip install --upgrade google-cloud-texttospeech

11. Before the code can run successfully, you need to enable the Text-to-Speech API in your Google Cloud project.
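A quick sanity check on the key file downloaded in step 7 can save a confusing authentication error later. The sketch below is not part of the original bot: the helper name check_service_account_key is hypothetical, and the list of required fields is an assumption based on the usual layout of a service account key file. It uses only the standard library.

```python
import json
from pathlib import Path

# Assumption: fields expected in a service account key file.
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def check_service_account_key(path):
    """Return a list of problems found in the key file (empty if none)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"file not found: {key_path}"]
    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    # Report every required field that is absent.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    # A key for another credential type will not work as a service account.
    if data.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    return problems
```

You could call check_service_account_key('/path/to/key.json') before loading the credentials and print whatever it returns.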

Replace the pyttsx3 code with Google TTS

1. Now, let's look at the existing pyttsx3 section, where we convert the text to audio for Gradio to play back.

As a refresher from the last episode:

The ChatGPT API response is stored in system_message, which gets converted to an mp3 file and returned to Gradio.

    engine = pyttsx3.init()
    engine.setProperty("rate", 150)
    engine.setProperty("voice", "english-us")
    engine.save_to_file(system_message, "response.mp3")
    engine.runAndWait()

    return "response.mp3"

2. Next, we remove the pyttsx3 code and replace it with the Google TTS code.

    from google.oauth2 import service_account
    from google.cloud import texttospeech

    credentials = service_account.Credentials.from_service_account_file("/path/to/Google/JSON/Credential")

    # generate speech from system_message using Google Cloud Text-to-Speech API
    client = texttospeech.TextToSpeechClient(credentials=credentials)
    synthesis_input = texttospeech.SynthesisInput(text=system_message)
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-GB", name="en-GB-Neural2-A"
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

The above code passes the ChatGPT API response, system_message, to the Google TTS module, selects a voice and requests the speech as MP3 audio.

3. Then we write the audio content to a file and return its path to Gradio to play back. The code below uses a fixed name, output.mp3; the uuid library can be used to generate a unique filename instead.

    # save the audio content to a file
    with open("output.mp3", "wb") as out:
        out.write(response.audio_content)
        print('Audio content written to file "output.mp3"')
    return "output.mp3"
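If you would rather not overwrite the same file on every request, a unique name per response is one option. This is a minimal sketch and not the code the bot uses: the helper name save_audio and its directory parameter are hypothetical, and it assumes the audio arrives as bytes, as response.audio_content does.

```python
import uuid
from pathlib import Path


def save_audio(audio_bytes, directory="."):
    """Write audio bytes to a uniquely named .mp3 file and return its path."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    # uuid4 gives a random name, so successive responses do not collide.
    path = out_dir / f"{uuid.uuid4().hex}.mp3"
    path.write_bytes(audio_bytes)
    return str(path)
```

Inside the bot, the last lines of the function would then become return save_audio(response.audio_content). Note that the files accumulate, so you may want to clean the directory up from time to time.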

Auto detect and respond in different languages

How about we upgrade the bot again so it can detect your language and respond in the same language? That's pretty cool, huh!

First, let's remind ourselves how the bot works.

1. Gradio records audio in any language.
2. The audio is passed to the Whisper API to transcribe.
3. The transcription is sent to the ChatGPT API.
4. The language is detected (the code in the appendix runs the detection on the transcription).
5. The language is set for Google TTS.
6. Google TTS creates the audio in the correct language.
7. Gradio plays back the audio.

I was not able to do this natively with OpenAI because I couldn't find a language attribute in either the Whisper or the ChatGPT response.

So I used a Python library called langdetect.

1. Install and import langdetect.

pip install langdetect
import langdetect

2. I selected four languages here: French, Chinese, Japanese and English (the fallback). You can find the voice profiles in the Google Cloud Text-to-Speech list of supported voices.

    # Define a dictionary to map the detected language to language code and voice name
    language_dict = {
        "fr": ("fr-FR", "fr-FR-Wavenet-A"),
        "zh-cn": ("cmn-CN", "cmn-CN-Wavenet-C"),  # langdetect reports Simplified Chinese as "zh-cn"
        "ja": ("ja-JP", "ja-JP-Neural2-D"),
    }

    # Set the language and voice for Google TTS based on the detected language
    if detected_lang in language_dict:
        language_code, voice_name = language_dict[detected_lang]
    else:
        language_code = "en-US"
        voice_name = "en-US-Wavenet-D"

3. That's it. The block above goes on top of the Google TTS code block, where language_code and voice_name replace the hard-coded language and voice, and you will have a ChatGPT Audio Bot that responds in as many languages as you like (as long as you define them).
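The lookup from step 2 can also be wrapped in a small helper that is easy to test on its own. This is a sketch under a few assumptions: the function name pick_voice is hypothetical, the dictionary is keyed by base language code, and region variants such as "zh-cn" or "zh-tw" (the codes langdetect reports for Chinese) are reduced to their base code before the lookup. Anything unknown falls back to the English voice.

```python
# Voices from the article, keyed by base language code (assumption).
LANGUAGE_DICT = {
    "fr": ("fr-FR", "fr-FR-Wavenet-A"),
    "zh": ("cmn-CN", "cmn-CN-Wavenet-C"),
    "ja": ("ja-JP", "ja-JP-Neural2-D"),
}
DEFAULT_VOICE = ("en-US", "en-US-Wavenet-D")


def pick_voice(detected_lang):
    """Return (language_code, voice_name) for a detected language code."""
    code = (detected_lang or "").lower()
    if code in LANGUAGE_DICT:
        return LANGUAGE_DICT[code]
    # Reduce a region variant such as "zh-cn" to its base code "zh".
    base = code.split("-")[0]
    return LANGUAGE_DICT.get(base, DEFAULT_VOICE)
```

In the bot this would be used as language_code, voice_name = pick_voice(detected_lang). Mapping both Chinese variants to the same Mandarin voice is a simplification; a separate entry per variant is equally possible.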

Appendix: Complete working python code

Disclaimer: (Also written by ChatGPT)

I would like to clarify that I did not write the code presented here.

The credit goes to ChatGPT.

Additionally, I do not have any prior experience in Python programming. The purpose of sharing this code is to showcase the potential of Generative AI tools in enabling individuals without formal coding experience to develop useful applications.

pip install openai
pip install gradio
pip install pyttsx3
pip install langdetect
pip install --upgrade google-cloud-texttospeech

import langdetect
import gradio as gr
import openai
import pyttsx3
from google.cloud import texttospeech
from google.oauth2 import service_account

openai.api_key = ""

conversation = [
    {"role": "system", "content": "You are an intelligent professor."},
]

def transcribe(audio):
    print(audio)

    # Whisper API
    with open(audio, "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file)

    # ChatGPT API
    conversation.append({"role": "user", "content": transcript["text"]})

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=conversation
    )

    system_message = response["choices"][0]["message"]["content"]
    conversation.append({"role": "assistant", "content": system_message})

    # Language detection
    detected_lang = langdetect.detect(transcript["text"])

    # Define a dictionary to map the detected language to language code and voice name
    language_dict = {
        "fr": ("fr-FR", "fr-FR-Wavenet-A"),
        "zh-cn": ("cmn-CN", "cmn-CN-Wavenet-C"),  # langdetect reports Simplified Chinese as "zh-cn"
        "ja": ("ja-JP", "ja-JP-Neural2-D"),
    }

    # Set the language and voice for Google TTS based on the detected language
    if detected_lang in language_dict:
        language_code, voice_name = language_dict[detected_lang]
    else:
        language_code = "en-US"
        voice_name = "en-US-Wavenet-D"

    # generate speech from system_message using Google Cloud Text-to-Speech API
    credentials = service_account.Credentials.from_service_account_file("/path/to/your/JSON/Credential")
    client = texttospeech.TextToSpeechClient(credentials=credentials)
    synthesis_input = texttospeech.SynthesisInput(text=system_message)
    voice = texttospeech.VoiceSelectionParams(
        language_code=language_code, name=voice_name
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # save the audio content to a file
    with open("output.mp3", "wb") as out:
        out.write(response.audio_content)
        print('Audio content written to file "output.mp3"')

    # return the path to the saved file as Gradio output
    return "output.mp3"

# Gradio output
bot = gr.Interface(fn=transcribe, inputs=gr.Audio(source="microphone", type="filepath"), outputs="audio")
bot.launch()

BinQiang Liu

TTPA President, USino Founder/CEO, LLM of Intellectual Property (USA), CPVA (Certified Patent Valuation Analyst), CSN (Certified Strategic Negotiator), CLP (Certified Licensing Professional), Arbitrator, Patent Examiner

1 year ago

Hi, Leon. Thanks for this nice tutorial. How can I reach out to you via email? Mine is [email protected] Thanks.

Thao Nguyen

Full-stack Developer at OLLI Technology. Leetcode contest rating 2781 (Global ranking 358/535,861)

1 year ago

Thank you, but is there a way to perform text to speech from a streaming text. ChatGPT supports stream completions, so is it possible to speak as soon as receiving token from chatgpt?

Prof. Dr. Theo Almeida Murphy

Consulting Digital, QA automation, Online Analytics, Data Analytics, Online Security, Teaching Internet & E-Commerce, Robotics (NLP)

1 year ago

There is a small issue at the end of the last code box, where quotes are missing: under "# return the path to the saved file as Gradio output", return output.mp3 should be return "output.mp3". Works like a charm! Thanks

Prof. Dr. Theo Almeida Murphy

Consulting Digital, QA automation, Online Analytics, Data Analytics, Online Security, Teaching Internet & E-Commerce, Robotics (NLP)

1 year ago

Thanks Leo for putting things together and sharing your knowledge. I just gave it a try, did some updates there and there, and it is working for me :)
