Build a talking ChatGPT Bot in any language with Google Text-to-speech (TTS) and OpenAI

In this follow-up article, we will improve the chatbot from the last episode by replacing its audio module with Google Text-to-Speech (TTS).

We will also make the bot understand multiple languages and reply in the language it hears.

Here is the link to my last post if you would like to understand the context.


Create Google Service Account and download JSON key

1. Log on to the Google Cloud console and select Credentials > CREATE CREDENTIALS > Service account.


2. Provide the service account details; you can give it any name. Select CREATE AND CONTINUE.


3. Assign the Cloud Speech Administrator role, select CONTINUE.


4. Skip the third step of the service account wizard and click DONE.


5. Back at the Credentials view, select Manage service accounts.


6. Click the three dots and choose Manage keys.


7. Create a new key and select JSON as the format. It will be saved locally.


8. In the code below, replace the path with the location of your downloaded JSON key file.

from google.oauth2 import service_account


credentials = service_account.Credentials.from_service_account_file('/path/to/key.json')        

9. Then, you can pass the credentials to the TextToSpeechClient constructor.

from google.cloud import texttospeech


client = texttospeech.TextToSpeechClient(credentials=credentials)        

10. You also need to install the google-cloud-texttospeech package in your Python environment. Including the --upgrade flag makes sure you have the latest version of the package.

pip install google-cloud-texttospeech
pip install --upgrade google-cloud-texttospeech        

11. Before the code can run successfully, you need to enable the Text-to-Speech API for your Google Cloud project.
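
Before passing the key downloaded in step 7 to the client, it can help to sanity-check the file. The sketch below is a hypothetical helper (check_key_file is not part of any Google library). It only assumes the usual layout of a service account key: a JSON object with type, project_id, private_key and client_email fields.

```python
import json
from pathlib import Path

# Fields a Google service account key file is expected to contain (assumption
# based on the usual key layout; adjust if your key differs).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path):
    """Return a list of problems found in the key file (empty list: looks fine)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"file not found: {key_path}"]
    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    if data.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    return problems
```

If the returned list is empty, the same path can be handed to service_account.Credentials.from_service_account_file as in step 8.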


Replace the pyttsx3 code with Google TTS

1. First, let's look at the existing pyttsx3 section, where we convert the text to audio for Gradio to play back.

As a refresher from the last episode:

The ChatGPT API response is stored in system_message, which is converted to an MP3 file and returned to Gradio.

    engine = pyttsx3.init()
    engine.setProperty("rate", 150)
    engine.setProperty("voice", "english-us")
    engine.save_to_file(system_message, "response.mp3")
    engine.runAndWait()

    return "response.mp3"

2. Next, we remove the pyttsx3 code and replace it with the Google TTS code.

    from google.oauth2 import service_account
    from google.cloud import texttospeech

    credentials = service_account.Credentials.from_service_account_file("/path/to/Google/JSON/Credential")

    # Generate speech from system_message using Google Cloud Text-to-Speech API
    client = texttospeech.TextToSpeechClient(credentials=credentials)
    synthesis_input = texttospeech.SynthesisInput(text=system_message)
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-GB", name="en-GB-Neural2-A"
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

The above code passes the ChatGPT API response, system_message, to the Google TTS module, selects a voice and generates the speech as MP3 audio, which is available in response.audio_content.

3. Then we save the audio content to a file and return its path to Gradio for playback. The code below writes to a fixed name, output.mp3; the uuid library could be used to generate a unique file name instead.

    # Save the audio content to a file
    with open("output.mp3", "wb") as out:
        out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')
    return "output.mp3"
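
Since the step above mentions uuid, here is a minimal sketch of what a unique file name per response could look like. The helper name save_audio, the "response-" prefix and the default to the system temp directory are all assumptions made for illustration; they are not part of the original bot.

```python
import tempfile
import uuid
from pathlib import Path


def save_audio(audio_content, directory=None):
    """Write MP3 bytes to a uniquely named file and return its path as a string."""
    # Fall back to the system temp directory when no directory is given.
    target_dir = Path(directory) if directory else Path(tempfile.gettempdir())
    target_dir.mkdir(parents=True, exist_ok=True)
    # uuid4 gives a random name, so one response does not overwrite another.
    filename = target_dir / f"response-{uuid.uuid4().hex}.mp3"
    filename.write_bytes(audio_content)
    return str(filename)
```

Inside the bot, the last lines of the function would then become return save_audio(response.audio_content), and Gradio would play back whichever file was just written.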

Auto detect and respond in different languages

How about we upgrade the bot again so it can detect your language and respond accordingly? That's pretty cool, huh!

First, let's remind ourselves how the bot works.

  1. Gradio records audio in any language.
  2. Pass the audio to the Whisper API to transcribe.
  3. Send the transcription to the ChatGPT API.
  4. Detect the language of the conversation (the complete code in the appendix runs the detection on the Whisper transcription).
  5. Set the language and voice for Google TTS.
  6. Google TTS creates the audio in the correct language.
  7. Gradio plays back the audio.
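
The flow above can be sketched as one function that receives each step as a plain callable. This is only an illustration of the order of operations: run_turn and its parameter names are hypothetical, the stand-in lambdas replace the real Whisper, ChatGPT, langdetect and Google TTS calls, and the detection runs on the transcription, as in the complete code in the appendix.

```python
def run_turn(audio_path, transcribe, chat, detect, pick_voice, synthesize):
    """Run one conversation turn; every step is passed in as a function."""
    text = transcribe(audio_path)                 # speech to text (Whisper)
    reply = chat(text)                            # answer from ChatGPT
    lang = detect(text)                           # language of what was said
    language_code, voice_name = pick_voice(lang)  # matching Google TTS voice
    return synthesize(reply, language_code, voice_name)  # audio for Gradio


# Stand-in functions so the flow can be exercised without any API keys.
audio = run_turn(
    "question.wav",
    transcribe=lambda path: "Bonjour",
    chat=lambda text: "Salut !",
    detect=lambda text: "fr",
    pick_voice=lambda lang: ("fr-FR", "fr-FR-Wavenet-A"),
    synthesize=lambda reply, code, voice: f"{voice}:{reply}",
)
print(audio)
```

Keeping the steps separate like this also makes it easy to swap one of them, for example the TTS engine, without touching the rest.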

I was not able to do this natively with OpenAI because I couldn't find a language attribute in either the Whisper or the ChatGPT response.

So I used a Python library called langdetect.
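
Once langdetect is installed (step 1 below), detection is a single call, but it can raise an exception on text with no recognizable features, and its results can vary between runs unless a seed is set. A minimal sketch of a more defensive wrapper follows; the name detect_language and the "en" default are assumptions for illustration.

```python
def detect_language(text, default="en"):
    """Return langdetect's language code for text, or default if detection fails."""
    # Nothing to detect in empty or whitespace-only text.
    if not text or not text.strip():
        return default

    # Imported here so the empty-text path works even without langdetect installed.
    from langdetect import DetectorFactory, detect
    from langdetect.lang_detect_exception import LangDetectException

    DetectorFactory.seed = 0  # make results reproducible between runs
    try:
        return detect(text)
    except LangDetectException:
        # Raised, for example, when the text contains only digits or punctuation.
        return default
```

In the bot, detect_language(transcript["text"]) could be used wherever langdetect.detect is called directly.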

1. Install and import langdetect.

pip install langdetect
import langdetect        

2. I selected four languages here: French, Chinese, Japanese and English (the fallback). You can find the voice profiles here.

    # Define a dictionary to map the detected language to language code and voice name
    # (langdetect reports Chinese as "zh-cn" or "zh-tw", not plain "zh")
    language_dict = {
        "fr": ("fr-FR", "fr-FR-Wavenet-A"),
        "zh-cn": ("cmn-CN", "cmn-CN-Wavenet-C"),
        "ja": ("ja-JP", "ja-JP-Neural2-D"),
    }

    # Set the language and voice for Google TTS based on the detected language
    if detected_lang in language_dict:
        language_code, voice_name = language_dict[detected_lang]
    else:
        language_code = "en-US"
        voice_name = "en-US-Wavenet-D"

3. That's it. The block above goes right before the Google TTS code, where language_code and voice_name replace the hard-coded values passed to VoiceSelectionParams. You then have a ChatGPT audio bot that responds in as many languages as you like (as long as you define them).
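
The if/else lookup can also be written as a small function with a default, which keeps the mapping in one place when more languages are added. This is a sketch, not the bot's actual code: pick_voice, VOICES and DEFAULT_VOICE are names introduced here, and the keys assume the codes langdetect returns (it reports Chinese as "zh-cn" or "zh-tw").

```python
# Language code reported by langdetect -> (Google TTS language code, voice name).
VOICES = {
    "fr": ("fr-FR", "fr-FR-Wavenet-A"),
    "zh-cn": ("cmn-CN", "cmn-CN-Wavenet-C"),
    "ja": ("ja-JP", "ja-JP-Neural2-D"),
}

# Used for English and for any language that has no entry above.
DEFAULT_VOICE = ("en-US", "en-US-Wavenet-D")


def pick_voice(detected_lang):
    """Return (language_code, voice_name) for a detected language."""
    return VOICES.get(detected_lang, DEFAULT_VOICE)
```

Adding a language then only means adding one line to VOICES with a voice name taken from the Google voice list.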


Appendix: Complete working Python code

Disclaimer: (Also written by ChatGPT)

I would like to clarify that I did not write the code presented here.

The credit goes to ChatGPT.

Additionally, I do not have any prior experience in Python programming. The purpose of sharing this code is to showcase the potential of Generative AI tools in enabling individuals without formal coding experience to develop useful applications.

pip install openai
pip install gradio
pip install langdetect
pip install google-cloud-texttospeech

import gradio as gr
import langdetect
import openai
from google.cloud import texttospeech
from google.oauth2 import service_account

# Note: openai.Audio and openai.ChatCompletion belong to the pre-1.0 openai package,
# and gr.Audio(source=...) to Gradio 3.x; newer releases changed these interfaces.
openai.api_key = ""

conversation = [
    {"role": "system", "content": "You are an intelligent professor."},
]

def transcribe(audio):
    print(audio)

    # Whisper API
    with open(audio, "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file)

    # ChatGPT API
    conversation.append({"role": "user", "content": transcript["text"]})

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=conversation
    )

    system_message = response["choices"][0]["message"]["content"]
    conversation.append({"role": "assistant", "content": system_message})

    # Language detection
    detected_lang = langdetect.detect(transcript["text"])

    # Define a dictionary to map the detected language to language code and voice name
    # (langdetect reports Chinese as "zh-cn" or "zh-tw", not plain "zh")
    language_dict = {
        "fr": ("fr-FR", "fr-FR-Wavenet-A"),
        "zh-cn": ("cmn-CN", "cmn-CN-Wavenet-C"),
        "ja": ("ja-JP", "ja-JP-Neural2-D"),
    }

    # Set the language and voice for Google TTS based on the detected language
    if detected_lang in language_dict:
        language_code, voice_name = language_dict[detected_lang]
    else:
        language_code = "en-US"
        voice_name = "en-US-Wavenet-D"

    # Generate speech from system_message using Google Cloud Text-to-Speech API
    credentials = service_account.Credentials.from_service_account_file("/path/to/your/JSON/Credential")
    client = texttospeech.TextToSpeechClient(credentials=credentials)
    synthesis_input = texttospeech.SynthesisInput(text=system_message)
    voice = texttospeech.VoiceSelectionParams(
        language_code=language_code, name=voice_name
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # Save the audio content to a file
    with open("output.mp3", "wb") as out:
        out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')

    # Return the path to the saved file as Gradio output
    return "output.mp3"

# Gradio output
bot = gr.Interface(fn=transcribe, inputs=gr.Audio(source="microphone", type="filepath"), outputs="audio")
bot.launch()

BinQiang Liu

TTPA President, USino Founder/CEO, LLM of Intellectual Property (USA), CPVA (Certified Patent Valuation Analyst), CSN (Certified Strategic Negotiator), CLP (Certified Licensing Professional), Arbitrator, Patent Examiner

1 year ago

Hi, Leon. Thanks for this nice tutorial. How can I reach out to you via email? Mine is [email protected] Thanks.

Thao Nguyen

Full-stack Developer at OLLI Technology. Leetcode contest rating 2781 (Global ranking 358/535,861)

1 year ago

Thank you, but is there a way to perform text to speech from a streaming text. ChatGPT supports stream completions, so is it possible to speak as soon as receiving token from chatgpt?

Prof. Dr. Theo Almeida Murphy

Consulting Digital, QA automation, Online Analytics, Data Analytics, Online Security, Teaching Internet & E-Commerce, Robotics (NLP)

1 year ago

There is a small issue at the end of the last code box where quotes are missing: # return the path to the saved file as Gradio output return output.mp3 it should be: "output.mp3" Works like a charm! Thanks

Prof. Dr. Theo Almeida Murphy

Consulting Digital, QA automation, Online Analytics, Data Analytics, Online Security, Teaching Internet & E-Commerce, Robotics (NLP)

1 year ago

Thanks Leo for putting things together and sharing your knowledge. I just gave it a try, did some updates there and there, and it is working for me :)
