Build a talking ChatGPT Bot in any language with Google Text-to-speech (TTS) and OpenAI
In this follow-up article, we will improve the chatbot from the last episode by replacing its audio module with Google TTS.
We will also make the bot understand multiple languages and respond in the language it hears.
Here is the link to my last post if you would like to understand the context.
Create Google Service Account and download JSON key
1. Log on to the Google Cloud console and select Credentials > CREATE CREDENTIALS > Service account.
2. Provide the service account details; you can call it anything. Select CREATE AND CONTINUE.
3. Assign the Cloud Speech Administrator role and select CONTINUE.
4. Skip the third step of the wizard and click DONE.
5. Back in the Credentials view, select Manage service accounts.
6. Click the three dots and choose Manage keys.
7. Create a new key and select JSON as the format. It will be saved locally.
8. In the code below, replace the path with the location of your JSON file.
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file('/path/to/key.json')
9. Then, you can pass the credentials to the TextToSpeechClient constructor.
from google.cloud import texttospeech
client = texttospeech.TextToSpeechClient(credentials=credentials)
10. You also need the google-cloud-texttospeech package in your Python environment. Include the --upgrade flag to make sure you have the latest version installed.
pip install google-cloud-texttospeech
pip install --upgrade google-cloud-texttospeech
11. Before the code can run successfully, you need to enable the Text-to-Speech API for your project.
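Before wiring the key into the bot, it can help to confirm that the downloaded file really is a service account key. The following is a minimal sketch using only the standard library. The helper name check_service_account_key is hypothetical, and the field list covers only a few entries such key files normally contain; it is not Google's full schema.

```python
import json

# Fields a downloaded service account key is expected to contain.
# Partial list chosen for illustration, not Google's full schema.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path):
    """Return a list of problems found in the key file (empty if none)."""
    with open(path, "r", encoding="utf-8") as f:
        try:
            data = json.load(f)
        except json.JSONDecodeError as exc:
            return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    # A user-account or API-key file would carry a different "type"
    if "type" in data and data["type"] != "service_account":
        problems.append('"type" is not "service_account"')
    return problems
```

Printing check_service_account_key("/path/to/key.json") before constructing the client gives a quick hint when the wrong file was downloaded; an empty list means the basic shape looks right.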
Replace the pyttsx3 code with Google TTS
1. First, let's look at the existing pyttsx3 section, where we convert the text to audio for Gradio to play back.
As a refresher from the last episode: the ChatGPT API response is stored in system_message, which gets converted to an MP3 file and returned to Gradio.
    engine = pyttsx3.init()
    engine.setProperty("rate", 150)
    engine.setProperty("voice", "english-us")
    engine.save_to_file(system_message, "response.mp3")
    engine.runAndWait()
    return "response.mp3"
2. Next, we remove the pyttsx3 code and replace it with the Google TTS code.
    from google.oauth2 import service_account
    from google.cloud import texttospeech
    credentials = service_account.Credentials.from_service_account_file("/path/to/Google/JSON/Credential")
    # Generate speech from system_message using the Google Cloud Text-to-Speech API
    client = texttospeech.TextToSpeechClient(credentials=credentials)
    synthesis_input = texttospeech.SynthesisInput(text=system_message)
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-GB", name="en-GB-Neural2-A"
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )
The code above passes the ChatGPT API response, system_message, to the Google TTS client, selects a voice, and requests the speech as MP3-encoded audio.
3. Then we save the audio content to a file, which is returned to Gradio to play back. This version writes to a fixed filename, output.mp3; the uuid library could be used to generate a unique filename instead.
    # Save the audio content to a file
    with open("output.mp3", "wb") as out:
        out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')
    return "output.mp3"
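Because every response is written to the same output.mp3, two requests arriving close together could overwrite each other's audio. Here is a minimal sketch of the uuid approach, assuming a helper named save_audio and a directory argument (both hypothetical, not part of the original code):

```python
import uuid
from pathlib import Path


def save_audio(audio_content: bytes, directory: str = ".") -> str:
    """Write MP3 bytes to a uniquely named file and return its path."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    # uuid4 produces a random name, so concurrent responses do not collide
    filename = out_dir / f"{uuid.uuid4()}.mp3"
    filename.write_bytes(audio_content)
    return str(filename)
```

In the bot, the final lines would then become return save_audio(response.audio_content). The trade-off is that files accumulate on disk, so some periodic cleanup would be needed.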
Auto detect and respond in different languages
How about we upgrade the bot again so it can detect your language and respond in kind? That's pretty cool, huh!
First, let's remind ourselves how the bot works.
I was not able to do this natively with OpenAI because I couldn't find a language attribute in either the Whisper or the ChatGPT response.
So I used a Python library called langdetect.
1. Install and import langdetect.
pip install langdetect
import langdetect
2. I selected four languages here: French, Chinese, Japanese, and English (the fallback). You can find the voice profiles in the Google Cloud Text-to-Speech list of supported voices.
    # Define a dictionary to map the detected language to language code and voice name
    language_dict = {
        "fr": ("fr-FR", "fr-FR-Wavenet-A"),
        "zh-cn": ("cmn-CN", "cmn-CN-Wavenet-C"),  # langdetect reports Chinese as "zh-cn" or "zh-tw", not "zh"
        "ja": ("ja-JP", "ja-JP-Neural2-D"),
    }
    # Set the language and voice for Google TTS based on the detected language
    if detected_lang in language_dict:
        language_code, voice_name = language_dict[detected_lang]
    else:
        language_code = "en-US"
        voice_name = "en-US-Wavenet-D"
3. That's it. The block above goes just before the Google TTS code, with language_code and voice_name passed to VoiceSelectionParams in place of the hard-coded values, and you will have a ChatGPT audio bot that responds in as many languages as you like (as long as you define them).
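Two details of langdetect are worth guarding against: it raises an exception when the text has nothing to detect (an empty transcript, for example), and it reports Chinese with a region suffix (zh-cn, zh-tw) rather than plain zh. The sketch below wraps both concerns in one function. The names choose_voice, VOICES, and DEFAULT_VOICE are hypothetical, and the detect parameter stands in for langdetect.detect so the function can be tried without the library installed.

```python
DEFAULT_VOICE = ("en-US", "en-US-Wavenet-D")

# Keys are primary language subtags; values are (language_code, voice_name).
# Note that "zh-cn" and "zh-tw" both reduce to "zh" and share one Mandarin voice.
VOICES = {
    "fr": ("fr-FR", "fr-FR-Wavenet-A"),
    "zh": ("cmn-CN", "cmn-CN-Wavenet-C"),
    "ja": ("ja-JP", "ja-JP-Neural2-D"),
}


def choose_voice(text, detect):
    """Pick (language_code, voice_name) for text, falling back to English."""
    try:
        detected = detect(text)
    except Exception:
        # Detection failed, for example on empty or non-linguistic input
        return DEFAULT_VOICE
    # Reduce codes such as "zh-cn" to their primary subtag "zh"
    primary = detected.lower().split("-")[0]
    return VOICES.get(primary, DEFAULT_VOICE)
```

With langdetect installed, the call in the bot would look like language_code, voice_name = choose_voice(transcript["text"], langdetect.detect).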
Appendix: Complete working Python code
Disclaimer: (Also written by ChatGPT)
I would like to clarify that I did not write the code presented here.
The credit goes to ChatGPT.
Additionally, I do not have any prior experience in Python programming. The purpose of sharing this code is to showcase the potential of Generative AI tools in enabling individuals without formal coding experience to develop useful applications.
pip install openai
pip install gradio
pip install langdetect
pip install google-cloud-texttospeech
# Note: this code uses the pre-1.0 openai interface (openai.Audio, openai.ChatCompletion)
# and the Gradio 3.x form of gr.Audio (source="microphone").
import gradio as gr
import langdetect
import openai
from google.cloud import texttospeech
from google.oauth2 import service_account

openai.api_key = ""

conversation = [
    {"role": "system", "content": "You are an intelligent professor."},
]

def transcribe(audio):
    print(audio)

    # Whisper API: transcribe the recorded audio to text
    with open(audio, "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file)

    # ChatGPT API: send the running conversation and store the reply
    conversation.append({"role": "user", "content": transcript["text"]})
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=conversation,
    )
    system_message = response["choices"][0]["message"]["content"]
    conversation.append({"role": "assistant", "content": system_message})

    # Language detection
    detected_lang = langdetect.detect(transcript["text"])

    # Define a dictionary to map the detected language to language code and voice name
    language_dict = {
        "fr": ("fr-FR", "fr-FR-Wavenet-A"),
        "zh-cn": ("cmn-CN", "cmn-CN-Wavenet-C"),  # langdetect reports Chinese as "zh-cn" or "zh-tw", not "zh"
        "ja": ("ja-JP", "ja-JP-Neural2-D"),
    }

    # Set the language and voice for Google TTS based on the detected language
    if detected_lang in language_dict:
        language_code, voice_name = language_dict[detected_lang]
    else:
        language_code = "en-US"
        voice_name = "en-US-Wavenet-D"

    # Generate speech from system_message using the Google Cloud Text-to-Speech API
    credentials = service_account.Credentials.from_service_account_file("/path/to/your/JSON/Credential")
    client = texttospeech.TextToSpeechClient(credentials=credentials)
    synthesis_input = texttospeech.SynthesisInput(text=system_message)
    voice = texttospeech.VoiceSelectionParams(
        language_code=language_code, name=voice_name
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # Save the audio content to a file
    with open("output.mp3", "wb") as out:
        out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')

    # Return the path to the saved file as Gradio output
    return "output.mp3"

# Gradio interface: microphone audio in, synthesized audio out
bot = gr.Interface(fn=transcribe, inputs=gr.Audio(source="microphone", type="filepath"), outputs="audio")
bot.launch()
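One thing to watch in the code above: the conversation list grows with every exchange and is sent in full on each request, so a long session will eventually exceed the model's context window. A minimal sketch of one way to cap it follows, assuming a hypothetical helper trim_history and an arbitrary limit of 20 messages. It counts messages rather than tokens, so it is only a rough guard.

```python
def trim_history(conversation, max_messages=20):
    """Keep system messages plus the most recent max_messages other messages."""
    system = [m for m in conversation if m["role"] == "system"]
    others = [m for m in conversation if m["role"] != "system"]
    # Guard against max_messages <= 0, where others[-0:] would return everything
    recent = others[-max_messages:] if max_messages > 0 else []
    return system + recent
```

In the bot, this would be applied at the call site, for example messages=trim_history(conversation), leaving the full list untouched in memory.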