Building a Real-Time Speech Translator Using Amazon's AI Services
In this article, we will explore the development of a real-time speech translation application built using three of Amazon's AI services: Transcribe, Translate, and Polly. This application captures speech from a microphone, translates it into another language, and outputs the translation both as real-time audio and as text displayed on the screen.
Here is the high-level architectural diagram of the application.
You can clone the application's repository using the link below.
How The Application Works
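The process chains three Amazon Web Services tools:
Amazon Transcribe converts the speech captured from the microphone into text.
Amazon Translate translates that text into the target language.
Amazon Polly synthesizes the translated text into speech.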
All of this happens in real time, so you hear your spoken words in the target language shortly after you say them, in either a male or a female voice.
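To make the real-time part more concrete: Transcribe's streaming API emits partial results that it may still revise while you keep talking, and flags a segment as final once it has settled. The sketch below, which assumes the event shape of the Transcribe streaming API (TranscriptEvent.Transcript.Results, each with IsPartial and Alternatives), keeps only the final segments so that only settled text is sent on to translation. The helper name finalTranscripts is hypothetical and is not taken from the repository.

```typescript
// Minimal subset of a Transcribe streaming event; only the fields used here.
interface TranscriptAlternative {
  Transcript?: string;
}
interface TranscriptResult {
  IsPartial?: boolean;
  Alternatives?: TranscriptAlternative[];
}
interface TranscriptStreamEvent {
  TranscriptEvent?: { Transcript?: { Results?: TranscriptResult[] } };
}

// Hypothetical helper: returns the text of every finalized segment in one event.
// Only results explicitly marked IsPartial === false are kept, because
// partial results may still be revised by the service.
function finalTranscripts(event: TranscriptStreamEvent): string[] {
  const results = event.TranscriptEvent?.Transcript?.Results ?? [];
  const texts: string[] = [];
  for (const result of results) {
    if (result.IsPartial !== false) continue;
    // The first alternative is used as the transcription for the segment.
    const best = result.Alternatives?.[0]?.Transcript;
    if (best !== undefined && best.trim().length > 0) {
      texts.push(best.trim());
    }
  }
  return texts;
}
```

Translating only final segments trades a little latency for stability; translating partial results as well would feel faster but would repeat or contradict itself as the transcript is revised.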
Configuring The App and Setting Up The AWS Environment
To get started with the app, you'll need to set up your AWS environment by creating a Cognito Identity Pool and configuring its IAM role. This will grant the app access to AWS services like Transcribe, Translate, and Polly.
Please note that we will create an Identity Pool with unauthenticated access enabled, which is recommended only for prototyping purposes. For production-ready applications, it is advised to use a secure authentication flow, either by implementing a Cognito User Pool or by configuring the Identity Pool to issue credentials to authenticated users from trusted identity providers.
Open the Amazon Cognito console and create a new identity pool, choosing the Guest access option.
Go to the next steps to configure permissions and properties. In our example, the IAM role is named TranslateMyVoiceIdentityPoolRole and the identity pool is named TranslateMyVoiceIdentityPool. After creating the identity pool, open it and copy the identity pool ID (this will be used later to configure the React client app).
Now navigate to IAM, click on Roles and search for TranslateMyVoiceIdentityPoolRole. You will need to add policies that allow your app to interact with the AWS services.
Now select the role, and then click Create inline policy.
We need to add the following three policies to the role. For simplicity, they are given below in JSON format.
TranslateMyVoiceTranscribePolicy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "transcribe:StartStreamTranscriptionWebSocket",
      "Resource": "*"
    }
  ]
}
TranslateMyVoiceTranslatePolicy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "translate:TranslateText",
      "Resource": "*"
    }
  ]
}
TranslateMyVoicePollyPolicy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "polly:SynthesizeSpeech",
      "Resource": "*"
    }
  ]
}
This is how the TranslateMyVoiceIdentityPoolRole should look after creating the three policies:
We are now ready to configure and start using the app!
Clone the repository https://github.com/alchimya/translate-my-voice.git
Install all the dependencies with the command npm install
Now open the .env file and configure the two environment variables
Now run the command npm start to launch the app.
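The names of the two environment variables are not reproduced in this text, so the sketch below uses hypothetical ones (REACT_APP_AWS_REGION and REACT_APP_IDENTITY_POOL_ID) and assumes they hold the AWS region and the identity pool ID copied earlier; check the .env file in the repository for the real names. It shows one possible way to validate such a configuration at startup, relying on the fact that a Cognito identity pool ID has the form region:GUID.

```typescript
// Hypothetical configuration shape; the real variable names are in the repo's .env file.
interface AppConfig {
  region: string;
  identityPoolId: string;
}

// Cognito identity pool IDs look like "us-east-1:1234abcd-...": region, colon, GUID.
const IDENTITY_POOL_ID_PATTERN = /^[\w-]+:[0-9a-f-]+$/;

// Reads the two (assumed) variables from an environment-like object and
// throws a descriptive error when one is missing or malformed.
function loadConfig(env: { [name: string]: string | undefined }): AppConfig {
  const region = (env["REACT_APP_AWS_REGION"] ?? "").trim();
  const identityPoolId = (env["REACT_APP_IDENTITY_POOL_ID"] ?? "").trim();
  if (region === "") {
    throw new Error("Missing AWS region in the environment configuration");
  }
  if (!IDENTITY_POOL_ID_PATTERN.test(identityPoolId)) {
    throw new Error("Identity pool ID must look like region:GUID");
  }
  return { region, identityPoolId };
}
```

Failing early with a clear message tends to be easier to debug than a credentials error raised later by the first call to Transcribe, Translate, or Polly.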
How To Use The App
Using the app is straightforward. Select the language you'll be speaking and the language you want your speech translated into, choose whether you'd like to hear the translation in a male or female voice, and click the Start Speech button.
The Source Code
At the core of this application are three classes (AwsTranscribe, AwsTranslate, AwsPolly), each responsible for interacting with one of the three services (Transcribe, Translate, and Polly) that we have utilised. Each class uses the AWS SDK for JavaScript to implement the corresponding client and API necessary for the application to transcribe, translate, and play back the speech.
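As a rough illustration of what such classes send to the services, the sketch below builds the request parameters for Translate's TranslateText operation and Polly's SynthesizeSpeech operation. The field names follow the service APIs; the function names and the choice of MP3 output are assumptions made for illustration and are not the repository's code.

```typescript
// Parameters for Amazon Translate's TranslateText operation.
interface TranslateTextParams {
  Text: string;
  SourceLanguageCode: string;
  TargetLanguageCode: string;
}

// Subset of the parameters for Amazon Polly's SynthesizeSpeech operation.
interface SynthesizeSpeechParams {
  Text: string;
  VoiceId: string;
  OutputFormat: "mp3" | "ogg_vorbis" | "pcm";
}

// Hypothetical helper: request for translating one finalized transcript segment.
function buildTranslateParams(text: string, source: string, target: string): TranslateTextParams {
  return { Text: text, SourceLanguageCode: source, TargetLanguageCode: target };
}

// Hypothetical helper: request for speaking the translated text with a given voice.
// MP3 is assumed here because browsers can play it back directly.
function buildSpeechParams(translatedText: string, voiceId: string): SynthesizeSpeechParams {
  return { Text: translatedText, VoiceId: voiceId, OutputFormat: "mp3" };
}

// With the AWS SDK for JavaScript v3, objects like these would typically be
// passed to TranslateTextCommand (@aws-sdk/client-translate) and
// SynthesizeSpeechCommand (@aws-sdk/client-polly) and sent through the
// matching client; that wiring is left out to keep the sketch dependency-free.
```

For example, buildTranslateParams("ciao", "it", "en") describes an Italian-to-English request, and the translated text from the response would then go into buildSpeechParams together with the selected voice.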
Configuring new voices is straightforward but requires careful consideration, as you need to verify which voices and languages are supported in the region you are using. New voices can be configured within the Voices class, while new languages can be added through the LanguageSelect component.
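A voice configuration of this kind can be pictured as a lookup from language and gender to a Polly voice ID. The table and the pickVoice function below are a hypothetical sketch; the repository's Voices class may be organised differently. The voice IDs shown are existing Polly voices, but availability depends on region and engine, so each entry should be checked against the Polly voice list before use.

```typescript
type Gender = "male" | "female";

// Hypothetical voice table keyed by language code.
// Adding a language means adding one entry with a male and a female voice ID.
const VOICES: { [languageCode: string]: { male: string; female: string } | undefined } = {
  "en-US": { male: "Matthew", female: "Joanna" },
  "it-IT": { male: "Giorgio", female: "Bianca" },
  "es-ES": { male: "Enrique", female: "Lucia" },
};

// Returns the Polly voice ID for the selected language and gender,
// or throws when the language has not been configured.
function pickVoice(languageCode: string, gender: Gender): string {
  const voices = VOICES[languageCode];
  if (voices === undefined) {
    throw new Error("No voice configured for " + languageCode);
  }
  return voices[gender];
}
```

Throwing for an unconfigured language makes a missing entry visible immediately, instead of letting Polly reject the request later with a less specific error.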