Building a Real-Time Speech Translator Using Amazon's AI Services

In this article, we explore the development of a real-time speech translation application built with three of Amazon's AI services: Transcribe, Translate, and Polly. The application captures speech from a microphone, translates it into another language, and outputs the translation both as audio and as text displayed on the screen.

Here is the high-level architectural diagram of the application.

You can clone the application's repository using the link below.

https://github.com/alchimya/translate-my-voice.git

How The Application Works

The application chains three AWS services, each handling one step of the process:

  1. Amazon Transcribe: When you speak into your microphone, the app streams the captured audio to Amazon Transcribe, which converts your spoken words into text. The transcription happens while you are still speaking, so even long conversations are processed continuously rather than only after you finish.
  2. Amazon Translate: Once your speech has been transcribed, the text is passed to Amazon Translate, which translates it into the target language you’ve selected.
  3. Amazon Polly: Finally, the translated text is fed into Amazon Polly, which converts it back into speech using natural-sounding voices. The result is your message spoken out loud in another language.

All of this happens in near real time: you hear your words spoken in the target language a short moment after you say them, in either a male or a female voice.
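The flow described above can be sketched as a small function that takes one transcript segment and runs it through the remaining steps. This is only an illustration, not the code from the repository: the names `TranscriptSegment`, `Stages`, `PipelineOptions`, and `handleSegment` are hypothetical, the three stages are injected as plain functions so the sketch carries no AWS dependency, and skipping partial results is an assumption about how interim transcripts could be handled.

```typescript
// One unit of transcript text. `isPartial` mirrors the flag that streaming
// transcription uses to mark interim results that may still be revised.
interface TranscriptSegment {
  text: string;
  isPartial: boolean;
}

// The downstream stages, injected as plain functions (hypothetical names).
interface Stages {
  translate(text: string, from: string, to: string): Promise<string>;
  synthesize(text: string, voiceId: string): Promise<Uint8Array>;
  play(audio: Uint8Array): Promise<void>;
  show(text: string): void;
}

interface PipelineOptions {
  sourceLanguage: string;
  targetLanguage: string;
  voiceId: string;
}

// Runs one segment through translate -> show -> synthesize -> play.
// Returns the translated text, or null when the segment was skipped.
async function handleSegment(
  segment: TranscriptSegment,
  options: PipelineOptions,
  stages: Stages
): Promise<string | null> {
  const text = segment.text.trim();
  // Assumption: interim results are ignored, so each sentence is translated
  // and spoken only once, when its final transcript arrives.
  if (segment.isPartial || text.length === 0) {
    return null;
  }
  const translated = await stages.translate(
    text,
    options.sourceLanguage,
    options.targetLanguage
  );
  stages.show(translated); // display the translation on screen
  const audio = await stages.synthesize(translated, options.voiceId);
  await stages.play(audio); // speak the translation
  return translated;
}
```

Keeping the stages behind an interface like this also makes it easy to exercise the flow with fake implementations before wiring in the real services.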

Configuring The App and Setting Up The AWS Environment

To get started with the app, you'll need to set up your AWS environment by creating a Cognito Identity Pool and configuring its IAM role. This will grant the app access to AWS services like Transcribe, Translate, and Polly.

Please note that we will create an Identity Pool with unauthenticated access enabled, which is recommended only for prototyping purposes. For production-ready applications, it is advised to use a secure authentication flow, either by implementing a Cognito User Pool or by configuring the Identity Pool to issue credentials to authenticated users from trusted identity providers.

Open the Amazon Cognito console, and create a new identity pool choosing the option Guest access.

Continue through the next steps to configure permissions and properties. In our example, we named the IAM role TranslateMyVoiceIdentityPoolRole and the identity pool TranslateMyVoiceIdentityPool. After creating the identity pool, open it and copy the Identity Pool ID (it will be used later to configure the React client app).

Now navigate to IAM, click on Roles and search for TranslateMyVoiceIdentityPoolRole. You will need to add policies that allow your app to interact with the AWS services.

Select the role, and then click Create inline policy.

We need to add policies that allow the following actions.

  1. Transcribe: StartStreamTranscriptionWebSocket
  2. Translate: TranslateText
  3. Polly: SynthesizeSpeech

For simplicity, the three policies you need to add to the role are listed below in JSON format.

TranslateMyVoiceTranscribePolicy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "transcribe:StartStreamTranscriptionWebSocket",
            "Resource": "*"
        }
    ]
}

TranslateMyVoiceTranslatePolicy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "translate:TranslateText",
            "Resource": "*"
        }
    ]
}

TranslateMyVoicePollyPolicy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "polly:SynthesizeSpeech",
            "Resource": "*"
        }
    ]
}
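As an alternative to three separate inline policies, the same three actions could be granted by a single statement. The sketch below builds such a document in TypeScript and serialises it to JSON that could be pasted into the policy editor. The function name `buildCombinedPolicy` and the interfaces are hypothetical; the action names are the ones listed above.

```typescript
// Minimal typing for the parts of an IAM policy document used here.
interface PolicyStatement {
  Effect: "Allow";
  Action: string[];
  Resource: string;
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Builds one policy that allows the three actions the app needs.
function buildCombinedPolicy(): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: [
          "transcribe:StartStreamTranscriptionWebSocket",
          "translate:TranslateText",
          "polly:SynthesizeSpeech",
        ],
        // Same wildcard resource as the three separate policies above.
        Resource: "*",
      },
    ],
  };
}

// JSON text ready to paste into the inline policy editor.
const policyJson: string = JSON.stringify(buildCombinedPolicy(), null, 4);
```

Whether to use one policy or three is a matter of preference; separate policies make it easier to revoke access to a single service later.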

This is how the TranslateMyVoiceIdentityPoolRole should look after creating the three policies:

We are now ready to configure and start using the app!

Clone the repository https://github.com/alchimya/translate-my-voice.git

Install all the dependencies with the command npm install

Now open the .env file and configure the two environment variables:

  • REACT_APP_REGION: the region where you created the Cognito identity pool.
  • REACT_APP_IDENTITY_POOL_ID: the ID of your Cognito identity pool.
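A misconfigured .env file is an easy mistake to make, so it can help to validate the two values at startup. The following is a minimal sketch, not code from the repository: `AppConfig` and `loadConfig` are hypothetical names, and the check relies on identity pool IDs having the form `<region>:<GUID>`, so the prefix should match REACT_APP_REGION.

```typescript
interface AppConfig {
  region: string;
  identityPoolId: string;
}

// Reads and validates the two variables from an environment-like object.
// Passing the environment in as a parameter keeps the function easy to test.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const region = (env.REACT_APP_REGION || "").trim();
  const identityPoolId = (env.REACT_APP_IDENTITY_POOL_ID || "").trim();

  if (region === "") {
    throw new Error("REACT_APP_REGION is not set");
  }
  if (identityPoolId === "") {
    throw new Error("REACT_APP_IDENTITY_POOL_ID is not set");
  }

  // The part before the colon is the region the pool was created in.
  const separator = identityPoolId.indexOf(":");
  if (separator <= 0) {
    throw new Error("REACT_APP_IDENTITY_POOL_ID should look like <region>:<GUID>");
  }
  const poolRegion = identityPoolId.slice(0, separator);
  if (poolRegion !== region) {
    throw new Error(
      "Identity pool region (" + poolRegion + ") does not match REACT_APP_REGION (" + region + ")"
    );
  }
  return { region: region, identityPoolId: identityPoolId };
}
```

Assuming the project follows Create React App conventions, which the REACT_APP_ prefix suggests, the app could call `loadConfig(process.env)` once at startup and fail fast with a readable message.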

Now run the command npm start to launch the app.

How To Use The App

Using the app is straightforward. Select the language you'll be speaking and the language you want your speech translated into, choose whether you'd like to hear the translation in a male or female voice, and then click the Start Speech button.

The Source Code

At the core of this application are three classes (AwsTranscribe, AwsTranslate, AwsPolly), each responsible for interacting with one of the three services (Transcribe, Translate, and Polly) that we have utilised. Each class uses the AWS SDK for JavaScript to implement the corresponding client and API necessary for the application to transcribe, translate, and play back the speech.
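To give an idea of what such a wrapper class can look like, here is a minimal sketch for the translation step. It is not the repository's AwsTranslate class: `TranslateService` and `TranslateTextFn` are hypothetical names, and the SDK call is injected as a function so the example stays self-contained. The request and response field names (`Text`, `SourceLanguageCode`, `TargetLanguageCode`, `TranslatedText`) are those of the TranslateText API.

```typescript
// The subset of the TranslateText request and response used here.
interface TranslateTextInput {
  Text: string;
  SourceLanguageCode: string;
  TargetLanguageCode: string;
}

interface TranslateTextOutput {
  TranslatedText?: string;
}

// Anything able to execute a TranslateText request (hypothetical type).
type TranslateTextFn = (input: TranslateTextInput) => Promise<TranslateTextOutput>;

// Thin wrapper that hides the request shape from the rest of the app.
class TranslateService {
  private readonly callTranslate: TranslateTextFn;

  constructor(callTranslate: TranslateTextFn) {
    this.callTranslate = callTranslate;
  }

  async translate(text: string, from: string, to: string): Promise<string> {
    // Avoid a network round trip for empty input.
    if (text.trim() === "") {
      return "";
    }
    const output = await this.callTranslate({
      Text: text,
      SourceLanguageCode: from,
      TargetLanguageCode: to,
    });
    if (!output.TranslatedText) {
      throw new Error("Translate returned no text");
    }
    return output.TranslatedText;
  }
}
```

With version 3 of the AWS SDK for JavaScript (assuming that is the version in use), the injected function would typically be `(input) => client.send(new TranslateTextCommand(input))`, where `client` is a `TranslateClient` created with the configured region and credentials obtained from the Cognito identity pool, for example through `fromCognitoIdentityPool` in `@aws-sdk/credential-providers`.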

Configuring new voices is straightforward but requires some care, as you need to verify which voices and languages are supported in the region you are using. New voices can be configured in the Voices class, while new languages can be added through the LanguageSelect component.
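A voice configuration of this kind can be pictured as a lookup from language and gender to a Polly voice ID. The sketch below is an assumption about the general shape, not the repository's Voices class: `VOICES`, `Gender`, and `pickVoice` are hypothetical names. The voice IDs shown are existing Polly voices, but their availability per engine and region should still be checked against the Polly documentation.

```typescript
type Gender = "male" | "female";

// Language code -> voice ID per gender. Adding a language means adding a row.
const VOICES: Record<string, Record<Gender, string>> = {
  "en-US": { female: "Joanna", male: "Matthew" },
  "it-IT": { female: "Bianca", male: "Giorgio" },
  "es-ES": { female: "Lucia", male: "Enrique" },
};

// Returns the voice ID to pass to Polly for the given language and gender.
function pickVoice(languageCode: string, gender: Gender): string {
  const entry = VOICES[languageCode];
  if (!entry) {
    // Failing loudly is safer than silently falling back to a voice
    // that speaks a different language.
    throw new Error("No voice configured for language " + languageCode);
  }
  return entry[gender];
}
```

For example, `pickVoice("it-IT", "female")` returns `"Bianca"`, which would then be used as the `VoiceId` of the speech synthesis request.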


