Building a Real-Time Speech Translator Using Amazon's AI Services

Domenico Vacchiano

Co-Founder at Cloud Crafter

Published: September 29, 2024

Throughout this article, we will explore the development of a real-time speech translation application built using Amazon's AI services—Transcribe, Translate, and Polly. This application captures speech from a microphone, translates it into another language, and outputs the translation both as real-time audio and as text displayed on the screen.

Here is the high-level architectural diagram of the application.

You can clone the application's repository using the link below.

https://github.com/alchimya/translate-my-voice.git

How The Application Works

The process is simple yet sophisticated, with three Amazon Web Services tools working in perfect harmony:

Amazon Transcribe: When you speak into your microphone, Amazon Transcribe captures the audio and converts it into text. Transcription is streamed in real time, so even long conversations are transcribed as you speak.
Amazon Translate: Once your speech has been transcribed, the text is passed to Amazon Translate, which translates it into the target language you've selected.
Amazon Polly: Finally, the translated text is fed into Amazon Polly, which converts the text back into spoken words using natural-sounding voices. The result? Your message is spoken out loud in another language, just as if you were fluent in it yourself!

All of this happens in real time, so you hear your words spoken in the target language with minimal delay, in either a male or a female voice.
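The three steps above can be sketched as a single asynchronous function. This is only an illustration of the flow, not the repository's code: the `services` object and its `transcribe`, `translate`, and `synthesize` methods are hypothetical stand-ins for the calls to Amazon Transcribe, Amazon Translate, and Amazon Polly, and are passed in so that the flow can be tried without an AWS account.

```javascript
// Minimal sketch of the speech -> text -> translation -> audio flow.
// `services` and its method names are hypothetical stand-ins,
// not the classes used in the repository.
async function translateUtterance(services, options) {
  const { audioChunk, sourceLanguage, targetLanguage, voiceId } = options;

  // 1. Speech to text (Amazon Transcribe in the real app).
  const transcript = await services.transcribe(audioChunk, sourceLanguage);
  if (!transcript || transcript.trim() === "") {
    return null; // nothing was said, so there is nothing to translate
  }

  // 2. Text to translated text (Amazon Translate in the real app).
  const translatedText = await services.translate(
    transcript,
    sourceLanguage,
    targetLanguage
  );

  // 3. Translated text to audio (Amazon Polly in the real app).
  const audio = await services.synthesize(translatedText, voiceId);

  return { transcript, translatedText, audio };
}
```

Keeping the three services behind one small interface also makes it easy to swap in fake implementations when testing the user interface.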

Configuring The App and Setting Up The AWS Environment

To get started with the app, you'll need to set up your AWS environment by creating a Cognito Identity Pool and configuring its IAM role. This will grant the app access to AWS services like Transcribe, Translate, and Polly.

Please note that we will create an Identity Pool with unauthenticated access enabled, which is recommended only for prototyping purposes. For production-ready applications, it is advised to use a secure authentication flow, either by implementing a Cognito User Pool or by configuring the Identity Pool to issue credentials to authenticated users from trusted identity providers.

Open the Amazon Cognito console, and create a new identity pool choosing the option Guest access.

Go to the next steps to configure permissions and properties. In our example we have used TranslateMyVoiceIdentityPoolRole and TranslateMyVoiceIdentityPool for the IAM Role and name of the identity pool. After creating the Identity Pool, open it and copy the Identity Pool Id (this will be used later to configure the React Client App).

Now navigate to IAM, click on Roles and search for TranslateMyVoiceIdentityPoolRole. You will need to add policies that allow your app to interact with the AWS services.

Now select the role, and then click on the Create inline policy menu.

We need to add policies that allow the following actions.

Transcribe: StartStreamTranscriptionWebSocket
Translate: TranslateText
Polly: SynthesizeSpeech

For simplicity, the three policies that you need to add to the role are given below in JSON format.

TranslateMyVoiceTranscribePolicy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "transcribe:StartStreamTranscriptionWebSocket",
            "Resource": "*"
        }
    ]
}

TranslateMyVoiceTranslatePolicy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "translate:TranslateText",
            "Resource": "*"
        }
    ]
}

TranslateMyVoicePollyPolicy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "polly:SynthesizeSpeech",
            "Resource": "*"
        }
    ]
}
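If you prefer a single inline policy over three separate ones, the same permissions can be expressed in one statement, because the Action element of an IAM policy accepts an array. The helper below is only a sketch: the function name and the Sid value are hypothetical, and the actions are exactly the three listed above.

```javascript
// Builds one IAM policy document covering the three actions used by the app.
// The function name and the Sid are hypothetical; the actions are the same
// ones granted by the three separate policies.
function buildTranslateMyVoicePolicy() {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "TranslateMyVoiceAccess",
        Effect: "Allow",
        Action: [
          "transcribe:StartStreamTranscriptionWebSocket",
          "translate:TranslateText",
          "polly:SynthesizeSpeech",
        ],
        Resource: "*",
      },
    ],
  };
}

// Print the document so it can be pasted into the JSON policy editor.
console.log(JSON.stringify(buildTranslateMyVoicePolicy(), null, 4));
```

Either approach grants the same access; three policies are easier to detach one by one, while a single policy is quicker to create.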

This is how the TranslateMyVoiceIdentityPoolRole should look after creating the three policies:

We are now ready to configure and start using the app!

Clone the repository https://github.com/alchimya/translate-my-voice.git

Install all the dependencies with the command npm install

Now open the .env file and configure the two environment variables:

REACT_APP_REGION: the region where you created the Cognito identity pool.
REACT_APP_IDENTITY_POOL_ID: the ID of your Cognito identity pool.
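A misconfigured .env file usually only shows up later as a credentials error, so it can help to check the two variables up front. The function below is a sketch and is not part of the repository; `readAwsConfig` is a hypothetical name. It relies on the fact that a Cognito identity pool ID has the form region:GUID, so its prefix should match the configured region.

```javascript
// Sketch: validate the two variables before any AWS client is created.
// `readAwsConfig` is a hypothetical helper, not part of the repository.
function readAwsConfig(env) {
  const region = (env.REACT_APP_REGION || "").trim();
  const identityPoolId = (env.REACT_APP_IDENTITY_POOL_ID || "").trim();

  if (!region) {
    throw new Error("REACT_APP_REGION is not set");
  }
  if (!identityPoolId) {
    throw new Error("REACT_APP_IDENTITY_POOL_ID is not set");
  }

  // Identity pool IDs look like "<region>:<GUID>", so the part before
  // the colon should be the region the pool was created in.
  const separator = identityPoolId.indexOf(":");
  const poolRegion = separator === -1 ? "" : identityPoolId.slice(0, separator);
  if (poolRegion !== region) {
    throw new Error(
      "Identity pool region '" + poolRegion + "' does not match REACT_APP_REGION '" + region + "'"
    );
  }

  return { region, identityPoolId };
}
```

In the React app this would be called once at startup, for example as readAwsConfig(process.env).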

Now run the command npm start to launch the app.

How To Use The App

Using the app is straightforward. Select the language you'll be speaking and the language you want your speech translated into, choose whether you'd like to hear the translation in a male or female voice, and then click on the Start Speech button.

The Source Code

At the core of this application are three classes (AwsTranscribe, AwsTranslate, AwsPolly), each responsible for interacting with one of the three services (Transcribe, Translate, and Polly) that we have utilised. Each class uses the AWS SDK for JavaScript to implement the corresponding client and API necessary for the application to transcribe, translate, and play back the speech.
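One detail worth knowing when working with the Transcribe streaming API is that each transcript event carries a list of results flagged with IsPartial, and partial results are revised as more audio arrives. A common approach is to translate only the final ones. The helper below is a sketch under that assumption; the function name is hypothetical and the repository may handle partial results differently.

```javascript
// Sketch: keep only the finalized transcripts from a Transcribe streaming event.
// `extractFinalTranscripts` is a hypothetical helper name.
function extractFinalTranscripts(event) {
  const results = event?.TranscriptEvent?.Transcript?.Results ?? [];

  return results
    // Partial results may still change, so skip them.
    .filter((result) => result.IsPartial === false)
    // The first alternative is the one used here.
    .map((result) => result.Alternatives?.[0]?.Transcript ?? "")
    // Drop empty strings so nothing blank is sent for translation.
    .filter((text) => text.length > 0);
}
```

Sending only final results to Translate and Polly avoids speaking the same sentence several times while it is still being refined.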

Configuring new voices is straightforward but requires some care, as you need to verify which voices and languages are supported in the region you are using. New voices can be configured within the Voices class, while new languages can be added through the LanguageSelect component.
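As an illustration of what such a voice configuration might look like, here is a small lookup keyed by language code and gender. The table layout and the `pickVoice` function are hypothetical and do not reproduce the repository's Voices class. The voice names are existing Amazon Polly voices, but their availability still depends on the region and engine you use.

```javascript
// Sketch of a voice table; the structure is hypothetical and is not
// the repository's Voices class.
const VOICES = {
  "en-US": { female: "Joanna", male: "Matthew" },
  "it-IT": { female: "Bianca", male: "Giorgio" },
  "es-ES": { female: "Lucia", male: "Enrique" },
};

// Returns the Polly voice id for a language and gender,
// or throws if that combination has not been configured.
function pickVoice(languageCode, gender) {
  const voicesForLanguage = VOICES[languageCode];
  if (!voicesForLanguage) {
    throw new Error("No voices configured for language " + languageCode);
  }
  const voiceId = voicesForLanguage[gender];
  if (!voiceId) {
    throw new Error("No " + gender + " voice configured for " + languageCode);
  }
  return voiceId;
}
```

Adding a language then means adding one entry to the table, after checking that both voices are offered in your region.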
