A Real-Time Speech Translator

Ajay Kumar Barun

Senior Technical Specialist – Data & AI at Microsoft | Expert in Cloud-Native Architecture, Presales, Hybrid Solutions, Generative AI, Data & Database Technologies

Published: July 7, 2024

I'm excited to share my latest project: a Speech Translator Application that leverages Tkinter and Azure Cognitive Services for real-time translation. This application aims to break down language barriers by providing instant translations and speech synthesis in multiple languages, including Hindi, English, Tamil, and Telugu.

Key Features:

Real-Time Speech Recognition: Seamlessly transcribe spoken words and translate them into multiple languages simultaneously.
Multiple Language Support: Currently supports Hindi, English, Tamil, and Telugu with easy toggling options.
Speech Synthesis: Not only translates but also speaks the translated text using natural-sounding voices.
User-Friendly Interface: Built with Tkinter, it offers an intuitive and responsive UI.
Customizable: Users can select which languages they want to translate to and adjust settings to suit their needs.

This project is a testament to the powerful capabilities of Azure Cognitive Services and demonstrates how we can leverage cloud-based AI to create innovative solutions that enhance communication and inclusivity.

Here's a breakdown of the sample code into steps, explaining each part:

Step 1: Import Necessary Libraries

import tkinter as tk
from tkinter import messagebox
import azure.cognitiveservices.speech as speechsdk
import threading

Import tkinter for creating the graphical user interface.
Import messagebox for displaying message boxes.
Import azure.cognitiveservices.speech for speech translation and synthesis.
Import threading for handling asynchronous tasks.
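
Of these imports, only azure.cognitiveservices.speech is a third-party dependency (published on PyPI as azure-cognitiveservices-speech); tkinter and threading ship with the standard library, although some Linux distributions package tkinter separately. A minimal sketch for checking up front that everything is importable follows. The helper name missing_modules is hypothetical and not part of the original code.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises if a parent package (e.g. "azure") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    required = ["tkinter", "threading", "azure.cognitiveservices.speech"]
    for name in missing_modules(required):
        print(f"Missing module: {name}")
```

Running this before launching the app gives a clearer message than an ImportError raised halfway through startup.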

Step 2: Initialize the Main Application Class


class SpeechTranslatorApp:
    def __init__(self, root):
        self.root = root
        self.root.title("Speech Translator")
        self.root.geometry("800x600")
        self.root.configure(bg="#2c3e50")
        self.is_listening = False
        self.create_menu()
        self.setup_ui()
        self.setup_translation_service()

Define a class SpeechTranslatorApp to encapsulate the application.
Initialize the main window (root) and set its properties.
Set a flag is_listening to track if the application is listening.
Call methods to create the menu, set up the UI, and set up the translation service.

Step 3: Create the Menu

    def create_menu(self):
        menu_bar = tk.Menu(self.root)
        self.root.config(menu=menu_bar)
        file_menu = tk.Menu(menu_bar, tearoff=0)
        menu_bar.add_cascade(label="File", menu=file_menu)
        file_menu.add_command(label="Exit", command=self.exit_program)

Create a menu bar and add a "File" menu with an "Exit" option.

Step 4: Exit Program Method

    def exit_program(self):
        self.root.quit()

Define a method to quit the application.

Step 5: Set Up the User Interface (UI)

    def setup_ui(self):
        # Use self.root rather than the module-level root so the class also works when imported.
        self.canvas = tk.Canvas(self.root, bg="#2c3e50")
        self.canvas.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)
        self.scrollbar = tk.Scrollbar(self.root, command=self.canvas.yview)
        self.scrollbar.pack(side=tk.RIGHT, fill=tk.Y)
        self.canvas.configure(yscrollcommand=self.scrollbar.set)
        self.main_frame = tk.Frame(self.canvas, bg="#2c3e50")
        self.canvas.create_window((0, 0), window=self.main_frame, anchor="nw")
        self.main_frame.bind("<Configure>", self.on_frame_configure)
        self.create_header()
        self.create_controls()
        self.create_language_selection()
        self.create_output_display()
        self.create_translation_display()

Set up the main UI components: a canvas with a vertical scrollbar, a frame embedded in the canvas to hold the layout, and calls to the methods that build each section of the UI.

Step 6: Create Header

    def create_header(self):
        self.header_frame = tk.Frame(self.main_frame, bg="#34495e", pady=10)
        self.header_frame.pack(fill=tk.X)
        self.title_label = tk.Label(self.header_frame, text="Speech Translator", font=("Helvetica", 16, "bold"), fg="white", bg="#34495e")
        self.title_label.pack()

Create a header section with a title label.

Step 7: Create Controls

    def create_controls(self):
        self.control_frame = tk.Frame(self.main_frame, bg="#2c3e50", pady=20)
        self.control_frame.pack(fill=tk.X)
        self.start_button = tk.Button(self.control_frame, text="Start Listening", font=("Helvetica", 12), bg="#1abc9c", fg="white", command=self.toggle_listening)
        self.start_button.pack(pady=10)

Create a control section with a start/stop listening button.

Step 8: Create Language Selection

    def create_language_selection(self):
        self.language_frame = tk.Frame(self.main_frame, bg="#2c3e50", pady=10)
        self.language_frame.pack(fill=tk.X, padx=20)
        self.languages = {"hi": tk.BooleanVar(value=True), "en": tk.BooleanVar(value=True), "ta": tk.BooleanVar(value=True), "te": tk.BooleanVar(value=True)}
        for lang in self.languages:
            checkbox = tk.Checkbutton(self.language_frame, text=lang, variable=self.languages[lang], font=("Helvetica", 12), fg="white", bg="#2c3e50", selectcolor="#2c3e50", activebackground="#2c3e50", activeforeground="white")
            checkbox.pack(side=tk.LEFT, padx=10)

Create a section for selecting languages using checkboxes.
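
The checkboxes above are labeled with the raw language codes (hi, en, ta, te). One possible refinement, sketched here under the assumption that readable labels are wanted, maps each code to a display name. LANGUAGE_NAMES and checkbox_label are hypothetical names, not part of the original code.

```python
# Display names for the four language codes used in the article.
LANGUAGE_NAMES = {
    "hi": "Hindi",
    "en": "English",
    "ta": "Tamil",
    "te": "Telugu",
}


def checkbox_label(code):
    """Return a readable label such as 'Hindi (hi)', falling back to the bare code."""
    name = LANGUAGE_NAMES.get(code)
    return f"{name} ({code})" if name else code
```

In create_language_selection, passing text=checkbox_label(lang) to tk.Checkbutton would then show "Hindi (hi)" instead of "hi".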

Step 9: Create Output Display

    def create_output_display(self):
        self.output_frame = tk.Frame(self.main_frame, bg="#2c3e50", pady=10)
        self.output_frame.pack(fill=tk.BOTH, expand=True, padx=20)
        self.output_label = tk.Label(self.output_frame, text="Recognized Speech:", font=("Helvetica", 14), fg="white", bg="#2c3e50")
        self.output_label.pack(anchor="w")
        self.output_text_frame = tk.Frame(self.output_frame)
        self.output_text_frame.pack(fill=tk.BOTH, expand=True)
        self.output_scrollbar = tk.Scrollbar(self.output_text_frame)
        self.output_scrollbar.pack(side=tk.RIGHT, fill=tk.Y)
        self.output_text = tk.Text(self.output_text_frame, height=10, width=50, yscrollcommand=self.output_scrollbar.set, font=("Helvetica", 12), wrap=tk.WORD, bg="#ecf0f1")
        self.output_text.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)
        self.output_scrollbar.config(command=self.output_text.yview)

Create a section to display the recognized speech.

Step 10: Create Translation Display

    def create_translation_display(self):
        self.translated_frame = tk.Frame(self.main_frame, bg="#2c3e50", pady=10)
        self.translated_frame.pack(fill=tk.BOTH, expand=True, padx=20)
        self.translated_frames = {}
        self.target_languages = ["hi", "en", "ta", "te"]
        for lang in self.target_languages:
            frame = tk.Frame(self.translated_frame, bg="#2c3e50", pady=5)
            frame.pack(fill=tk.BOTH, expand=True)
            label = tk.Label(frame, text=f"Translated into {lang}:", font=("Helvetica", 14), fg="white", bg="#2c3e50")
            label.pack(anchor="w", pady=2)
            text_scroll_frame = tk.Frame(frame)
            text_scroll_frame.pack(fill=tk.BOTH, expand=True)
            text_scrollbar = tk.Scrollbar(text_scroll_frame)
            text_scrollbar.pack(side=tk.RIGHT, fill=tk.Y)
            text_box = tk.Text(text_scroll_frame, height=5, width=50, yscrollcommand=text_scrollbar.set, font=("Helvetica", 12), wrap=tk.WORD, bg="#ecf0f1")
            text_box.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)
            text_scrollbar.config(command=text_box.yview)
            button_frame = tk.Frame(frame, bg="#2c3e50")
            button_frame.pack(fill=tk.X, pady=5)
            speak_button = tk.Button(button_frame, text="Speak", font=("Helvetica", 12), bg="#3498db", fg="white", command=lambda l=lang: self.speak_translation(l))
            speak_button.pack(side=tk.LEFT, padx=10)
            play_button = tk.Button(button_frame, text="Play Latest", font=("Helvetica", 12), bg="#e67e22", fg="white", command=lambda l=lang: self.play_latest_translation(l))
            play_button.pack(side=tk.LEFT, padx=10)
            self.translated_frames[lang] = {"text_box": text_box, "latest_translation": ""}

Create a section for each target language that displays its translations, with a "Speak" button that reads out the whole text box and a "Play Latest" button that reads out only the most recent translation.
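
Each entry in translated_frames pairs a text box with the latest translation for that language. The same bookkeeping can be sketched without any widgets, which makes the difference between the full history and the latest line easier to see. TranslationLog is a hypothetical class used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationLog:
    """Per-language state: every translated line plus the most recent one."""
    lines: list = field(default_factory=list)

    def add(self, text):
        # Skip empty results so "latest" always holds something speakable.
        if text:
            self.lines.append(text)

    @property
    def latest(self):
        return self.lines[-1] if self.lines else ""

    def as_text(self):
        # Mirrors what accumulates in the text box, one translation per line.
        return "\n".join(self.lines)


logs = {lang: TranslationLog() for lang in ["hi", "en", "ta", "te"]}
logs["hi"].add("नमस्ते")
logs["hi"].add("धन्यवाद")
```

Here logs["hi"].as_text() corresponds to the full text box contents, while logs["hi"].latest corresponds to the "latest_translation" value.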

Step 11: Set Up the Translation Service

    def setup_translation_service(self):
        self.speech_translation_config = speechsdk.translation.SpeechTranslationConfig(subscription='YOUR_SUBSCRIPTION_KEY', region='YOUR_REGION')
        self.speech_translation_config.speech_recognition_language = "en-US"
        for lang in self.target_languages:
            self.speech_translation_config.add_target_language(lang)
        self.audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
        self.translation_recognizer = speechsdk.translation.TranslationRecognizer(translation_config=self.speech_translation_config, audio_config=self.audio_config)
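        self.translation_recognizer.recognized.connect(self.on_recognized)
        # Cancellations arrive on a separate signal; route them to the same handler
        # so its Canceled branch can actually run.
        self.translation_recognizer.canceled.connect(self.on_recognized)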

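Set up the Azure Cognitive Services speech translation configuration and recognizer. Replace YOUR_SUBSCRIPTION_KEY and YOUR_REGION with the values for your Speech resource. The source language is fixed to en-US, and every entry in target_languages is added as a translation target.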
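
Hardcoding the key in source makes it easy to leak by accident. A minimal sketch of reading the key and region from the environment instead is shown below. The variable names SPEECH_KEY and SPEECH_REGION and the helper load_speech_credentials are assumptions, not part of the original code.

```python
import os


def load_speech_credentials(env=None):
    """Read the Speech resource key and region from environment variables.

    SPEECH_KEY and SPEECH_REGION are assumed names; use whatever your
    deployment defines.
    """
    env = os.environ if env is None else env
    key = env.get("SPEECH_KEY", "").strip()
    region = env.get("SPEECH_REGION", "").strip()
    if not key or not region:
        raise RuntimeError("Set SPEECH_KEY and SPEECH_REGION before starting the app.")
    return key, region
```

The returned pair could then be passed as subscription=key, region=region when building the SpeechTranslationConfig and the SpeechConfig used for synthesis.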

Step 12: Toggle Listening Method

    def toggle_listening(self):
        if not self.is_listening:
            self.start_button.config(text="Stop Listening", bg="#e74c3c")
            self.is_listening = True
            threading.Thread(target=self.recognize_continuous).start()
        else:
            self.start_button.config(text="Start Listening", bg="#1abc9c")
            self.is_listening = False
            self.translation_recognizer.stop_continuous_recognition_async()

Define a method to start/stop listening and handle the button text and color changes.
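
The toggle combines two concerns: deciding the next state and updating the button. A small sketch that isolates the first part as a pure function, so it can be checked without a window, might look like this. next_listening_state is a hypothetical helper; the labels and colors are the ones used in the method above.

```python
def next_listening_state(is_listening):
    """Return (new_state, button_text, button_color) for one press of the toggle."""
    if is_listening:
        # Currently listening: the press stops recognition.
        return False, "Start Listening", "#1abc9c"
    # Currently idle: the press starts recognition.
    return True, "Stop Listening", "#e74c3c"
```

toggle_listening could call this helper, apply the text and color with start_button.config, and then start or stop the recognizer depending on the new state.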

Step 13: Recognize Continuous Method

    def recognize_continuous(self):
        self.translation_recognizer.start_continuous_recognition_async().get()

Define a method to start continuous speech recognition in a separate thread.

Step 14: Handle Recognized Speech

    def on_recognized(self, evt):
        if evt.result.reason == speechsdk.ResultReason.TranslatedSpeech:
            self.output_text.insert(tk.END, f"Recognized: {evt.result.text}\n")
            for lang in self.target_languages:
                if self.languages[lang].get():
                    translation = evt.result.translations.get(lang, "")
                    self.translated_frames[lang]["text_box"].insert(tk.END, f"{translation}\n")
                    self.translated_frames[lang]["latest_translation"] = translation
                    self.speak_translation_text(lang, translation)
        elif evt.result.reason == speechsdk.ResultReason.NoMatch:
            self.output_text.insert(tk.END, "No speech could be recognized.\n")
        elif evt.result.reason == speechsdk.ResultReason.Canceled:
            cancellation_details = evt.result.cancellation_details
            self.output_text.insert(tk.END, f"Speech Recognition canceled: {cancellation_details.reason}\n")
            if cancellation_details.reason == speechsdk.CancellationReason.Error:
                self.output_text.insert(tk.END, f"Error details: {cancellation_details.error_details}\n")
                messagebox.showerror("Error", "Did you set the speech resource key and region values?")

Define a method to handle recognized speech events and display the recognized text and translations.
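
Recognizer callbacks generally run on a thread other than the one running the Tkinter main loop, and Tkinter widgets are safest when touched only from the main-loop thread. A common pattern for this handoff is a queue.Queue that the callback fills and the UI drains. The sketch below shows the idea without any SDK or widget calls. The names results, on_recognized_worker, and drain are hypothetical.

```python
import queue
import threading

results = queue.Queue()


def on_recognized_worker(text):
    """Runs on the callback thread: only enqueue, never touch widgets."""
    results.put(text)


def drain(q):
    """Return everything currently queued without blocking (call on the UI thread)."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


# Simulate a callback arriving from another thread.
worker = threading.Thread(target=on_recognized_worker, args=("hello",))
worker.start()
worker.join()
pending = drain(results)
```

In the app, a method scheduled with self.root.after could call drain, insert each item into the text boxes, and then reschedule itself.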

Step 15: Speak Translation Text Method

    def speak_translation_text(self, lang, text):
        if text:
            tts_config = speechsdk.SpeechConfig(subscription='YOUR_SUBSCRIPTION_KEY', region='YOUR_REGION')
            # Choose the voice before creating the synthesizer so the setting is picked up.
            if lang == "hi":
                tts_config.speech_synthesis_voice_name = "hi-IN-MadhurNeural"
            elif lang == "ta":
                tts_config.speech_synthesis_voice_name = "ta-IN-PallaviNeural"
            elif lang == "te":
                tts_config.speech_synthesis_voice_name = "te-IN-MohanNeural"
            elif lang == "en":
                tts_config.speech_synthesis_voice_name = "en-US-JennyNeural"
            audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
            synthesizer = speechsdk.SpeechSynthesizer(speech_config=tts_config, audio_config=audio_config)
            synthesizer.speak_text_async(text)

Define a method to speak the translated text using Azure Cognitive Services Text-to-Speech.
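
The language-to-voice choice can also be expressed as a lookup table, which keeps the method short as more languages are added. This is a sketch rather than the original implementation: the voice names are the ones used above, while VOICES, DEFAULT_VOICE, voice_for, and the choice of fallback are assumptions.

```python
# Voice names are the ones used in the article; the fallback is an assumption.
VOICES = {
    "hi": "hi-IN-MadhurNeural",
    "ta": "ta-IN-PallaviNeural",
    "te": "te-IN-MohanNeural",
    "en": "en-US-JennyNeural",
}
DEFAULT_VOICE = "en-US-JennyNeural"


def voice_for(lang):
    """Return the neural voice configured for `lang`, or the default voice."""
    return VOICES.get(lang, DEFAULT_VOICE)
```

With this in place, the if/elif chain could be replaced by a single assignment of voice_for(lang) to tts_config.speech_synthesis_voice_name.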

Step 16: Speak Translation Method

    def speak_translation(self, lang):
        text = self.translated_frames[lang]["text_box"].get(1.0, tk.END).strip()
        self.speak_translation_text(lang, text)

Define a method to speak the translation from the text box.

Step 17: Play Latest Translation Method

    def play_latest_translation(self, lang):
        text = self.translated_frames[lang]["latest_translation"]
        self.speak_translation_text(lang, text)

Define a method to play the latest translation.

Step 18: Handle Frame Configuration for Scrolling

    def on_frame_configure(self, event):
        self.canvas.configure(scrollregion=self.canvas.bbox("all"))

Define a method to configure the scroll region of the canvas.

Step 19: Run the Application

if __name__ == "__main__":
    root = tk.Tk()
    app = SpeechTranslatorApp(root)
    root.mainloop()

Initialize and run the Tkinter application.

I'm looking forward to any feedback and suggestions from the community. Feel free to reach out if you have any questions or want to know more about the implementation details.

#TechInnovation #AI #AzureCognitiveServices #SpeechRecognition #LanguageTranslation #Python #Tkinter #InnovationInTech #InclusivityInTech

RAMDAS

7 months ago

Hi sir, we need to use the translator app to translate spiritual books. How can we use it? Can you share the app details and pricing?

Adhip Ray

Startups Need Rapid Growth, Not Just Digital Impressions. We Help Create Omni-Channel Digital Strategies for Real Business Growth.

7 months ago

Wow, this sounds fantastic! A real-time Speech Translator app using Tkinter and Azure Cognitive Services is truly innovative. It's amazing to see how technology can break down language barriers so effectively. I'm keen to explore how your app handles real-time speech recognition and translations across multiple languages. Keep pushing the boundaries of tech innovation!

Alexander K.

Experienced & Ambitious | Energetic & Passionate | Diverse Background | ex-AWS

7 months ago

I absolutely love this project Ajay! I look forward to tinkering with your application and understanding the AI involved, especially in the topic of language translation. Congratulations!

Manjunath S

Solution Architect @ Microsoft | Big Data, Advanced Analytics

7 months ago

Very informative
