Amazing Gladia (Speech2text)

jesus luque

Creating digital twins for [any]verse Mediapro

Published: February 17, 2023

Gladia.io: A Text Transcription Solution Based on OpenAI's Whisper Models

If you are looking for a fast, accurate and easy-to-use text transcription solution, you might want to check out Gladia.io. Gladia.io is a company that provides plug-and-play APIs to get real value from your unstructured data, starting with audio. In this blog post, we will explore how Gladia.io leverages OpenAI's Whisper models to offer a state-of-the-art text transcription service.

What are OpenAI's Whisper models?

OpenAI's Whisper models are automatic speech recognition (ASR) models that transcribe speech into text. They are trained on 680,000 hours of multilingual data collected from the web, which makes them robust to accents, background noise and technical language. They also use a simple end-to-end approach based on the Transformer architecture, which enables them to achieve high performance and low latency.

How does Gladia.io use OpenAI's Whisper models?

Gladia.io has built its own audio transcription API based on OpenAI's Whisper models. This API allows users to upload audio files or stream audio data and get back text transcripts in real time. The API supports 99 languages and also offers advanced features such as speaker diarization, punctuation, capitalization and timestamps.

What are the benefits of using Gladia.io?

Gladia.io claims that its audio transcription API can transcribe one hour of audio in 10 seconds, which is much faster than other solutions in the market. It also claims that its word error rate (WER) is as low as 1%, which means that it can produce very accurate transcripts with minimal mistakes. Moreover, Gladia.io offers a simple integration process with clear documentation and examples, which makes it easy for developers to use its API in their applications.
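To make the WER figure concrete, here is a minimal sketch of how word error rate is usually computed: the word-level edit distance between a reference transcript and the system's output, divided by the number of reference words. The function name and the sample sentences are made up for illustration, and this is not Gladia's own evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative sentences: two substituted words out of nine.
    ref = "the quick brown fox jumps over the lazy dog"
    hyp = "the quick brown fox jumped over a lazy dog"
    print(f"WER: {word_error_rate(ref, hyp):.2%}")
```

A WER of 1% would mean roughly one wrong, missing or extra word per hundred reference words.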

How can you try out Gladia.io?

If you are interested in trying out Gladia.io's audio transcription API, you can sign up for a free account on their website. You will get access to their dashboard, where you can upload or stream audio files and see the transcripts. You will also get an API key that you can use to make requests to their API endpoint. You can find more details about how to use their API in their documentation.

Quick tutorial

1. Download a sample video:

yt-dlp "https://www.youtube.com/watch?v=xR-4NDFsYNk"

2. Transcode the audio track to WAV (source.mp4 stands for the downloaded video file):

ffmpeg -i source.mp4 -vn -acodec pcm_s16le target.wav
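Before uploading, it can be worth checking that the transcoded file is what you expect. The sketch below uses Python's standard wave module to read the header of a PCM WAV file; the function name describe_wav is made up for this example, and it assumes ffmpeg wrote plain 16-bit PCM, which is what pcm_s16le requests.

```python
import sys
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),  # 2 for 16-bit PCM
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python describe_wav.py target.wav
    print(describe_wav(sys.argv[1]))
```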

3. Call the Gladia API, using their own example:

curl -X 'POST' \
    'https://api.gladia.io/audio/text/audio-transcription/' \
    -H 'accept: application/json' \
    -H 'x-gladia-key: YOUR KEY HERE' \
    -H 'Content-Type: multipart/form-data' \
    -F "audio_url=https://files.gladia.io/example/audio-transcription/split_infinity.wav" \
    -F "language=spanish" \
    -F "language_behaviour=automatic single language"
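The same request can be sketched in Python using only the standard library. The endpoint, header names and form fields are copied from the curl example above and may have changed since; the function name, the GLADIA_API_KEY environment variable and the transcript.json output file are assumptions made for this sketch.

```python
import os
import urllib.request
import uuid

# Endpoint taken from the curl example; it may have changed since then.
API_URL = "https://api.gladia.io/audio/text/audio-transcription/"


def build_transcription_request(audio_url, api_key, language="spanish",
                                language_behaviour="automatic single language"):
    """Build (but do not send) a multipart/form-data POST request."""
    fields = {
        "audio_url": audio_url,
        "language": language,
        "language_behaviour": language_behaviour,
    }
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")  # closing boundary
    body = "".join(parts).encode("utf-8")
    headers = {
        "accept": "application/json",
        "x-gladia-key": api_key,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    # GLADIA_API_KEY is a hypothetical variable name; nothing is sent without it.
    key = os.environ.get("GLADIA_API_KEY")
    if key:
        req = build_transcription_request(
            "https://files.gladia.io/example/audio-transcription/split_infinity.wav", key)
        with urllib.request.urlopen(req) as resp, open("transcript.json", "wb") as out:
            out.write(resp.read())
```

Keeping the key in an environment variable avoids pasting it into scripts or shell history.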

4. Save the JSON response to a file and convert it to SRT using this script (json2srt.py):

import argparse
import json


def generate_srt(input_file, output_file):
    """Read the JSON transcription and write one SRT cue per prediction."""
    with open(input_file, encoding="utf-8") as f:
        data = json.load(f)

    srt_lines = []
    for i, prediction in enumerate(data["prediction"]):
        time_begin = prediction["time_begin"]
        time_end = prediction["time_end"]
        transcript = prediction["transcript"]
        # Each cue: index, time range, text, then a blank separator line
        srt_lines.append(f"{i+1}\n{format_time(time_begin)} --> {format_time(time_end)}\n{transcript}\n\n")

    with open(output_file, "w", encoding="utf-8") as f:
        f.writelines(srt_lines)


def format_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    # Work in whole milliseconds so the fractional part is not lost
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, milliseconds = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}"


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate SRT file from JSON transcription")
    parser.add_argument("input_file", help="Path to input JSON file")
    parser.add_argument("output_file", help="Path to output SRT file")
    args = parser.parse_args()

    generate_srt(args.input_file, args.output_file)
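To see what the conversion expects and produces without calling the API, here is a small in-memory sketch. The sample dictionary is made up; it only mirrors the keys the script reads ("prediction", "time_begin", "time_end", "transcript"), and the real response may contain more fields. The helper names srt_timestamp and to_srt are chosen for this example.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(data):
    """Turn a parsed transcription into SRT text, one cue per prediction."""
    cues = []
    for index, item in enumerate(data["prediction"], start=1):
        start = srt_timestamp(item["time_begin"])
        end = srt_timestamp(item["time_end"])
        cues.append(f"{index}\n{start} --> {end}\n{item['transcript']}\n")
    return "\n".join(cues)  # a blank line separates consecutive cues


# Made-up sample with the same shape the script above reads.
sample = {
    "prediction": [
        {"time_begin": 0.0, "time_end": 2.5, "transcript": "Hello there."},
        {"time_begin": 3661.25, "time_end": 3663.0, "transcript": "Over an hour in."},
    ]
}

if __name__ == "__main__":
    print(to_srt(sample))
```

The second cue starts at 01:01:01,250, which shows that hours, minutes and milliseconds are all carried through.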

5. Play in VLC (give the subtitle file the same base name as the video, in the same folder, and it will load automatically).

Conclusion

Gladia.io is a text transcription solution based on OpenAI's Whisper models that offers fast, accurate and easy-to-use transcription services, probably at a low price. It supports multiple languages and formats, as well as advanced features such as speaker diarization and punctuation. It also provides a simple integration process with clear documentation and examples. If you want to learn more about Gladia.io or try out their service for free, visit their website.

#speech2text #openai #gladia
