Amazing Gladia (Speech2text)
Gladia.io: A Text Transcription Solution Based on OpenAI's Whisper Models
If you are looking for a fast, accurate and easy-to-use text transcription solution, you might want to check out Gladia.io. Gladia.io is a company that provides plug-and-play APIs to get real value from your unstructured data, starting with audio. In this blog post, we will explore how Gladia.io leverages OpenAI's Whisper models to offer a state-of-the-art text transcription service.
What are OpenAI's Whisper models?
OpenAI's Whisper models are automatic speech recognition (ASR) models that transcribe speech into text. They are trained on 680,000 hours of multilingual data collected from the web, which makes them robust to accents, background noise and technical language. They use a simple end-to-end approach based on an encoder-decoder Transformer architecture that maps audio directly to text, which gives them high accuracy without a complex multi-stage pipeline.
How does Gladia.io use OpenAI's Whisper models?
Gladia.io has built its own audio transcription API based on OpenAI's Whisper models. This API allows users to upload audio files or stream audio data and get back text transcripts in real time. The API supports 99 languages and also offers advanced features such as speaker diarization, punctuation, capitalization and timestamps.
What are the benefits of using Gladia.io?
Gladia.io claims that its audio transcription API can transcribe one hour of audio in 10 seconds, which is much faster than other solutions on the market. It also claims a word error rate (WER) as low as 1%, which would mean very accurate transcripts with minimal mistakes. Moreover, Gladia.io offers a simple integration process with clear documentation and examples, which makes it easy for developers to use its API in their applications.
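To make the 1% figure concrete: WER counts the word substitutions, deletions and insertions needed to turn a transcript into a reference transcript, divided by the number of words in the reference, so 1% is roughly one wrong word per hundred. The sketch below computes it with a standard word-level edit distance. It only splits on whitespace, whereas published WER figures are normally computed after normalising case and punctuation, so numbers from this function are not directly comparable to Gladia's claim. (For the speed claim, 3,600 seconds of audio in 10 seconds is about 360 times faster than real time.)

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost is 0)
            )

    return dist[len(ref)][len(hyp)] / len(ref)
```

For example, comparing "the cat sat on the mat" with "the cat sat on a mat" gives one substitution over six reference words, a WER of about 0.167 (16.7%).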
How can you try out Gladia.io?
If you are interested in trying out Gladia.io's audio transcription API, you can sign up for a free account on their website. You will get access to their dashboard, where you can upload or stream audio files and see the transcripts. You will also get an API key that you can use to make requests to their API endpoint. You can find more details about how to use their API in their documentation.
Quick tutorial
1. Download the video with yt-dlp:
yt-dlp "https://www.youtube.com/watch?v=xR-4NDFsYNk"
2. Extract the audio track and transcode it to 16-bit PCM WAV (replace source.mp4 with the name of the file yt-dlp saved):
ffmpeg -i source.mp4 -vn -acodec pcm_s16le target.wav
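An optional sanity check after step 2: the sketch below uses Python's standard wave module to confirm that target.wav really is 16-bit PCM (which is what pcm_s16le produces) and to report its channels, sample rate and duration. It is an illustration only, not part of Gladia's tooling, and describe_wav is a hypothetical helper name.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file as a dict."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample; 2 means 16-bit
        frame_rate = wav.getframerate()    # samples per second, per channel
        frames = wav.getnframes()          # total samples per channel
    return {
        "channels": channels,
        "bits_per_sample": sample_width * 8,
        "sample_rate_hz": frame_rate,
        "duration_s": frames / frame_rate,
    }
```

Calling describe_wav("target.wav") should report bits_per_sample as 16, and duration_s should roughly match the length of the video.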
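3. Send the audio to the Gladia API. This is Gladia's own example request, which transcribes a sample file by URL; to transcribe your own target.wav, set audio_url to a publicly reachable copy of it. The final -o option saves the JSON response for the next step:
curl -X 'POST' \
'https://api.gladia.io/audio/text/audio-transcription/' \
-H 'accept: application/json' \
-H 'x-gladia-key: YOUR KEY HERE' \
-H 'Content-Type: multipart/form-data' \
-F "audio_url=https://files.gladia.io/example/audio-transcription/split_infinity.wav" \
-F "language=spanish" \
-F "language_behaviour=automatic single language" \
-o transcription.json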
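If you would rather call the API from Python than from curl, the sketch below sends the same POST request using only the standard library. It reuses the endpoint, headers and form fields from the curl example; the GLADIA_API_KEY environment variable and the transcription.json output name are assumptions made for this sketch. It only sends text fields such as audio_url, not a file upload, and it has not been checked against the current Gladia API.

```python
import json
import os
import urllib.request
import uuid

# Endpoint and form fields taken from the curl example in step 3
API_URL = "https://api.gladia.io/audio/text/audio-transcription/"

EXAMPLE_FIELDS = {
    "audio_url": "https://files.gladia.io/example/audio-transcription/split_infinity.wav",
    "language": "spanish",
    "language_behaviour": "automatic single language",
}


def build_multipart(fields):
    """Encode text-only form fields as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    chunks.append(f"--{boundary}--\r\n")
    return "".join(chunks).encode("utf-8"), f"multipart/form-data; boundary={boundary}"


def build_request(api_key, fields):
    """Build (but do not send) the POST request."""
    body, content_type = build_multipart(fields)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "accept": "application/json",
            "x-gladia-key": api_key,
            "Content-Type": content_type,  # includes the boundary, as curl -F does
        },
    )


def transcribe(fields, out_path="transcription.json"):
    """Send the request and save the JSON response for json2srt.py."""
    # GLADIA_API_KEY is a hypothetical variable name; store the key however you prefer
    api_key = os.environ["GLADIA_API_KEY"]
    with urllib.request.urlopen(build_request(api_key, fields)) as response:
        data = json.load(response)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    return data
```

Reading the key from the environment keeps it out of shell history and source files; transcribe(EXAMPLE_FIELDS) would then write the response to transcription.json.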
4. Convert the JSON output to SRT with this script, saved as json2srt.py and run as "python json2srt.py input.json output.srt":
import argparse
import json


def generate_srt(input_file, output_file):
    # Load the JSON response saved from the Gladia API
    with open(input_file, encoding="utf-8") as f:
        data = json.load(f)

    # Each prediction becomes one SRT block: index, time range, text, blank line
    srt_lines = []
    for i, prediction in enumerate(data["prediction"], start=1):
        time_begin = prediction["time_begin"]
        time_end = prediction["time_end"]
        transcript = prediction["transcript"]
        srt_lines.append(f"{i}\n{format_time(time_begin)} --> {format_time(time_end)}\n{transcript}\n\n")

    # Write as UTF-8 so accented characters survive
    with open(output_file, "w", encoding="utf-8") as f:
        f.writelines(srt_lines)


def format_time(seconds):
    # Convert seconds (float) to an SRT timestamp HH:MM:SS,mmm,
    # working in whole milliseconds so the fractional part is kept
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, milliseconds = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}"


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate SRT file from JSON transcription")
    parser.add_argument("input_file", help="Path to input JSON file")
    parser.add_argument("output_file", help="Path to output SRT file")
    args = parser.parse_args()
    generate_srt(args.input_file, args.output_file)
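Before opening the result in a player, it can help to check the generated file. The sketch below is a minimal validator (check_srt is a hypothetical helper, not part of any library): it splits the SRT text into blocks and checks that indexes count up from 1, that each time range has the HH:MM:SS,mmm --> HH:MM:SS,mmm shape json2srt.py writes, and that no block ends before it starts or starts before the previous one ends.

```python
import re

# One SRT time range, e.g. "00:01:02,500 --> 00:01:04,000"
TIME_RANGE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})$"
)


def _to_ms(h, m, s, ms):
    """Convert timestamp parts (as strings) to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def check_srt(text):
    """Return a list of problems found in SRT text; an empty list means it looks valid."""
    problems = []
    previous_end = 0
    text = text.replace("\r\n", "\n")  # tolerate Windows line endings
    blocks = [b for b in text.strip().split("\n\n") if b.strip()]
    for expected, block in enumerate(blocks, start=1):
        lines = block.splitlines()
        if len(lines) < 3:
            problems.append(f"block {expected}: expected index, time range and text")
            continue
        if lines[0].strip() != str(expected):
            problems.append(f"block {expected}: index is {lines[0]!r}")
        match = TIME_RANGE.match(lines[1].strip())
        if not match:
            problems.append(f"block {expected}: bad time range {lines[1]!r}")
            continue
        start = _to_ms(*match.groups()[:4])
        end = _to_ms(*match.groups()[4:])
        if end < start:
            problems.append(f"block {expected}: ends before it starts")
        if start < previous_end:
            problems.append(f"block {expected}: starts before the previous block ends")
        previous_end = end
    return problems
```

For example, check_srt(open("source.srt", encoding="utf-8").read()) returns an empty list when nothing looks wrong.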
5. Play the video in VLC. Give the subtitle file the same base name as the video (for example source.mp4 and source.srt) and VLC will load it automatically.
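The naming rule from step 5 is easy to get wrong by hand. This small sketch derives the subtitle path from the video path with pathlib, so the .srt sits next to the video with the same base name (subtitle_path_for is a hypothetical helper name). VLC's automatic loading also depends on its subtitle settings, so treat this as a convenience rather than a guarantee.

```python
from pathlib import Path


def subtitle_path_for(video_path):
    """Return the .srt path next to the video: same folder, same base name."""
    # with_suffix replaces only the last extension, so "my.video.mkv" -> "my.video.srt"
    return Path(video_path).with_suffix(".srt")
```

For example, subtitle_path_for("source.mp4") gives source.srt, which you can pass as the output_file argument of json2srt.py.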
Conclusion
Gladia.io is a transcription solution based on OpenAI's Whisper models that offers fast, accurate and easy-to-use speech-to-text, and it is probably cheap too. It supports multiple languages and formats, as well as advanced features such as speaker diarization and punctuation. It also provides a simple integration process with clear documentation and examples. If you want to learn more about Gladia.io or try out their service for free, visit their website.
Wow Jesus Luque, we're so glad you liked our API, and what an amazing tutorial!! Thank you :)