STJ, YAWT, and other LNPs (Late Night Projects)
It’s the holiday season, and with some free time, I decided to go ahead and create the transcription tool I always wanted. Not quite there yet, but as a first step, I created YAWT, which stands for “Yet Another Whisper Tool” (yes, I know, a bit on the nose).
YAWT handles transcription of audio or video (e.g. Zoom calls) into text, recognizes different speakers, and handles multiple languages seamlessly. That’s super useful for anyone mixing Hebrew and English (aka Hibrish), which is pretty much everyone in Israeli tech.
It uses OpenAI’s Whisper large-v3 for transcription and PyAnnote for speaker diarization. The output can be generated in multiple useful text formats like SRT, VTT, and a rich JSON format designed for downstream apps (hello, LLMs!).
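To make the output formats a bit more concrete, here is a minimal sketch of how diarized segments could be rendered as SRT. The `Segment` class and the `to_srt` helper are hypothetical names I'm using for illustration; this is not YAWT's actual code or schema, just the general idea of turning "who said what, and when" into subtitle cues.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarized transcript segment (hypothetical shape, not YAWT's schema)."""
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str   # diarization label, e.g. "SPEAKER_00"
    text: str      # transcribed text


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues, prefixing each cue with its speaker."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 1.25, "SPEAKER_00", "Shalom, let's start."),
        Segment(1.25, 3.0, "SPEAKER_01", "Sure."),
    ]
    print(to_srt(demo))
```

VTT output would look very similar (a `WEBVTT` header and `.` instead of `,` before the milliseconds), while a JSON format can carry extra per-segment detail that subtitle formats have no room for.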
While YAWT is specifically designed to solve some of my own pains, hopefully it’s flexible enough to solve other people’s pains as well. You can check it out here: YAWT on GitHub.
But here is the thing: as I was playing with the JSON output from YAWT, I thought, “Surely, there must be a standard JSON format for transcriptions that I can leverage and extend, right?” Nope. I found… nothing.
It was late (12am, to be exact) and I probably should’ve gone to bed. Instead, three hours later, the Standard Transcription JSON (STJ) Format was born.
Huge shoutout to OpenAI o1-preview, Claude Sonnet 3.5, Cursor and Perplexity — these tools made the process a breeze. Truly a brave new world we’re living in.
If you’re interested in STJ, Getting Started is a good place to, well, start. I hope someone out there will find it useful — and even if not, it was a fun way to spend a few late-night hours. Let me know what you think!
Next, I’ll be feeding YAWT’s STJ output into another tool that extracts relevant context from related documents and uses it to enhance transcriptions. But that’s for another late-night session.
Machine Learning Engineer
2 weeks ago: Wow, I'm surprised that there was not a standard JSON format for transcriptions. That's great that you were able to put this together to create a comprehensive JSON-based spec for transcription, subtitles, translations and other stuff!
Former digital company owner
3 weeks ago: LNPs often are the best ones
Partner at lool ventures
1 month ago: KING! Problem to opportunity to value
Principal Engineer at Forter
1 month ago: Very nice, you seem to have covered most use cases. I wish many consumers and producers would use this. Many of the word-by-word caption tools expect a word timestamp; will that work within the same transcript segments?