STJ, YAWT, and other LNPs (Late Night Projects)

It’s the holiday season, and with some free time, I decided to go ahead and create the transcription tool I always wanted. It’s not quite there yet, but as a first step, I created YAWT, which stands for “Yet Another Whisper Tool” (yes, I know, a bit on the nose).

YAWT transcribes audio or video (e.g. Zoom calls) into text, recognizes different speakers, and handles multiple languages seamlessly, which is super useful for anyone mixing Hebrew and English (aka Hibrish), and that is pretty much everyone in Israeli tech.

It uses OpenAI’s Whisper large-v3 for transcription and PyAnnote for speaker diarization. The output can be generated in multiple useful text formats like SRT, VTT, and a rich JSON format designed for downstream apps (hello, LLMs!).
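To give a feel for what the SRT output step involves, here is a minimal sketch of turning speaker-labeled segments into SRT cues. The `Segment` class and its field names are my own illustration and are assumptions, not YAWT’s actual data model or code.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment; not YAWT's actual data model."""
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    speaker: str
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues, prefixing the text with the speaker."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 1.25, "Speaker 1", "Shalom, can everyone hear me?"),
        Segment(1.5, 3.0, "Speaker 2", "Yes, all good."),
    ]
    print(to_srt(demo))
```

Putting the speaker name in the cue text is just one convention; SRT itself has no dedicated speaker field, which is part of why a richer JSON format is handy.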

While YAWT is specifically designed to solve some of my own pains, hopefully it’s flexible enough to solve other people’s pains as well. You can check it out here: YAWT on GitHub.


But here is the thing: as I was playing with the JSON output from YAWT, I thought, “Surely, there must be a standard JSON format for transcriptions that I can leverage and extend, right?” Nope. I found… nothing.

It was late (12am, to be exact) and I probably should’ve gone to bed. Instead, three hours later, the Standard Transcription JSON (STJ) Format was born:

  • A comprehensive JSON-based spec for transcription, subtitles, translations, and more.
  • It’s a superset, pulling features from SRT, VTT, TTML, SSA/ASS, and beyond.
  • Includes tools to convert STJ into these formats.
  • Comes with validators, sample code — you name it.
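To illustrate the kind of checks a transcript validator might run, here is a rough sketch. The JSON layout below (a top-level `segments` list with `start`, `end`, and `text` fields) is a made-up stand-in for illustration only. It is not the real STJ schema, and this is not the validator that ships with the spec.

```python
import json


def validate_segments(document: dict) -> list[str]:
    """Return a list of problems; an empty list means these checks passed.

    Assumes a hypothetical layout: {"segments": [{"start", "end", "text"}, ...]}.
    """
    segments = document.get("segments")
    if not isinstance(segments, list):
        return ["'segments' must be a list"]

    problems = []
    previous_start = 0.0
    for i, seg in enumerate(segments):
        if not isinstance(seg, dict):
            problems.append(f"segment {i}: must be an object")
            continue
        start, end = seg.get("start"), seg.get("end")
        if not isinstance(start, (int, float)) or not isinstance(end, (int, float)):
            problems.append(f"segment {i}: 'start' and 'end' must be numbers")
            continue
        if start < 0 or end < start:
            problems.append(f"segment {i}: need 0 <= start <= end")
        if start < previous_start:
            problems.append(f"segment {i}: segments must be ordered by start time")
        previous_start = start
        if not isinstance(seg.get("text"), str):
            problems.append(f"segment {i}: 'text' must be a string")
    return problems


if __name__ == "__main__":
    raw = '{"segments": [{"start": 2.0, "end": 1.0, "text": "oops"}]}'
    for problem in validate_segments(json.loads(raw)):
        print(problem)
```

Returning a list of problems instead of raising on the first one makes it easier to report everything wrong with a file in a single pass.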

Huge shoutout to OpenAI o1-preview, Claude 3.5 Sonnet, Cursor, and Perplexity. These tools made the process a breeze. Truly a brave new world we’re living in.

This is not actually my desk; my real one is a huge mess.

If you’re interested in STJ, Getting Started is a good place to, well, start. I hope someone out there will find it useful — and even if not, it was a fun way to spend a few late-night hours. Let me know what you think!


Next, I’ll be feeding YAWT’s STJ output into another tool that extracts relevant context from related documents and uses it to enhance transcriptions. But that’s for another late-night session.

Luke M.

Machine Learning Engineer

2 weeks ago

Wow, I'm surprised that there was not a standard JSON format for transcriptions. That's great that you were able to put this together to create a comprehensive JSON-based spec for transcription, subtitles, translations and other stuff!

LNPs often are the best ones

Maya Azoulay

Partner at lool ventures

1 month ago

KING! Problem to opportunity to value

习移

Founder & CEO MindLi - PRIME Your AI Thinking || Visiting Prof. MBA Technion | Harvard Alum

1 month ago

?? ?? ????? ???? ????? :-)

Michael Lugassy

Principal Engineer at Forter

1 month ago

very nice, you seem to have covered most use cases. i wish many consumers and producers would use this. many of the word-by-word caption tools expect a word timestamp, will that work within the same transcript segments?
