Let's goo! Rev just dropped a Whisper killer! An open-weight speech transcription & diarization model beating the current SoTA!
Bonus: Model weights are on the Hugging Face Hub
> Reverb ASR: A state-of-the-art English ASR model, trained on an unprecedented 200K hours of human-transcribed audio, achieving SoTA WER with customizable verbatim transcription.
> Diarization: Rev fine-tuned models with pyannote, leveraging 26K hours of labeled data. Reverb v1 uses pyannote3.0, while v2 adopts WavLM over SincNet.
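Since the headline claim is SoTA word error rate, here is a minimal sketch of how WER is typically computed: Levenshtein edit distance over words, divided by the reference length. (The function name and toy strings are my own, not from Rev's release.)

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

A perfect transcript scores 0.0; one substituted word in a three-word reference scores 1/3.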
Rev ASR Architecture:
> Architecture: Features a powerful joint CTC/attention architecture with 18 conformer layers, 6 transformer layers, and 600M parameters. Language-specific layers control verbatim output.
> Inference: Supports various decoding modes, including CTC, attention, and joint CTC/attention decoding.
> Production ready: Optimized pipeline with WFST beam search, unigram LM, and attention rescoring. Parallel processing ensures fast turnaround with post-processing for formatted output.
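The joint CTC/attention decoding and attention-rescoring ideas above can be sketched as a simple interpolation of the two branches' log-probabilities for each candidate hypothesis. (The 0.3 CTC weight and all names here are hypothetical illustrations, not Rev's actual values or API.)

```python
def joint_score(ctc_logp: float, attn_logp: float, ctc_weight: float = 0.3) -> float:
    """Interpolate CTC and attention decoder log-probabilities.

    ctc_weight is a tunable hyperparameter (0.3 is a common default
    in joint CTC/attention systems, not a value confirmed by Rev).
    """
    return ctc_weight * ctc_logp + (1 - ctc_weight) * attn_logp


def rescore(hypotheses: list[tuple[str, float, float]]) -> str:
    """Pick the best hypothesis from an n-best list.

    Each entry is (text, ctc_logp, attn_logp), e.g. produced by a
    first-pass beam search and then rescored with the attention decoder.
    """
    return max(hypotheses, key=lambda h: joint_score(h[1], h[2]))[0]


# Toy n-best list: the attention-preferred hypothesis wins under this weighting.
nbest = [("a", -5.0, -2.0), ("b", -1.0, -4.0)]
best = rescore(nbest)
```

This is why joint decoding tends to be robust: CTC anchors the alignment while the attention decoder supplies stronger language modeling, and the weight trades off between them.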
Reverb Diarization Architecture:
> v1 is based on the pyannote 3.0 architecture, fine-tuned on Rev's data for 17 epochs (4 days on an A100 GPU). It has 2 LSTM layers and 2.2M parameters.
> v2: An advanced version that replaces SincNet features with WavLM, offering more precise diarization.
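For context, diarization models like these are usually evaluated with diarization error rate (DER). Here is a toy per-frame sketch (real DER is computed over time durations with an optimal reference-to-hypothesis speaker mapping; this simplified version assumes labels are already aligned, and the names are my own):

```python
def der(reference: list, hypothesis: list) -> float:
    """Toy per-frame diarization error rate.

    Each element is a speaker label, with None marking non-speech.
    Counts misses/confusions on speech frames and false alarms on
    non-speech frames, normalized by the number of speech frames.
    """
    errors = 0
    speech_frames = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech_frames += 1
            if hyp != ref:
                errors += 1  # missed speech or speaker confusion
        elif hyp is not None:
            errors += 1      # false alarm: speech where there is none
    return errors / max(speech_frames, 1)
```

Lower is better: a perfect hypothesis scores 0.0, and each mislabeled speech frame or false alarm adds to the numerator.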
It's great to see market leaders like Rev adopting an open-weights strategy! Kudos!