Day 94 – Multi-Speaker Speech Separation and Recognition Using SpeechBrain
Gopi Chandrakesan
Project Manager/Solution Architect | Blogger on SAP, Artificial Intelligence, Machine Learning, and Deep Learning | Ask me about Data Intelligence
In the previous blog post, we looked at SpeechBrain, its features, pretrained models, and speech recognition in different languages with SpeechBrain.
Today, we are going to look in detail at multi-speaker separation and recognition.
What is Multi-Speaker Separation and Recognition?
Imagine you are listening to a recording in which several people are talking at the same time, but you only want to hear one particular speaker. Traditionally, isolating that voice required high-end software or the help of sound engineers and audio professionals. With the emergence of Artificial Intelligence, this task becomes very easy: about 13 lines of code are enough to produce a multi-speaker separation.
Let’s get into the code for a simple multi-speaker separation and recognition example.
I used SpeechBrain pretrained models and audio files, and downloaded mixed audio files (created with Audacity) from Azure GitHub.
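If you do not want to mix recordings in Audacity, here is a minimal sketch of how a two-speaker mixture could be created programmatically with torchaudio; the file names are placeholders of my own, not files from this post.
#Create a two-speaker mixture programmatically (illustrative sketch, file names are placeholders)
import torchaudio
wav1, sr1 = torchaudio.load("speaker1.wav")  #waveform shape [channels, time]
wav2, sr2 = torchaudio.load("speaker2.wav")
assert sr1 == sr2, "both recordings should share the same sampling rate"
#Trim to the shorter clip and sum the waveforms to get the mixture
length = min(wav1.shape[1], wav2.shape[1])
mixture = wav1[:, :length] + wav2[:, :length]
torchaudio.save("my_mixture.wav", mixture, sr1)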
You can check my full code in Google Colab as well as here.
#Install Torchaudio, SpeechBrain and Transformers
!pip install torchaudio==0.8.1 #Temporary pin (until torchaudio 0.9 is supported in Colab)
!pip install speechbrain
!pip install transformers
#Import all libraries
import speechbrain as sb
from speechbrain.dataio.dataio import read_audio
from IPython.display import Audio
#Download pretrained SepformerSeparation from SpeechBrain
from speechbrain.pretrained import SepformerSeparation as separator
model = separator.from_hparams(source="speechbrain/sepformer-wsj02mix", savedir='pretrained_models/sepformer-wsj02mix')
est_sources = model.separate_file(path='speechbrain/sepformer-wsj02mix/test_mixture.wav')
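At this point, est_sources holds the separated waveforms with shape [batch, time, num_speakers]. Below is a small sketch of how you can listen to each separated speaker in the notebook; 8 kHz is the sampling rate the wsj02mix Sepformer model works at.
#Listen to each separated speaker (run each line in its own Colab cell so both players render)
Audio(est_sources[:, :, 0].detach().cpu().squeeze(), rate=8000)
Audio(est_sources[:, :, 1].detach().cpu().squeeze(), rate=8000)
For the recognition part, one option is to transcribe each separated source with a pretrained SpeechBrain ASR model. The sketch below is my own illustration rather than code from this post: the asr-crdnn-rnnlm-librispeech model and the 8 kHz to 16 kHz resampling step are assumptions (that LibriSpeech model expects 16 kHz input, while the Sepformer output is 8 kHz).
#Transcribe each separated speaker with a pretrained ASR model (illustrative sketch)
import torch
import torchaudio
from speechbrain.pretrained import EncoderDecoderASR
asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/asr-crdnn-rnnlm-librispeech", savedir="pretrained_models/asr-crdnn-rnnlm-librispeech")
#Resample the 8 kHz Sepformer output to the 16 kHz expected by the ASR model
resampler = torchaudio.transforms.Resample(orig_freq=8000, new_freq=16000)
for spk in range(est_sources.shape[2]):
    wav = resampler(est_sources[:, :, spk].detach().cpu())  #shape [1, time]
    words, tokens = asr_model.transcribe_batch(wav, torch.tensor([1.0]))
    print("Speaker", spk + 1, ":", words[0])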