From minimizing error to raising quality
In this post, I share my findings (and audio samples) from using perceptual quality as the training target for a causal model.
By employing reinforcement learning techniques, a non-differentiable metric, such as a speech perceptual quality score like PESQ or STOI, can be used as a model's training target. However, the examples I found all use non-causal models, which are not suitable for real-time applications (more background in "peek into the future"). As a keen embedded ML developer, I was curious how much improvement this could bring to a causal model.
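To make the idea concrete, here is a minimal sketch of how a score-function (REINFORCE-style) estimator lets a black-box metric drive training: perturb the parameters, score each perturbation with the metric, and step along the reward-weighted perturbations. The `quality_score` function below is a hypothetical stand-in for PESQ/STOI (a negative-MSE proxy), and the scalar "denoiser" is a toy, not the post's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def quality_score(enhanced, clean):
    """Stand-in for a non-differentiable quality metric such as PESQ or
    STOI: higher is better, and we only ever call it, never backprop
    through it. A crude negative-MSE proxy, purely for illustration."""
    return -float(np.mean((enhanced - clean) ** 2))

def reinforce_step(w, noisy, clean, lr=0.1, sigma=0.05, n_samples=16):
    """One score-function (REINFORCE-style) update of a toy scalar
    'denoiser' y = w * noisy: sample parameter perturbations, score each
    one with the black-box metric, and move along the reward-weighted
    perturbations."""
    samples = []
    for _ in range(n_samples):
        eps = rng.normal(0.0, sigma)
        samples.append((eps, quality_score((w + eps) * noisy, clean)))
    baseline = sum(r for _, r in samples) / n_samples   # variance reduction
    grad = sum((r - baseline) * eps
               for eps, r in samples) / (n_samples * sigma ** 2)
    return w + lr * grad

clean = rng.normal(size=256)
noisy = clean + 0.5 * rng.normal(size=256)

w = 0.0                       # start with the output muted
for _ in range(200):
    w = reinforce_step(w, noisy, clean)
# w converges near the MSE-optimal gain even though the "metric"
# exposed no gradient at all
```

Real systems replace the scalar gain with a full network and the proxy with the actual PESQ/STOI evaluation, but the structure of the update is the same.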
Two models will be trained on the same dataset with an identical architecture (RNN-based, causal, 4 ms algorithmic delay) and size (300k parameters); the only difference is the target definition and the corresponding training method.
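As a rough sanity check on that spec: a causal, frame-in/frame-out model with no lookahead has an algorithmic delay of exactly one frame. The sample rate is not stated in the post; 16 kHz and the single-GRU sizing below are my assumptions, chosen only to show how a ~300k-parameter causal model could plausibly be laid out.

```python
SAMPLE_RATE = 16_000   # assumed; not stated in the post
DELAY_MS = 4

# One frame of algorithmic delay at the assumed rate.
frame_len = SAMPLE_RATE * DELAY_MS // 1000   # 64 samples = 4 ms

def gru_params(input_size, hidden_size):
    """Parameter count of one GRU layer: 3 gates, input and recurrent
    weight matrices plus two bias vectors (the common PyTorch/cuDNN
    layout)."""
    return 3 * (input_size * hidden_size
                + hidden_size * hidden_size
                + 2 * hidden_size)

hidden = 272                                  # hypothetical layer width
gru = gru_params(frame_len, hidden)           # frame in -> hidden state
out = hidden * frame_len + frame_len          # linear layer back to a frame
total = gru + out
print(total)  # 293280 parameters, in the ballpark of the stated 300k
```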
Training targets: an error-minimization target for the baseline model (a conventional differentiable loss), and a perceptual-quality target (the "Q target" below) trained with the RL approach described above.
Two sets of results are presented below. The test clips were unseen by the models during training.
First, judging by the waveforms, both models have done a great job, and their STOI scores are close too. Holding my breath, I switch to the spectrogram view to inspect further.
The Q-target output does a visibly better job of removing the background noise: an extra 6 dB at the peak, in fact, as the spectrograms show. Not bad at all!
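For a sense of scale, that 6 dB figure can be unpacked with the standard decibel conversions (my interpretation of the spectrogram reading, not a number from the post):

```python
# "Extra 6 dB" of noise reduction at the spectral peak, expressed as
# plain ratios: 10*log10 for power, 20*log10 for amplitude.
db = 6.0
power_ratio = 10 ** (db / 10)       # ~4x less residual noise power
amplitude_ratio = 10 ** (db / 20)   # ~2x less residual noise amplitude
print(round(power_ratio, 2), round(amplitude_ratio, 2))  # 3.98 2.0
```

So the Q-target model leaves roughly half the residual noise amplitude (a quarter of the power) at that peak compared with the baseline.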
Some audio samples are available here for your listening.
My takeaways from this experiment: