from minimizing error to raising quality

In this post, I am going to share the findings (and audio samples) from applying perceptual quality as the training target for a causal model.

By employing Reinforcement Learning techniques, a non-differentiable metric, such as the speech perceptual quality score PESQ or STOI, can be used as a model training target. However, in the examples I found, they all use non-causal models, which are not suitable for real-time applications (more background in “peek into the future”). As a keen embedded ML developer, I am curious how much improvement it could bring to a causal model.
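To make the idea concrete, here is a toy sketch (not the actual training setup used in this experiment) of how a non-differentiable score can still drive parameter updates, using the score-function (REINFORCE) estimator. The `quality_score` function below is a hypothetical stand-in for PESQ: a coarsely quantized negative MSE, piecewise constant and thus without a useful gradient; the “model” is just a single gain applied to the noisy signal.

```python
import numpy as np

rng = np.random.default_rng(0)

def quality_score(estimate, clean):
    # Hypothetical stand-in for PESQ/STOI: a coarsely quantized
    # negative MSE, piecewise constant, so it has no useful gradient.
    return -float(np.round(np.mean((estimate - clean) ** 2), 2))

clean = rng.standard_normal(256)
noisy = clean + 0.3 * rng.standard_normal(256)

# "Model": a single gain g applied to the noisy input.
# Policy: sample g from a Gaussian N(mu, sigma^2), update mu with REINFORCE.
mu, sigma, lr, baseline = 0.0, 0.2, 0.02, 0.0
for _ in range(2000):
    g = rng.normal(mu, sigma)            # sample an action
    r = quality_score(g * noisy, clean)  # reward from the black-box metric
    baseline = 0.9 * baseline + 0.1 * r  # moving-average baseline (variance reduction)
    # Score-function gradient of log N(g; mu, sigma) w.r.t. mu is (g - mu) / sigma^2
    mu += lr * (r - baseline) * (g - mu) / sigma**2

# mu drifts from 0 towards the gain that maximizes the score,
# even though the score itself is non-differentiable.
```

A real setup would replace the single gain with the network's weights (or a stochastic output layer) and the toy score with the actual perceptual metric, but the mechanism is the same: only reward samples are needed, never the metric's gradient.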


Two models will be trained using the same dataset, identical model architecture (RNN-based, causal, 4 ms algorithmic delay) and size (300k parameters); the only difference is the target definition and the corresponding training method.

Training targets:

  1. Minimize the error when compared with ideal clean speech. With a synthesized dataset, the clean speech can conveniently serve as the ideal answer. A popular metric, SI-SDR, is used.
  2. Not so focused on errors; instead we want the output to be pleasant to our ears. A speech perceptual quality score is used as the metric, and the training target is to achieve as high a score as possible. Let's call this the Q target in the rest of the article.
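For reference, the first target's metric can be computed as below. This is a generic NumPy sketch of the standard single-channel SI-SDR definition, not code from the experiment:

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-Invariant Signal-to-Distortion Ratio in dB (higher is better)."""
    # Remove means so the measure ignores DC offset.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the target component;
    # whatever is left over counts as distortion.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    distortion = estimate - target
    return 10.0 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    )
```

Because the estimate is projected onto the reference, rescaling the estimate does not change the score, which is what makes the metric scale-invariant.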


Two sets of results are presented below. The test clips were unseen by the models during training.

First, both models have done a great job by the look of the waveforms, and their STOI scores are close too. Holding my breath, I switch to the spectrum view to inspect further.

The Q target output has done a visibly better job of removing the background noise, an extra 6 dB at the peak in fact, as the spectrograms show. Not bad at all!

Some audio samples are available here for your listening.


My takeaways from this experiment:

  • Training towards perceptual quality can definitely improve the model's understanding of speech, resulting in more precise noise removal. This part is very encouraging.
  • No noticeable quality score improvement is kind of expected. Meaningful improvement likely requires more signal conditioning, i.e. filling in the corrupted waveform, so context is crucial, and the causal condition limits the amount of context that can be used.
  • The additional effort of applying the Q target is not small, so it might not be the best option for quick evaluation, but for production model development it is worthwhile.
