Peek into the future
The devil is in the details: there is an often hidden, small detail that we must not miss when interpreting performance figures.
Below is a plot (from https://paperswithcode.com/sota/speech-separation-on-wsj0-2mix) showing how speech separation models have advanced over the years.
It’s a pretty encouraging picture, but before employing any of these models in your application, there is one detail that cannot be missed: whether the model is causal.
“Causal” here means the model does NOT use future information to make its decision at the current time step. Let’s use Conv-TasNet (arXiv:1809.07454v3) as an example, since its paper has a very good comparison table.
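To make the definition concrete, here is a minimal sketch (not taken from the Conv-TasNet code) of how the same filtering operation becomes causal or non-causal purely through where the padding goes: a causal layer pads only on the left, so each output sample depends on present and past inputs, while the non-causal version pads symmetrically and therefore peeks at future samples.

```python
import numpy as np

def conv1d(x, taps, causal):
    """Sliding dot product of `taps` over `x`, same output length as input.
    causal=True  : left-pad only, so y[t] depends on x[t-k+1..t] (no lookahead).
    causal=False : symmetric padding, so y[t] also sees future samples."""
    k = len(taps)
    pad = (k - 1, 0) if causal else ((k - 1) // 2, k // 2)
    xp = np.pad(x, pad)
    return np.array([xp[t:t + k] @ taps for t in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0])
taps = np.array([0.5, 0.5])                # a simple two-tap moving average
print(conv1d(x, taps, causal=True))        # [0.5 1.5 2.5 3.5] -> averages x[t-1], x[t]
print(conv1d(x, taps, causal=False))       # [1.5 2.5 3.5 2. ] -> averages x[t], x[t+1]
```

Note that both variants compute the exact same filter; only the alignment differs. That is why a causal variant of an architecture is usually a small structural change, yet it removes the lookahead that the non-causal version silently relies on.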
With the same architecture, simply making the model causal, both LSTM-TasNet and Conv-TasNet show a noticeable performance drop and seem to hit some kind of ceiling.
Non-causal models have access to a wider context, both past and future (relative to the current time step), hence the potential to achieve better results, especially with transformer architectures. This suits non-realtime applications, or ones that can afford significant latency. Realtime processing can only make use of the past, so a lower performance ceiling is expected.
Edge AI applications often have realtime or sharp responsiveness as a requirement, so a causal model is essential. Although the ceiling is lower, the trend is that we reach it with less and less computation. The earlier Conv-TasNet example demonstrates this too: the number of parameters is significantly reduced while reaching the same performance level.
Not all designs are realtime friendly or include a causal variant as part of their evaluation; in those cases extra evaluation is needed to gain further understanding.
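One such extra evaluation can be done empirically, without reading the architecture at all: perturb the input only after some time step t and check whether the output up to t changes. The probe below is a hypothetical sketch of that idea (the function name and toy models are my own, not from any paper); passing it for a few random inputs is evidence of causality, not a proof.

```python
import numpy as np

def is_causal(model, length=64, t=32, trials=4, tol=1e-9):
    """Empirical causality probe: feed two inputs that differ only AFTER
    time step t; a causal model must produce identical output up to t."""
    rng = np.random.default_rng(0)
    for _ in range(trials):
        x = rng.standard_normal(length)
        x2 = x.copy()
        x2[t + 1:] += rng.standard_normal(length - t - 1)  # change the future only
        y, y2 = model(x), model(x2)
        if np.max(np.abs(y[:t + 1] - y2[:t + 1])) > tol:
            return False  # output at or before t reacted to a future change
    return True

# Toy stand-ins for a real model: two-tap moving averages.
causal_ma = lambda x: (x + np.concatenate(([0.0], x[:-1]))) / 2    # uses x[t-1], x[t]
lookahead_ma = lambda x: (x + np.concatenate((x[1:], [0.0]))) / 2  # uses x[t], x[t+1]
print(is_causal(causal_ma))     # True
print(is_causal(lookahead_ma))  # False
```

The same probe applies to a trained network by wrapping its inference call as `model`; it is a cheap sanity check before committing to a model whose causality claims are undocumented.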
Worth highlighting here: this blog is Edge AI focused, so all examples are by default realtime-oriented implementations.