
from batch to streaming

Weiming Li

Machine Learning Signal Processing | MLSP.ai

Published: December 19, 2024


An unexpected complication I wish I had been well aware of from the beginning.

If you are coming from a conventional DSP background like me, this could be particularly useful. Let’s dive in.

First, for those new to the concept of mini-batch training, here is my short version. Instead of processing a single unit of input data at a time, take a batch of input data and process it in parallel. This not only makes learning more efficient (fewer parameter updates to calculate), but also takes full advantage of parallel computing hardware (GPUs). It all makes good sense.
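To make the idea concrete, here is a minimal sketch of mini-batch gradient descent. It assumes a toy one-parameter linear model `y = w * x` with mean-squared-error loss. Both the model and the function names are hypothetical, chosen only for illustration, and this is not the author's training setup. With 8 samples, a batch size of 4 needs 2 parameter updates per epoch instead of 8.

```python
# A minimal sketch, assuming a toy one-parameter model y = w * x trained with
# mean-squared error; an illustration only, not a real training pipeline.

def sgd_step(w, batch, lr):
    """One parameter update computed from a whole mini-batch of (x, y) pairs."""
    # Each per-sample gradient term is independent of the others, which is
    # the kind of work parallel hardware (e.g. a GPU) can compute at once.
    grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad


def train(data, batch_size, lr=0.5, epochs=200):
    """Plain mini-batch SGD; returns the learned weight and the update count."""
    w, updates = 0.0, 0
    for _ in range(epochs):
        for start in range(0, len(data), batch_size):
            w = sgd_step(w, data[start:start + batch_size], lr)
            updates += 1
    return w, updates


# Toy data generated from y = 3 * x.
data = [(i / 8, 3 * i / 8) for i in range(1, 9)]

w_batch, n_batch = train(data, batch_size=4)
w_single, n_single = train(data, batch_size=1)
print(round(w_batch, 6), n_batch)    # 3.0 400
print(round(w_single, 6), n_single)  # 3.0 1600
```

Both runs reach the same weight on this toy problem. The difference is the number of parameter updates: 400 with batches of 4 versus 1600 one sample at a time.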

Here is what a batch of data often looks like. It is very similar to a 3-dimensional array, and it’s called a tensor:

[Batch_size, time_steps, data_frame]

Data_frame: the input unit to the model.

Time_steps: a sequence of input units lined up in chronological order; the number of units depends on the length of a clip (or sample) of training data.

Batch_size: the number of clips in a batch.

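For example, a batch containing a total of 80 seconds of speech data, with 10 s per clip, 128 PCM samples per frame (no overlap) and a 16 kHz sample rate, would have dimensions of [8, 1250, 128]: 80 s / 10 s = 8 clips, and 10 s × 16,000 samples/s / 128 samples per frame = 1250 frames per clip.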
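The shape arithmetic can be checked with a short sketch that cuts a PCM signal into clips and frames. The constants mirror the example above. The function names are hypothetical, and the signal is placeholder silence rather than real speech. Plain Python lists stand in for a tensor library here.

```python
# A minimal sketch using plain Python lists; a tensor library would normally be
# used, but the shape arithmetic is the same.

SAMPLE_RATE = 16_000  # samples per second (16 kHz)
CLIP_SECONDS = 10     # length of one training clip
FRAME_SIZE = 128      # PCM samples per frame, no overlap


def make_batch(pcm, sample_rate=SAMPLE_RATE, clip_seconds=CLIP_SECONDS,
               frame_size=FRAME_SIZE):
    """Split 1-D PCM into nested lists shaped [batch_size, time_steps, data_frame]."""
    clip_len = sample_rate * clip_seconds
    if clip_len % frame_size:
        raise ValueError("clip length must be a whole number of frames")
    batch = []
    for c in range(len(pcm) // clip_len):  # any trailing partial clip is dropped
        clip = pcm[c * clip_len:(c + 1) * clip_len]
        frames = [clip[t:t + frame_size] for t in range(0, clip_len, frame_size)]
        batch.append(frames)
    return batch


def shape(batch):
    """Shape of a rectangular nested list: [batch_size, time_steps, data_frame]."""
    return [len(batch), len(batch[0]), len(batch[0][0])]


pcm = [0] * (80 * SAMPLE_RATE)  # 80 s of placeholder (silent) speech data
print(shape(make_batch(pcm)))   # [8, 1250, 128]
```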


Many DSP applications require processing one or more streams of data in real time, so frame by frame is the primary data handling format. A single data frame expressed in tensor format is:

[1, 1, data_frame]

In straightforward cases, setting batch_size and time_steps to 1 is exactly what needs to be done when preparing a trained NN model for deployment. Simple enough.
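For a model without any lookback, the batch view and the streaming view are interchangeable. The sketch below uses a hypothetical stateless per-frame function (mean frame energy) as a stand-in for a trained model. It checks that running a whole clip in one call gives the same result as feeding it frame by frame, each frame wrapped as [1, 1, data_frame].

```python
# A minimal sketch: a hypothetical stateless "model" that maps each frame to its
# mean energy. It accepts any [batch_size, time_steps, data_frame] input.

def model(x):
    """[B, T, F] -> [B, T]: mean energy of every frame."""
    return [[sum(s * s for s in frame) / len(frame) for frame in clip] for clip in x]


def run_streaming(frames):
    """Feed frames one at a time, each wrapped as a [1, 1, data_frame] tensor."""
    outputs = []
    for frame in frames:
        y = model([[frame]])  # input shape [1, 1, F], output shape [1, 1]
        outputs.append(y[0][0])
    return outputs


clip = [[1, -1, 1, -1], [2, 2, -2, -2], [0, 3, 0, -3]]  # one clip: T=3, F=4
batch_out = model([clip])[0]      # training style: the whole clip in one call
stream_out = run_streaming(clip)  # deployment style: one frame per call
print(batch_out == stream_out)    # True
```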

What can make this straightforward process chewy is data lookback. Since mini-batch training processes all the input data in parallel, here is what usually happens when a model requires, for instance, the previous frame as part of its input.

Simply including the data frames from earlier time steps works just fine in mini-batch training, and there are configuration options to achieve that conveniently. But what happens in deployment, when the input has only a single time step, [1, 1, data_frame]? The earlier frames are no longer part of the input tensor, so this simple way of including them is not applicable, at least not straightforwardly.
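Here is a minimal sketch of the problem. It assumes lookback means concatenating each frame with the previous one along the feature axis, zero-padded at t = 0. This is one common choice; the function name and padding scheme are hypothetical. Applied unchanged to a single-step input, the same code silently loses the history.

```python
# A minimal sketch of training-time lookback: every frame is concatenated with
# the frame one time step earlier (zero-padded at t = 0). Names are hypothetical.

def add_lookback(x):
    """[B, T, F] -> [B, T, 2F]: [previous frame, current frame] per time step."""
    out = []
    for clip in x:
        zeros = [0] * len(clip[0])
        prev = [zeros] + clip[:-1]  # the whole clip shifted by one time step
        out.append([p + c for p, c in zip(prev, clip)])
    return out


clip = [[1, 2], [3, 4], [5, 6]]  # T=3, F=2

# Training: the whole clip is visible, so every frame sees its real predecessor.
print(add_lookback([clip])[0])       # [[0, 0, 1, 2], [1, 2, 3, 4], [3, 4, 5, 6]]

# Deployment: the input is [1, 1, F]; the previous frame is not in the tensor,
# so the same code falls back to zero padding.
print(add_lookback([[clip[1]]])[0])  # [[0, 0, 3, 4]], not [[1, 2, 3, 4]]
```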

Adding a buffer (to the deployment model) before every layer that requires data lookback is a logical solution. More advanced solutions are employed to suit different architectures and scalability requirements. The intricacy mostly lies in training efficiency and in keeping the code consistent between training and deployment.
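Here is a minimal sketch of the buffer approach. It assumes the lookback scheme is "concatenate the previous frame, zero-padded at the start", which is a hypothetical choice. The training-time function and the deployment-time buffer sit side by side, so a simple check can confirm that frame-by-frame output matches the whole-clip computation. Names are illustrative. Layers that look further back would need correspondingly longer buffers.

```python
# A minimal sketch of a streaming lookback buffer, assuming the training-time
# lookback concatenates [previous frame, current frame] with zero padding at t = 0.

def add_lookback(x):
    """Training-time version: [B, T, F] -> [B, T, 2F] over whole clips."""
    out = []
    for clip in x:
        zeros = [0] * len(clip[0])
        prev = [zeros] + clip[:-1]
        out.append([p + c for p, c in zip(prev, clip)])
    return out


class LookbackBuffer:
    """Deployment-time version: keeps the previous frame between calls."""

    def __init__(self, frame_size):
        self.prev = [0] * frame_size  # same zero padding as training at t = 0

    def __call__(self, x):
        """[1, 1, F] -> [1, 1, 2F]"""
        frame = x[0][0]
        out = self.prev + frame
        self.prev = list(frame)       # remember this frame for the next call
        return [[out]]

    def reset(self):
        """Clear history, e.g. at the start of a new stream (like a new clip)."""
        self.prev = [0] * len(self.prev)


clip = [[1, 2], [3, 4], [5, 6]]
batch_out = add_lookback([clip])[0]

buf = LookbackBuffer(frame_size=2)
stream_out = [buf([[frame]])[0][0] for frame in clip]
print(stream_out == batch_out)  # True
```

Keeping an equivalence check like this in the test suite is one way to catch training/deployment mismatches early.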

The twist I wish I had learned earlier: in conventional algorithm development, we strive for a consistent implementation between simulation and deployment. In the NN world, because data is handled differently, it is common for the trained model not to be exactly the same as the deployed one. Fully considering this complication from the beginning of model development makes life a lot easier during later debugging and deployment.
