Looking forward to Cortex-M55 + Ethos-U55

Weiming Li

Machine Learning Signal Processing | MLSP.ai

Published: February 10, 2025

The 50x inference speedup and 25x efficiency jump are very exciting, but what I really look forward to is how this combination could change ML system integration.

Because of the intense computation required, horsepower and efficiency have been the two focus points for embedded ML. A standalone accelerator is the logical answer to both: a dedicated compute unit excels on each front, but not without trade-offs:

A dedicated compute unit likely requires a different toolchain from the main processor
Inter-processor data flow needs to be carefully managed

In product development, these trade-offs translate into system integration challenges. To be clear, they are not unique to ML accelerators; they are shared by any multi-processor architecture. It could be argued, though, that the challenge is made more acute because ML is so good at solving so many problems. The typical solution involves juggling multiple IDEs and using DMA and interrupts to handle data logistics, so that transfers happen in the background and don't waste clock cycles. Speaking as a long-standing embedded developer, none of this brings back cheerful memories.
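
To make the "DMA and interrupts" part concrete, here is a minimal sketch of the usual ping-pong (double) buffer pattern. It is not taken from any vendor SDK: `dma_complete_isr`, `process_if_ready` and `overrun_count` are hypothetical names, and the interrupt is simulated by passing the samples in as an argument so the code can run on a host machine. On real hardware the DMA controller would fill the buffer and the handler would only flip the index.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_LEN 4

/* Two buffers: the (simulated) DMA fills one while the CPU reads the other. */
static int16_t frames[2][FRAME_LEN];
static volatile int fill_idx = 0;          /* buffer currently owned by the DMA */
static volatile bool frame_ready = false;  /* set by the ISR, cleared by the consumer */
static volatile unsigned overruns = 0;     /* frames that were never consumed */

/* Hypothetical DMA-complete interrupt handler. The copy loop stands in for
 * the hardware transfer; this is an assumption made for host-side testing. */
void dma_complete_isr(const int16_t *samples)
{
    for (size_t i = 0; i < FRAME_LEN; i++) {
        frames[fill_idx][i] = samples[i];
    }
    if (frame_ready) {
        overruns++;  /* consumer did not keep up; the earlier frame is skipped */
    }
    fill_idx ^= 1;       /* hand the filled buffer to the CPU, take the other one */
    frame_ready = true;
}

/* Called from the main loop. Returns false when no new frame is available;
 * otherwise writes the sum of the newest frame (a stand-in for real
 * processing) and returns true. */
bool process_if_ready(int32_t *sum_out)
{
    if (!frame_ready) {
        return false;
    }
    const int16_t *frame = frames[fill_idx ^ 1];  /* the buffer the DMA just left */
    int32_t sum = 0;
    for (size_t i = 0; i < FRAME_LEN; i++) {
        sum += frame[i];
    }
    frame_ready = false;
    *sum_out = sum;
    return true;
}

unsigned overrun_count(void)
{
    return overruns;
}
```

Even this toy version needs an overrun counter, and it ignores cache maintenance and the race between the ISR and the main loop that a real port would have to handle. Multiply that by every data path between processors and the integration cost becomes clear.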

When the system is a single-task system, the typical solution works just fine. It takes hard work, but the desired result can be achieved. This is exactly what we see most often today: ML hardware vendors provide flash-and-go images that perform a certain task, such as voice recognition or face detection. In other words, system integration of the ML functionality happens at the hardware level. It's not hard to spot that this approach has an obvious limitation. A feature-rich wearable system like the following should not be surprising nowadays.

Imagine setting up data logistics for the 4 ML processing blocks above at the inter-processor level. Add the high likelihood that these tasks are not synchronized, and building a robust yet scalable system becomes very difficult. As ML gets very good at solving more and more problems, the system integration challenge is amplified.

What’s unique to Cortex-M55 + Ethos-U55, and what could change the game, is its unified toolchain. To my understanding, this means developers no longer need to separate ML code at the processor level; instead, calling ML processing could be like a normal function call, and this is significant.

From the Arm Community article "Get Started with Early Development on the Arm Cortex-M55 Processor"

Instructions for the different processor units being compiled and decoded together means these units are integrated at the ALU level, a layer hidden from the application developer. When I saw this a few years back, it immediately struck me that it could well solve the system integration challenge. Good horsepower and efficiency certainly increase the possibility of applying ML in products. Smooth system integration, on the other hand, makes realizing this possibility easier and quicker.
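
To illustrate what "like a normal function call" could look like from the application side, here is a small sketch in plain C. It does not use the actual Ethos-U55 driver or any Arm API: `nn_infer`, `classify`, the weights and the layer sizes are all hypothetical, and an ordinary loop stands in for whatever the unified toolchain would generate. The only point is the calling convention.

```c
#include <stddef.h>
#include <stdint.h>

#define N_IN  4
#define N_OUT 2

/* Hypothetical model parameters, chosen only to keep the example small. */
static const int8_t weights[N_OUT][N_IN] = {
    { 1,  1,  1,  1},
    { 1, -1,  1, -1},
};
static const int32_t bias[N_OUT] = {0, 2};

/* Hypothetical inference entry point: one dense layer followed by ReLU.
 * Inputs and outputs are ordinary arrays in the caller's address space,
 * with no mailbox, no shared-memory protocol and no second firmware image. */
void nn_infer(const int8_t in[N_IN], int32_t out[N_OUT])
{
    for (size_t o = 0; o < N_OUT; o++) {
        int32_t acc = bias[o];
        for (size_t i = 0; i < N_IN; i++) {
            acc += (int32_t)weights[o][i] * in[i];
        }
        out[o] = acc > 0 ? acc : 0;  /* ReLU */
    }
}

/* Application code mixes ML and non-ML steps in a single call chain.
 * Returns the index of the larger score (0 on a tie). */
int classify(const int8_t features[N_IN])
{
    int32_t scores[N_OUT];
    nn_infer(features, scores);
    return scores[1] > scores[0] ? 1 : 0;
}
```

If ML processing really can be exposed this way, adding a second or third model to a product is a matter of adding another function call rather than another data path between processors.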

As of writing this post, Cortex-M55 + Ethos-U55 is getting really close to wide availability, and I have a dev kit from a major vendor on pre-order. Looking forward to it!
