Looking forward to Cortex-M55 + Ethos-U55

The 50x inference speedup and 25x efficiency jump are very exciting, but what I really look forward to is how this pairing could change ML system integration.

Because of the intense computation involved, horsepower and efficiency have been the two focal points for embedded ML. A standalone accelerator is the logical answer to both: a dedicated compute unit excels on each front, though not without trade-offs:

  • A dedicated compute unit likely requires a different toolchain from the main processor
  • Inter-processor data flow needs to be carefully managed

In product development, these trade-offs translate into system integration challenges. To be clear, they are not unique to ML accelerators; they are shared by any multi-processor architecture. Arguably, though, the challenge becomes more pronounced because ML is so good at solving so many problems. The typical solution involves juggling multiple IDEs at once and using DMA and interrupts to handle the data logistics, so that transfers happen in the background without wasting clock cycles. As a long-standing embedded developer myself, I can't say any of this brings back cheerful memories.
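For readers who have not lived through it, the "DMA plus interrupts" pattern usually boils down to a ping-pong (double) buffer: the DMA engine fills one half while the processor consumes the other, and a transfer-complete interrupt swaps them. The C sketch below simulates that on a host. `dma_complete_isr`, `simulate_dma_transfer`, `process_frame` and `poll_and_process` are hypothetical names for illustration, not any vendor's HAL, and the DMA engine is replaced by a `memcpy`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_LEN 4 /* samples per DMA transfer (illustrative value) */

/* Ping-pong buffer: the DMA fills one half while the CPU consumes the other. */
static int16_t buf[2][FRAME_LEN];
static volatile int ready_idx = -1; /* half holding a complete frame, -1 if none */
static int dma_idx = 0;             /* half the (simulated) DMA is writing into */

/* Hypothetical DMA-complete interrupt handler: publish the half that was
 * just filled and point the DMA at the other half. */
static void dma_complete_isr(void) {
    ready_idx = dma_idx;
    dma_idx ^= 1;
}

/* Host-side stand-in for the DMA engine: copy one frame of sensor data into
 * the active half, then raise the completion "interrupt". */
static void simulate_dma_transfer(const int16_t *src) {
    memcpy(buf[dma_idx], src, sizeof buf[dma_idx]);
    dma_complete_isr();
}

/* Placeholder consumer, e.g. where a frame would be handed to an ML block. */
static int32_t process_frame(const int16_t *frame) {
    assert(frame != NULL);
    int32_t sum = 0;
    for (int i = 0; i < FRAME_LEN; i++) sum += frame[i];
    return sum;
}

/* Main-loop step: returns 1 and writes *result if a new frame was ready,
 * or returns 0 if the DMA has not completed a new frame yet. */
static int poll_and_process(int32_t *result) {
    int idx = ready_idx;
    if (idx < 0) return 0; /* nothing new yet */
    ready_idx = -1;        /* mark the frame as consumed */
    *result = process_frame(buf[idx]);
    return 1;
}
```

On real hardware this still needs overrun detection (the DMA completing twice before the main loop catches up) and a short critical section around reading and clearing `ready_idx`; that bookkeeping, multiplied across every processor boundary, is exactly the data logistics referred to above.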

When the system is a single-task system, the typical solution works just fine: with enough hard work, the desired result can be achieved. This is exactly what we see most today: ML hardware vendors provide flash-and-go images that perform a specific task, such as voice recognition or face detection. In other words, system integration of the ML functionality happens at the hardware level. It's not hard to spot the obvious limitation of this approach. A feature-rich wearable system like the following should not be surprising nowadays.

Imagine setting up the data logistics for the 4 ML processing blocks above at the inter-processor level. Add the high likelihood that these tasks are not synchronized, and building a robust yet scalable system becomes very difficult. As ML gets good at solving more and more problems, the system integration challenge is amplified.
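To make "not synchronized" concrete, here is a minimal C sketch of four ML blocks that each run at their own period and phase, dispatched from a single tick-based loop. The task names, periods and offsets are hypothetical placeholders, not the blocks from the figure, and `run_count++` stands in for actually invoking a model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ML processing blocks with independent periods (in ticks). */
typedef struct {
    const char *name;
    uint32_t period_ticks; /* run every N ticks */
    uint32_t offset_ticks; /* phase offset, so tasks do not line up */
    uint32_t run_count;    /* how many times the task has executed */
} ml_task_t;

static ml_task_t tasks[] = {
    {"voice_activity",   1, 0, 0},
    {"keyword_spotting", 2, 1, 0},
    {"gesture",          3, 0, 0},
    {"heart_rate",       5, 2, 0},
};
#define NUM_TASKS (sizeof tasks / sizeof tasks[0])

/* Run every task whose period elapses on this tick. On one processor this
 * is a plain loop; if each block lived on a different processor, every
 * dispatch would typically also need its own inter-processor data hand-off. */
static void scheduler_tick(uint32_t tick) {
    for (size_t i = 0; i < NUM_TASKS; i++) {
        ml_task_t *t = &tasks[i];
        assert(t->period_ticks > 0); /* guard against modulo by zero */
        if (tick >= t->offset_ticks &&
            (tick - t->offset_ticks) % t->period_ticks == 0) {
            t->run_count++; /* stand-in for invoking the ML block */
        }
    }
}
```

Over 30 ticks, the four blocks run 30, 15, 10 and 6 times respectively, and they coincide only on some ticks; any cross-processor scheme has to cope with every one of those overlaps.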


What's unique to Cortex-M55 + Ethos-U55, and what could change the game, is its unified toolchain. To my understanding, this means developers no longer need to separate ML code at the processor level; instead, calling ML processing could be like a normal function call. This is significant.
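As a rough illustration of "ML processing as a normal function call", the C sketch below shows inference sitting inline with ordinary application logic. `ml_dense_s8` and `classify` are hypothetical names, not part of any Arm or vendor API; the int8 fully-connected layer is computed in plain C so the example runs on any host, standing in for whatever the toolchain would actually dispatch to the accelerator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inference entry point: an int8 fully-connected layer with
 * int32 accumulation. The caller sees an ordinary C function; where the
 * work actually runs is assumed to be decided below this interface. */
static void ml_dense_s8(const int8_t *in, size_t n_in,
                        const int8_t *weights, /* n_out x n_in, row-major */
                        const int32_t *bias,
                        int32_t *out, size_t n_out) {
    assert(in && weights && bias && out);
    for (size_t o = 0; o < n_out; o++) {
        int32_t acc = bias[o];
        for (size_t i = 0; i < n_in; i++) {
            acc += (int32_t)in[i] * (int32_t)weights[o * n_in + i];
        }
        out[o] = acc;
    }
}

/* Application code: the ML step is called inline, with no mailbox,
 * IPC message, or separate firmware image for another processor. */
static int classify(const int8_t features[3]) {
    static const int8_t w[2 * 3] = { 1, -1, 2,
                                    -2,  1, 0 };
    static const int32_t b[2] = { 0, 5 };
    int32_t scores[2];
    ml_dense_s8(features, 3, w, b, scores, 2);
    return scores[0] >= scores[1] ? 0 : 1; /* index of the higher score */
}
```

For example, features `{3, 1, 2}` give scores `{6, 0}` and class 0, while `{-2, 3, 0}` give `{-5, 12}` and class 1. The point is the call site: the ML step is just another function in the same build, rather than a request sent to a separate processor.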

From the Arm Community article "Get Started with Early Development on the Arm Cortex-M55 Processor"

Compiling and decoding instructions for different processing units together means these units are integrated at the ALU level, a layer hidden from the application developer. When I saw this a few years back, it immediately struck me that it could well solve the system integration challenge. Good horsepower and efficiency certainly increase the possibility of applying ML in products; smooth system integration, on the other hand, makes realizing that possibility easier and quicker.


As of writing this post, Cortex-M55 + Ethos-U55 is getting really close to wide availability, and I have a dev kit from a major vendor on pre-order. Looking forward to it!

More articles by Weiming Li

  • free trial: integrate NN processing in MCU with 2 lines of C code
  • Ray Tracing for sound, the holy grail for data generation?
  • from minimize error to raise quality
  • SVDF, just give Conv a bit of time
  • Peek into the future
  • Tiny model for tiny system
  • build trust with black box
  • from batch to streaming
  • Fuzzy Memory
  • Stochastic Rounding