Looking forward to Cortex-M55 + Ethos-U55
The 50x inference speed up and 25x efficiency jump are very exciting, but what I really look forward to is how it could change ML system integration.
Because of the intense computation required, horsepower and efficiency have been the two focus points for embedded ML. Standalone accelerator is the logical solution to horsepower and efficiency, dedicated compute unit provides supremacy on both aspects, but not without trade-off though:
In product development, these trade-offs translate into system integration challenges. Worth clarifying, they are not unique to ML accelerator, but shared by multi-processor architecture. It could be argued though the challenge is made more profound because ML is too good at solving many problems. Typical solution involves dealing with multiple IDEs simultaneously, making use of DMA and interrupts to handle data logistic, so that it happens in the background, doesn’t waste clock cycles. As a long standing embedded developer myself, none of these link to cheerful memory in my mind.
When the system is a single task system, typical solution works just fine. Despite hard work, desired result can be achieved. This is exactly we see the most today, ML hardware vendors provide flash-and-go images that perform certain task, such as voice recognition and face detection. In other words, system integration of the ML functionality happens at the hardware level. It's not hard to spot this approach has obvious limitation. A feature-rich wearable system like following should not be surprising nowadays.
Imagine setting up data logistic for the 4 ML processing blocks above at?inter-processor level, also the high likelihood of these tasks are not synchronized, building a robust yet scalable system becomes very difficult. As ML getting very good at solving more and more problems, the system integration challenge get amplified.
领英推荐
What’s unique to Cortex-M55+Ethos-U55 that could change the game is its unified toolchain, which to my understanding, it means developers no longer need to separate ML code at processor level, instead calling ML processing could be like a normal function call, and this is significant.
Instructions for different processor units are compiled and decoded together means these processor units are integrated at ALU level, which is a layer hidden from application developer. When I saw this a few years back, it immediately strike me that it could well solve the system integration challenge. Having good horsepower and efficiency certainly increase the possibility of applying ML in product. Smooth system integration on the other hand makes realizing this possibility easier and quicker.
As of the moment of writing this post, Cortex-M55+Ethos-U55 is getting really close to wide availability, I have got dev kit from major vendor on pre-order. Looking forward to it!