TinyML vision is turning into reality with microNPUs (μNPUs)
Elad Baram
Dynamic Product Management Leader | IoT & AI Innovator | Technology Strategy
Computer vision technology today is at an inflection point, with major trends converging to enable this technology – which has to date been mainly in the cloud – to become ubiquitous in tiny edge AI devices.
In this article, we will review the technology advancements that are enabling this cloud-centric AI technology to extend to the edge and talk about what still needs to be done to make AI vision at the edge truly pervasive.
The three major technological trends enabling this evolution are:
1. Development of neural network algorithms for tiny devices
2. Silicon design in the μNPU era: new architectures like Arm's Ethos targeting machine learning (ML) workloads with two orders of magnitude more efficiency for neural network processing than CPUs
3. AI frameworks reaching the tiny edge: development tools for ML on microcontrollers reaching maturity, reducing barriers to widespread adoption by the developer community
As all these elements come together, tiny processors at the milliwatt scale now have powerful neural processing units that can execute extremely efficient CNNs leveraging a mature and easy-to-use development toolchain. This will enable exciting new use cases across just about every part of our lives.
The promise of computer vision at the edge
Computer vision technology has become ubiquitous in many fields over the last two decades. Digital image processing – as it used to be called – is used for applications ranging from semiconductor manufacturing and inspection to advanced driver assistance system (ADAS) features such as lane departure warning and blind-spot detection to image beautification and manipulation on mobile devices. Looking ahead, computer vision technology at the edge is enabling the next level of human-machine interfaces (HMIs).
HMIs have evolved significantly in the last decade. On top of "traditional" interfaces like the keyboard and mouse, we now have touch displays, fingerprint readers, face recognition systems, and voice command capabilities. While clearly improving the user experience, these methods have one other attribute in common – they all react to user actions. The next level of HMI will be devices that understand users and their environment via contextual awareness.
Context-aware devices understand users better and automate interactions. A laptop "knows" when a user is attentive and can adapt its behavior and power policy accordingly. WiseEye visual sensors from Emza Visual Sense enable OEMs to optimize power by adaptively dimming the display when the user is not looking at it.
A smart TV set knows if someone is watching and from where, then adapts the image quality and sound accordingly. It can then automatically turn off to save power when no one is there.
Air conditioning systems optimize power and airflow according to room occupancy to save on energy costs. This and other examples of smart space utilization in buildings are becoming even more financially important with hybrid home-office work models.
There are also endless use cases for visual sensing in industrial settings, from object detection for enforcing safety regulations (e.g., restricted zones, safe passages, protective gear) to anomaly detection for manufacturing process control. In agritech, crop inspection and the status and quality monitoring enabled by computer vision technologies are all critical.
Whether it's in laptops, consumer electronics, smart building sensors or industrial environments, this 'ambient computing' capability is enabled when tiny and affordable microprocessors, tiny neural networks, and optimized AI frameworks make devices more intelligent and power efficient.
Neural networks get smaller
One of the fundamental tasks in computer vision is object detection. Object detection in essence requires two tasks: 1) localization – determining where an object is located within the image; and 2) classification – identifying the detected object. These two tasks have been the subject of research and development for many years.
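To make the two sub-tasks concrete, here is a minimal, illustrative sketch of how a detector's raw output is interpreted. The boxes, scores, and label set are made up for illustration, not the output of any real model:

```python
# Illustrative only: hypothetical detector output for one image.
# Each candidate carries a bounding box (localization) and
# per-class scores (classification).
import numpy as np

boxes = np.array([[0.10, 0.20, 0.40, 0.60],    # [x_min, y_min, x_max, y_max]
                  [0.55, 0.15, 0.90, 0.70]])   # normalized coordinates
scores = np.array([[0.05, 0.90, 0.05],         # per-class confidence
                   [0.80, 0.10, 0.10]])
labels = ["background", "face", "person"]      # hypothetical label set

for box, cls_scores in zip(boxes, scores):
    cls = int(np.argmax(cls_scores))           # classification: what is it?
    if labels[cls] != "background":
        # localization: where is it?
        print(f"{labels[cls]} ({cls_scores[cls]:.2f}) at {box}")
```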
2012 marked the turning point when computer vision started to shift from "classical" CV methods to deep convolutional neural networks (DCNNs), with the publication of AlexNet by Alex Krizhevsky and his colleagues. There was no turning back after AlexNet won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) that year.
Since then, teams across the globe have continued research and development targeting higher detection performance, but without much concern for the efficiency of the underlying hardware. CNNs continued to be data- and compute-hungry; this focus on performance was fine for applications running on cloud infrastructure.
In 2015, ResNet152 was introduced. It had 60 million parameters, required more than 11 billion floating-point operations for a single inference, and demonstrated 94% top-5 accuracy on the ImageNet data set. This continued to push the performance and accuracy of CNNs, but it wasn't until 2017, with the publication of MobileNets by a group of researchers at Google, that we saw a push toward efficiency. MobileNets were significantly lighter than the NN architectures existing at that time, with models targeted for execution on mobile phones. MobileNetV2, for example, has 3.5 million parameters and requires 336 million FLOPs. Roughly 20x "lighter" than ResNet152, in both memory and computational load, MobileNetV2 demonstrated top-5 accuracy of 90%. A new set of mobile-friendly applications could now use AI.
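The size gap is easy to verify yourself. Assuming a TensorFlow installation, the stock reference implementations in tf.keras.applications report parameter counts directly:

```python
# Compare model sizes using the Keras reference implementations
# (weights=None skips downloading pretrained weights).
import tensorflow as tf

resnet = tf.keras.applications.ResNet152(weights=None)
mobilenet = tf.keras.applications.MobileNetV2(weights=None)

print(f"ResNet152:   {resnet.count_params() / 1e6:.1f}M parameters")
print(f"MobileNetV2: {mobilenet.count_params() / 1e6:.1f}M parameters")
```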
Hardware evolves
With smaller NNs in sight and a clear understanding of the workloads involved, developers could now design optimized silicon for tiny AI. The key element that differentiates an AI-capable microcontroller from a legacy MCU is the micro NPU (neural processing unit). These dedicated cores can execute neural network inference 10x to 100x faster than a CPU.
In 2020, Arm introduced its next generation of AI for IoT with the Arm Cortex-M55 CPU and Ethos-U55 micro NPU. These cores are available for development in the cloud using Arm's Fixed Virtual Platform (FVP).
As an Arm AI ecosystem partner, Emza implemented a face detection model on the Ethos-U55 μNPU, training an object detection model that is a lightweight version of SSD (Single Shot Detector). The results are astonishing: model execution time on the Ethos-U55 μNPU is less than 5 milliseconds, comparable to running the same model with the horsepower of a mobile phone processor like the Snapdragon 845. When executing the same model on a Raspberry Pi 3B (4x Cortex-A53), execution time is 6x longer than on the Ethos-U55.
GitHub link: https://github.com/emza-vs/emza_yaw_landmarks_alif
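For reference, a typical way to reproduce such latency comparisons on a Linux-class board like the Raspberry Pi is to time repeated invocations of the quantized model with the TensorFlow Lite interpreter. This is a sketch only; the model file name is a placeholder, and on the Ethos-U55 itself the measurement runs in the on-target TFLM runtime instead:

```python
# Rough latency benchmark for a quantized TFLite model on a CPU target.
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_int8.tflite")  # placeholder file
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Feed a placeholder frame; a real benchmark would use camera input.
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()  # warm-up run, excluded from timing

runs = 100
start = time.perf_counter()
for _ in range(runs):
    interpreter.invoke()
elapsed_ms = (time.perf_counter() - start) / runs * 1e3
print(f"mean inference time: {elapsed_ms:.2f} ms")
```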
The complex tasks that previously required expensive hardware can today be done on tiny edge cores. We are witnessing a decade’s worth of advancements happening in a very compressed time scale.
AI frameworks & democratization
A critical element in the widespread adoption of any technology is the availability of development tools. TensorFlow Lite for Microcontrollers (TFLM) from Google is a leading framework designed to make training and deploying AI on the tiny edge easier. The PyTorch Mobile framework and Glow compiler from Meta also target this area. In addition, there are today quite a few AI automation platforms (known as AutoML) that can automate some aspects of AI deployment; examples include Edge Impulse, Deeplite, Qeexo, and SensiML.
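As a concrete example of what these tools automate, below is a minimal sketch of the standard TensorFlow Lite full-integer (int8) quantization flow that tiny-edge deployments typically start from. The tiny Keras model and random calibration images here are placeholders for a real trained network and real data:

```python
import numpy as np
import tensorflow as tf

# Placeholder model and calibration data, for illustration only.
keras_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(96, 96, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),
])
rep_images = np.random.rand(100, 96, 96, 1).astype("float32")

def representative_data_gen():
    # A few sample inputs let the converter calibrate int8 ranges.
    for image in rep_images:
        yield [image[None, ...]]

converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```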
To enable execution on specific hardware and μNPUs, compilers and toolchains must be adapted.
Arm has developed the Vela compiler, which optimizes CNN model execution for the Ethos-U55 μNPU. Vela removes the complexities of a system that contains both a Cortex CPU and an Ethos μNPU by automatically splitting the AI model execution between them.
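In practice, Vela is a command-line tool (installable via pip as ethos-u-vela) that takes an int8-quantized TFLite file and emits an optimized model. A minimal sketch of an invocation, with placeholder file names, might look like this:

```python
# Invoke the Vela compiler on a quantized model (sketch; requires
# `pip install ethos-u-vela`). File names are placeholders.
import subprocess

subprocess.run([
    "vela", "model_int8.tflite",
    "--accelerator-config", "ethos-u55-128",  # target a 128-MAC Ethos-U55 variant
    "--output-dir", "vela_out",               # optimized .tflite is written here
], check=True)
```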
An important development in this area is Apache TVM, an open-source, end-to-end ML compiler framework for CPUs, GPUs, NPUs and accelerators. More specifically, microTVM targets microcontrollers, with the vision of running any AI model on any hardware.
The evolution of AI frameworks, AutoML platforms and compilers make it easier for developers to leverage the new μNPUs for their specific needs.
Ubiquitous AI at the edge
As forces continue to evolve, the trend toward ubiquitous edge AI is clear. Hardware costs are decreasing, computation capability is increasing significantly, and new methodologies make it easier to train and deploy models. All of this is leading to fewer barriers to adoption, and increased use of computer vision AI at the edge.
But even as tiny edge AI becomes increasingly ubiquitous, there is still work to do. To make ambient computing a reality, we need to overcome the long tail of use cases across many segments, which creates a scalability challenge. Consumer products, factories, agriculture, retail and other segments each require different algorithms and unique data sets for training. The R&D investment and skill set needed to solve each use case remain a major barrier today.
This gap is filled today by AI companies that specialize in computer vision and can provide total solutions for specific vertical use cases. An expert solution provider can develop the right algorithms, optimized for the target hardware, to solve specific business needs within cost, size and power constraints. With a solution provider closing the last mile, we can turn the vision into reality.
References
1. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet Classification with Deep Convolutional Neural Networks. Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS'12) 1:1097–1105
2. Howard AG, Zhu M, Chen B, et al. (2017) MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. https://arxiv.org/pdf/1704.04861.pdf