Reimagining Visual Sensing
In a recent four-part series of articles, I wrote about our journey in bringing neuromorphic vision out of the lab and into real life. Here, I preview our latest work, in which we fundamentally rethink the idea of neuromorphic vision to enable seamless integration into all computer vision applications.
What is Computer Vision For?
At the risk of stating the obvious, computer vision has two functions:
- making decisions based on visual input, and
- reconstructing faithful representations of a scene.
What may be less obvious is that both tasks are fundamentally similar. Any good computer vision system will throw away as much data as possible while still enabling good decisions to be made and faithful scenes to be reconstructed.
The Human Approach
Our own visual systems are very good at throwing away just the right data. The retina compresses the equivalent of gigabytes per second of incoming data down to a highly compact, yet still useful, data stream of approximately one megabyte per second.
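As a rough sanity check on those numbers, here is a back-of-the-envelope sketch; the raw photoreceptor data rate used below is an assumption chosen purely for illustration.

```python
# Back-of-the-envelope estimate of the retina's data reduction.
# The raw input rate below is an illustrative assumption, not a measurement.
raw_input_bytes_per_s = 5e9      # assume ~5 GB/s of raw photoreceptor data
optic_nerve_bytes_per_s = 1e6    # ~1 MB/s, as quoted above

compression_factor = raw_input_bytes_per_s / optic_nerve_bytes_per_s
print(f"Implied compression: ~{compression_factor:,.0f}x")  # ~5,000x
```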
The rest of the brain then completes the process of converting the scene into compact abstract representations, containing elements such as named objects, object trajectories, and object relationships. From these representations we can then make decisions and, when needed, recall the scene.
Computer Vision Today: A Split View
How does current computer vision compare with human visual processing? Right now, the landscape contains two extremes. Conventional frames are nearly "lossless" representations of the world: the pixel-level error (quantization error) is usually very small, e.g. 1 part in 1024 for 10-bit resolution. This precision enables very good reconstruction and high-quality decisions, but at the cost of high bandwidth, slower response, and higher energy usage.
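To make those costs concrete, the sketch below computes the quantization step and the raw bandwidth of a conventional frame stream. The resolution and frame rate are assumptions for illustration, not the specification of any particular camera.

```python
# Illustrative parameters for a conventional frame sensor (assumed values).
width, height = 1920, 1080   # pixels
bits_per_pixel = 10          # 10-bit quantization, as in the example above
frames_per_second = 60

# Pixel-level quantization error: one code step out of 2**10 levels.
print(f"Relative quantization step: 1 part in {2**bits_per_pixel}")

# Raw (uncompressed) bandwidth of the frame stream.
bits_per_second = width * height * bits_per_pixel * frames_per_second
print(f"Raw bandwidth: {bits_per_second / 8 / 1e6:.0f} MB/s")  # ~156 MB/s
```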
Current neuromorphic event-based vision, which our founders at iniVation pioneered, uses binary events that encode one bit of information: an increase or decrease in a pixel's brightness by a certain fixed fraction. An analog circuit performs this detection. These sensors are fast and low-power, but they have a number of practical disadvantages, discussed below.
Binary events usually save bandwidth, but the highly lossy encoding leads to low reconstruction quality, and thus to lower decision quality compared with frames. Exceptions may apply in very fast-moving scenes, where the speed advantage of binary events outweighs the signal-quality advantage of frames. As a side note, using binary events directly for computation (e.g. in spiking neural networks) may hold promise for improving efficiency, but it does not remove the underlying disadvantage of having noisy events in the first place.
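For readers who have not worked with event cameras, the following is a minimal software model of how such binary events are typically generated (a fixed contrast threshold applied to log intensity). It illustrates the encoding only, not the analog circuit behaviour of any specific sensor, and the threshold value is an assumption.

```python
import numpy as np

def binary_events(prev_log_intensity, log_intensity, threshold=0.15):
    """Simplified model of per-pixel binary (ON/OFF) event generation.

    Each event carries one bit: +1 if log intensity rose by more than
    `threshold` since the last event, -1 if it fell by more, 0 otherwise.
    """
    delta = log_intensity - prev_log_intensity
    events = np.zeros_like(delta, dtype=np.int8)
    events[delta > threshold] = 1    # ON event: brightness increased
    events[delta < -threshold] = -1  # OFF event: brightness decreased
    return events

# A pixel whose brightness doubles emits an ON event; a ~1% dip emits nothing.
prev = np.log(np.array([100.0, 100.0]))
curr = np.log(np.array([200.0, 99.0]))
print(binary_events(prev, curr))  # [1 0]
```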
What if we combine frames and binary events? Various combinations of frame plus binary-event sensors have been produced by us and others.
All of these methods are usable in certain situations. However, each has specific disadvantages, and none provides a general framework for solving computer vision problems.
A Unified View - The New Aeveon Sensor
Is there a way to resolve the dichotomy between frames and events? This is what we set out to do when developing our new Aeveon sensor. With Aeveon, there is no fundamental difference between a frame and an event. Everything is an event, of which there are four main types.
All events are encoded losslessly, except for the single-bit events. This makes it possible to reconstruct a clean frame at any time, using simple computations (no neural network needed). Area events group together similar pixel events, enabling high compression in many circumstances.
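To illustrate why lossless events make frame reconstruction this simple, here is a minimal sketch; the event format used (pixel address plus an exact sample value) is an assumption made for the example, not Aeveon's actual data format.

```python
import numpy as np

def reconstruct_frame(height, width, events, base_frame=None):
    """Rebuild a frame from losslessly encoded per-pixel events.

    Each event is assumed to be a (y, x, value) tuple, where `value` is the
    exact sample for that pixel. Because nothing was thrown away, the
    reconstruction is a plain scatter into a frame buffer; no neural network
    or denoising step is needed.
    """
    frame = np.zeros((height, width), dtype=np.uint16) if base_frame is None else base_frame.copy()
    for y, x, value in events:
        frame[y, x] = value
    return frame

# Example: three pixel events update an otherwise empty 4x4 frame.
events = [(0, 0, 512), (1, 2, 1023), (3, 3, 7)]
print(reconstruct_frame(4, 4, events))
```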
Furthermore, it is possible to define arbitrary regions of interest (ROIs), with the sensor operating differently in each region. This flexibility acts as a kind of "attentional mechanism" that lets the user focus the data stream on what is most interesting, while still detecting motion in the surrounding area.
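The sketch below shows one way such region-based control could be expressed in software. The mode names and the configuration structure are invented for illustration; they do not reflect Aeveon's real interface.

```python
# Hypothetical region-of-interest configuration for an attention-style sensor.
# Mode names and structure are illustrative assumptions, not a real API.
roi_config = [
    {"name": "tracked_object", "rect": (200, 150, 320, 240), "mode": "lossless_pixel_events"},
    {"name": "periphery",      "rect": (0, 0, 1280, 720),    "mode": "binary_motion_events"},
]

def mode_for_pixel(x, y, config):
    """Return the event mode governing pixel (x, y).

    Earlier entries take priority, so a small high-detail ROI can sit
    inside a larger low-bandwidth region that only reports motion.
    """
    for roi in config:
        rx, ry, rw, rh = roi["rect"]
        if rx <= x < rx + rw and ry <= y < ry + rh:
            return roi["mode"]
    return "idle"

print(mode_for_pixel(250, 180, roi_config))  # lossless_pixel_events
print(mode_for_pixel(900, 600, roi_config))  # binary_motion_events
```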
To achieve this functionality, sensor data is processed by a massively parallel array of what we call Adaptive Event Cores. The sensor uses a stacked design, in which a pixel-array chip is bonded on top of a digital processing chip. The architecture can accommodate different pixel designs (standard RGB, infrared, etc.), and every pixel can output different types of events (and frames). The resolution can scale up to levels seen in state-of-the-art smartphone sensors.
The overall result is a sensor that is both very fast and highly flexible, optimizing its bandwidth usage either automatically or under user control. This flexibility brings the advantages of low-bandwidth, high-speed vision to every application.
Because the sensor can work in frame mode, it can directly replace existing cameras. This flexibility preserves existing investments in software, and provides a simple upgrade path to exploit its event-based features.
In summary, with Aeveon we have created a unified view of neuromorphic vision, encompassing pixel-level frames, pixel events, and higher-level area events. This approach will enable general solutions to computer vision problems that, until now, have required ad-hoc combinations of methods and technologies.
Aeveon will be available as a preview to selected customers later this year. Contact us if you would like to learn more!