Reimagining Visual Sensing

In a recent four-part series of articles, I wrote about our journey in bringing neuromorphic vision out of the lab and into real life. Here, I preview our latest work, in which we fundamentally rethink the idea of neuromorphic vision to enable seamless integration into all computer vision applications.

What is Computer Vision For?

At the risk of stating the obvious, computer vision has two functions:

  1. To enable machines to make decisions
  2. To enable a scene to be reconstructed for viewing by humans

What may be less obvious is that both tasks are fundamentally similar. Any good computer vision system will throw away as much data as possible, while still enabling good decisions to be made, and faithful scenes to be reconstructed.

The Human Approach

Our own visual systems are very good at throwing away just the right data. The retina reduces the equivalent of gigabytes per second of incoming data to a highly compact, yet still useful, stream of approximately one megabyte per second.

[Image: Data reduction in the retina]
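As a rough sanity check on the data-reduction claim, the back-of-envelope calculation below compares an assumed raw photoreceptor data rate with the roughly one megabyte per second carried out of the retina. The photoreceptor count, bit depth, and update rate are illustrative assumptions, not figures from this article.

```python
# Back-of-envelope estimate of retinal data reduction.
# All input numbers below are illustrative assumptions.

PHOTORECEPTORS = 100e6    # assumed ~100 million rods and cones
BIT_DEPTH = 10            # assumed bits per photoreceptor sample
UPDATE_RATE_HZ = 100      # assumed effective temporal sampling rate

raw_bytes_per_s = PHOTORECEPTORS * BIT_DEPTH * UPDATE_RATE_HZ / 8
optic_nerve_bytes_per_s = 1e6   # ~1 MB/s, as stated above

print(f"Raw input:   {raw_bytes_per_s / 1e9:.1f} GB/s")
print(f"Optic nerve: {optic_nerve_bytes_per_s / 1e6:.1f} MB/s")
print(f"Reduction:   ~{raw_bytes_per_s / optic_nerve_bytes_per_s:,.0f}x")
```

Under these assumptions the raw rate comes out at roughly 12 GB/s, i.e. a reduction of four orders of magnitude before the signal even leaves the eye.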

The rest of the brain then completes the process of converting the scene into compact abstract representations, containing elements such as named objects, object trajectories, object relationships, etc. Then we can do the following:

  1. Make decisions using these abstracted representations.
  2. Reconstruct scenes in our minds using these representations, i.e. use our visual imagination, and then make further decisions if desired.

Computer Vision Today: A Split View

How does current computer vision compare with human visual processing? Right now, the landscape contains two extremes. Conventional frames are nearly "lossless" representations of the world: the pixel-level error (quantization error) is usually very small, e.g. 1 part in 1024 for 10-bit resolution. This precision enables very good reconstruction and high-quality decisions, but at the cost of high bandwidth, slower response, and higher energy usage.
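The "1 part in 1024" figure follows directly from the bit depth: a 10-bit pixel has 2^10 = 1024 quantization levels, so the step between adjacent representable values is 1/1024 of full scale. The short check below simply makes that arithmetic explicit.

```python
# Quantization step for an N-bit pixel value (full scale normalized to 1).
bits = 10
levels = 2 ** bits   # 1024 representable levels
step = 1 / levels    # 1 part in 1024 of full scale
print(f"{bits}-bit pixel: {levels} levels, step = 1/{levels} = {step:.4%} of full scale")
```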

Current neuromorphic event-based vision, which our founders at iniVation pioneered, uses binary events that encode one bit of information: an increase or decrease in brightness of a pixel by a certain fixed fraction. An analog circuit is used to perform this detection. These sensors are fast and low-power, but have a number of practical disadvantages:

  • The analog circuits are noisy, and have high inter-pixel variation (mismatch).
  • The analog circuit cannot be shrunk as easily as a digital circuit (Moore's law does not apply readily to analog at small scales), which limits both the achievable pixel size and the potential power savings.

Binary events usually save bandwidth, but the highly lossy encoding leads to low reconstruction quality, and thus lower decision quality compared with frames. Exceptions may apply in very fast-moving scenes, where the speed advantage of binary events outweighs the signal quality of frames. As a side note, using binary events directly for computation (e.g. spiking neural networks) may hold promise for improving efficiency, but does not take away the disadvantage caused by having noisy events in the first place.

[Image: Comparison of frames vs binary events for scene reconstruction and decision making]
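To make the binary-event encoding concrete, here is a minimal software sketch of the general principle: a pixel emits a +1 or -1 event when its log-intensity changes by more than a fixed contrast threshold. This is a simplified frame-to-frame model, not iniVation's analog circuit (a real sensor compares each pixel asynchronously against the level at its last event), and the threshold value is an arbitrary assumption.

```python
import numpy as np

def binary_events(frame_prev, frame_curr, threshold=0.15, eps=1e-6):
    """Simplified DVS-style model: emit +1/-1 where the log-intensity
    change between two snapshots exceeds a fixed contrast threshold."""
    d = np.log(frame_curr + eps) - np.log(frame_prev + eps)
    events = np.zeros_like(d, dtype=np.int8)
    events[d > threshold] = 1     # brightness increased
    events[d < -threshold] = -1   # brightness decreased
    return events

# Example: a bright square moves one pixel to the right.
prev = np.full((6, 6), 0.1)
prev[2:4, 1:3] = 1.0
curr = np.full((6, 6), 0.1)
curr[2:4, 2:4] = 1.0

print(binary_events(prev, curr))  # +1 at the leading edge, -1 at the trailing edge
```

Each nonzero entry corresponds to a single binary event; everything else is discarded, which is where both the bandwidth saving and the loss of information come from.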

What if we combine frames and binary events? Various combinations of frame plus binary event sensors have been produced by us and others, as listed below.

[Image: Overview of frame (RGB), event (DVS) and mixed image sensor types]

  • Dual-sensor: Event sensor plus separate frame sensor (e.g. our DVXplorer S Duo). Conceptually simple, but it costs more (two chips), and it has alignment issues related to using two cameras with different lenses and resolutions.
  • Hybrid single event+frame sensor: A single sensor, with a mix of event pixels and frame (RGB) pixels. In practice, the event pixels are much larger than the frame pixels. This leads to resolution mismatch and "holes" in the frame pixel array, causing events to be missed and/or problems with interpolation over the holes.
  • Single-sensor, dual event+frame readout: Our DAVIS346 sensor is an example of this type. The light is collected in a single photodiode, then two separate circuits read out either a frame or an event. This method overcomes the mismatch issues of hybrid sensors, but retains the noise issues of the analog event circuit.

All of these methods are usable in certain situations. However, each has specific disadvantages, and none of them provides a general framework for solving computer vision problems.

A Unified View - The New Aeveon Sensor

Is there a way to resolve the dichotomy between frames and events? This is what we set out to do when developing our new Aeveon sensor. With Aeveon, there is no fundamental difference between a frame and an event. Everything is an event, of which there are four main types:

  • Full pixel value: similar to a frame
  • Multi-bit event: an incremental difference at pixel level
  • Single-bit event: equivalent to the legacy binary events
  • Area event: a group of similar events in a region of the sensor

All events are encoded losslessly, except for the single-bit events. This makes it possible to reconstruct a clean frame at any time, using simple computations (no neural network needed). The area events group similar events, enabling high compression under many circumstances.

[Image: Aeveon sensor showing difference between legacy binary events and multi-bit events]
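Aeveon's event format has not been published, so the sketch below is purely hypothetical: it shows how a unified, losslessly encoded stream of full pixel values, multi-bit delta events, and area events could in principle be replayed into a clean frame with simple arithmetic. The record types, field names, and area-event semantics are invented for illustration only.

```python
import numpy as np
from dataclasses import dataclass

# Hypothetical event records; the real Aeveon format is not public.
@dataclass
class PixelValue:   # "full pixel value": absolute intensity, like a frame sample
    x: int; y: int; value: int

@dataclass
class MultiBit:     # "multi-bit event": lossless signed delta for one pixel
    x: int; y: int; delta: int

@dataclass
class Area:         # "area event": one delta applied to a region (assumed semantics)
    x0: int; y0: int; x1: int; y1: int; delta: int

def reconstruct(events, height, width):
    """Replay a lossless event stream into a frame using simple arithmetic."""
    frame = np.zeros((height, width), dtype=np.int32)
    for ev in events:
        if isinstance(ev, PixelValue):
            frame[ev.y, ev.x] = ev.value
        elif isinstance(ev, MultiBit):
            frame[ev.y, ev.x] += ev.delta
        elif isinstance(ev, Area):
            frame[ev.y0:ev.y1, ev.x0:ev.x1] += ev.delta
    return frame

stream = [PixelValue(1, 1, 200), MultiBit(1, 1, -30), Area(0, 0, 4, 4, 10)]
print(reconstruct(stream, height=4, width=4))
```

The key point is that, because all but the single-bit events are lossless, reconstruction is a deterministic replay rather than an inference problem.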

Furthermore, it is possible to define multiple regions of interest (ROIs), with the sensor operating differently in each region. This flexibility is a kind of "attentional mechanism" that allows the user to focus the data stream on what is most interesting, while still detecting motion in the surrounding area.

[Image: Multiple regions of interest (ROIs) with different operating modes]
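The Aeveon API has not been released, so the configuration below is only a hypothetical illustration of what per-region operating modes might look like; the mode names and fields are invented.

```python
# Hypothetical ROI configuration; names and structure are invented for
# illustration and do not reflect any released Aeveon interface.
roi_config = [
    # Coarse motion detection over the full sensor.
    {"region": (0, 0, 1920, 1080), "mode": "single_bit_events"},
    # Lossless multi-bit detail on the object of interest.
    {"region": (600, 300, 640, 480), "mode": "multi_bit_events"},
    # A small patch read out as conventional frames.
    {"region": (860, 500, 128, 128), "mode": "full_frames", "rate_hz": 60},
]

for roi in roi_config:
    x, y, w, h = roi["region"]
    print(f"{roi['mode']:>17} in a {w}x{h} window at ({x}, {y})")
```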

To achieve this functionality, the sensor data is processed by a massively parallel array of what we call Adaptive Event Cores. The sensor uses a stacked design, in which a pixel-array chip is bonded on top of a digital processing chip. The architecture can accommodate different pixel designs (standard RGB, infrared, etc.), and every pixel can output the different types of events (and frames). The resolution can scale up to the levels seen in state-of-the-art smartphone sensors.

[Image: Aeveon schematic overview and key features]

The overall result is a sensor that is both very fast and highly flexible, optimizing its bandwidth usage either automatically or under user control. This flexibility brings the advantages of low-bandwidth, high-speed vision to every application, including:

  • Automotive: Low-latency HDR navigation (anti-flicker included), in-cabin observation
  • AR/VR and XR: Eye tracking, scene camera, visual positioning, hand tracking, etc.
  • Robotics: Navigation, object recognition and tracking, etc.
  • Mobile imaging: image stabilization, anti-blur, high-speed recording - in real time, without requiring large neural network models

Because the sensor can work in frame mode, it can directly replace existing cameras. This flexibility preserves existing investments in software, and provides a simple upgrade path to exploit its event-based features.

In summary, with Aeveon we have created a unified view of neuromorphic vision, encompassing pixel-level frames, pixel events, and higher-level area events. This approach will enable general solutions to computer vision problems that, until now, have required ad-hoc combinations of methods and technologies.

Aeveon will be available as a preview to selected customers later this year. Contact us if you would like to learn more!
