Reimagining Visual Sensing
In a recent four-part series of articles, I wrote about our journey in bringing neuromorphic vision out of the lab and into real life. Here, I preview our latest work, in which we fundamentally rethink the idea of neuromorphic vision to enable seamless integration into all computer vision applications.
What is Computer Vision For?
At the risk of stating the obvious, computer vision has two functions:
- making decisions based on visual input, and
- reconstructing faithful representations of a scene.
What may be less obvious is that both tasks are fundamentally similar. Any good computer vision system will throw away as much data as possible while still enabling good decisions to be made and faithful scenes to be reconstructed.
The Human Approach
Our own visual systems are very good at throwing away just the right data. The retina compresses the equivalent of gigabytes per second of incoming data down to a highly compact, yet still useful, data stream of approximately one megabyte per second.
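As a rough sanity check on those numbers, here is a back-of-the-envelope sketch; the raw photoreceptor data rate used below is an assumption chosen purely for illustration.

```python
# Back-of-the-envelope estimate of the retina's data reduction.
# The raw input rate below is an illustrative assumption, not a measurement.
raw_input_bytes_per_s = 5e9      # assume ~5 GB/s of raw photoreceptor data
optic_nerve_bytes_per_s = 1e6    # ~1 MB/s, as quoted above

compression_factor = raw_input_bytes_per_s / optic_nerve_bytes_per_s
print(f"Implied compression: ~{compression_factor:,.0f}x")  # ~5,000x
```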
The rest of the brain then completes the process of converting the scene into compact abstract representations, containing elements such as named objects, object trajectories, and object relationships. From these representations we can then make decisions and, when needed, recall the scene.
Computer Vision Today: A Split View
How does current computer vision compare with human visual processing? Right now, the landscape contains two extremes. Conventional frames are nearly "lossless" representations of the world: the pixel-level error (quantization error) is usually very small, e.g. 1 part in 1024 for 10-bit resolution. This precision enables very good reconstruction and high-quality decisions, but at the cost of high bandwidth, slower response, and higher energy usage.
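To make those costs concrete, the sketch below computes the quantization step and the raw bandwidth of a conventional frame stream. The resolution and frame rate are assumptions for illustration, not the specification of any particular camera.

```python
# Illustrative parameters for a conventional frame sensor (assumed values).
width, height = 1920, 1080   # pixels
bits_per_pixel = 10          # 10-bit quantization, as in the example above
frames_per_second = 60

# Pixel-level quantization error: one code step out of 2**10 levels.
print(f"Relative quantization step: 1 part in {2**bits_per_pixel}")

# Raw (uncompressed) bandwidth of the frame stream.
bits_per_second = width * height * bits_per_pixel * frames_per_second
print(f"Raw bandwidth: {bits_per_second / 8 / 1e6:.0f} MB/s")  # ~156 MB/s
```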
Current neuromorphic event-based vision, which our founders at iniVation pioneered, uses binary events that encode one bit of information: an increase or decrease in a pixel's brightness by a certain fixed fraction. An analog circuit performs this detection. These sensors are fast and low-power, but they have a number of practical disadvantages, discussed below.
Binary events usually save bandwidth, but the highly lossy encoding leads to low reconstruction quality, and thus to lower decision quality compared with frames. Exceptions may apply in very fast-moving scenes, where the speed advantage of binary events outweighs the signal-quality advantage of frames. As a side note, using binary events directly for computation (e.g. in spiking neural networks) may hold promise for improving efficiency, but it does not remove the underlying disadvantage of having noisy events in the first place.
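For readers who have not worked with event cameras, the following is a minimal software model of how such binary events are typically generated (a fixed contrast threshold applied to log intensity). It illustrates the encoding only, not the analog circuit behaviour of any specific sensor, and the threshold value is an assumption.

```python
import numpy as np

def binary_events(prev_log_intensity, log_intensity, threshold=0.15):
    """Simplified model of per-pixel binary (ON/OFF) event generation.

    Each event carries one bit: +1 if log intensity rose by more than
    `threshold` since the last event, -1 if it fell by more, 0 otherwise.
    """
    delta = log_intensity - prev_log_intensity
    events = np.zeros_like(delta, dtype=np.int8)
    events[delta > threshold] = 1    # ON event: brightness increased
    events[delta < -threshold] = -1  # OFF event: brightness decreased
    return events

# A pixel whose brightness doubles emits an ON event; a ~1% dip emits nothing.
prev = np.log(np.array([100.0, 100.0]))
curr = np.log(np.array([200.0, 99.0]))
print(binary_events(prev, curr))  # [1 0]
```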
What if we combine frames and binary events? Various combinations of frame plus binary-event sensors have been produced by us and others.
All of these methods are usable in certain situations. However, each has specific disadvantages, and none provides a general framework for solving computer vision problems.
A Unified View - The New Aeveon Sensor
Is there a way to resolve the dichotomy between frames and events? This is what we set out to do when developing our new Aeveon sensor. With Aeveon, there is no fundamental difference between a frame and an event. Everything is an event, of which there are four main types.
All events are encoded losslessly, except for the single-bit events. This makes it possible to reconstruct a clean frame at any time, using simple computations (no neural network needed). Area events group together similar pixel events, enabling high compression in many circumstances.
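To illustrate why lossless events make frame reconstruction this simple, here is a minimal sketch; the event format used (pixel address plus an exact sample value) is an assumption made for the example, not Aeveon's actual data format.

```python
import numpy as np

def reconstruct_frame(height, width, events, base_frame=None):
    """Rebuild a frame from losslessly encoded per-pixel events.

    Each event is assumed to be a (y, x, value) tuple, where `value` is the
    exact sample for that pixel. Because nothing was thrown away, the
    reconstruction is a plain scatter into a frame buffer; no neural network
    or denoising step is needed.
    """
    frame = np.zeros((height, width), dtype=np.uint16) if base_frame is None else base_frame.copy()
    for y, x, value in events:
        frame[y, x] = value
    return frame

# Example: three pixel events update an otherwise empty 4x4 frame.
events = [(0, 0, 512), (1, 2, 1023), (3, 3, 7)]
print(reconstruct_frame(4, 4, events))
```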
Furthermore, it is possible to define arbitrary regions of interest (ROIs), with the sensor operating differently in each region. This flexibility acts as a kind of "attentional mechanism" that lets the user focus the data stream on what is most interesting, while still detecting motion in the surrounding area.
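The sketch below shows one way such region-based control could be expressed in software. The mode names and the configuration structure are invented for illustration; they do not reflect Aeveon's real interface.

```python
# Hypothetical region-of-interest configuration for an attention-style sensor.
# Mode names and structure are illustrative assumptions, not a real API.
roi_config = [
    {"name": "tracked_object", "rect": (200, 150, 320, 240), "mode": "lossless_pixel_events"},
    {"name": "periphery",      "rect": (0, 0, 1280, 720),    "mode": "binary_motion_events"},
]

def mode_for_pixel(x, y, config):
    """Return the event mode governing pixel (x, y).

    Earlier entries take priority, so a small high-detail ROI can sit
    inside a larger low-bandwidth region that only reports motion.
    """
    for roi in config:
        rx, ry, rw, rh = roi["rect"]
        if rx <= x < rx + rw and ry <= y < ry + rh:
            return roi["mode"]
    return "idle"

print(mode_for_pixel(250, 180, roi_config))  # lossless_pixel_events
print(mode_for_pixel(900, 600, roi_config))  # binary_motion_events
```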
To achieve this functionality, sensor data is processed by a massively parallel array of what we call Adaptive Event Cores. The sensor uses a stacked design, in which a pixel-array chip is bonded on top of a digital processing chip. The architecture can accommodate different pixel designs (standard RGB, infrared, etc.), and every pixel can output different types of events (and frames). The resolution can scale up to levels seen in state-of-the-art smartphone sensors.
The overall result is a sensor that is both very fast and highly flexible, optimizing its bandwidth usage either automatically or under user control. This flexibility brings the advantages of low-bandwidth, high-speed vision to every application.
Because the sensor can work in frame mode, it can directly replace existing cameras. This flexibility preserves existing investments in software, and provides a simple upgrade path to exploit its event-based features.
In summary, with Aeveon we have created a unified view of neuromorphic vision, encompassing pixel-level frames, pixel events, and higher-level area events. This approach will enable general solutions to computer vision problems that, until now, have required ad-hoc combinations of methods and technologies.
Aeveon will be available as a preview to selected customers later this year. Contact us if you would like to learn more!