For Droid Eyes Only

This blog is a repost of my Intel Blog @ intel.com

These days, you can get a 65” 4K Ultra High Definition, High Dynamic Range smart TV at your local superstore for under $1000 and stream vast libraries of premium movies and TV shows to it for under $20/month. Video compression technologies like H.264 and H.265 have been key to realizing this extraordinary result. Since the dawn of video compression, the primary codec design goals have been maximizing viewer perceived quality while minimizing bandwidth, storage and coding costs, and delays. Starting with the CCITT H.120 standard in 1984 up to and including the still developing ITU-T/MPEG Versatile Video Coding standard, perceptual quality and compression rates have steadily improved while staying within the practical and economic limits of network and computing capabilities.

However, a projected 95% of video or image content will never be seen by human eyes. A substantial amount of video produced by surveillance and traffic cameras, robots, drones, autonomous vehicles and other sources is discarded or archived without a single person watching. Instead, these videos are inputs to computer vision and video analytics applications. Some existing codec components (e.g., motion estimation) have value for video analytics, but computing and network resources devoted to maintaining vibrant colors and clear pictures on large screens are wasted when only a robot is watching.

At Intel Labs, we started asking two questions:

  1. Can any of the capabilities intrinsic to current video codecs be used to improve video analytics results? (We call this “Compression Aware Analytics.”)
  2. What if a video codec were designed from first principles for video analytics? (We call this “Analytics Aware Compression.”)

From these two questions arose the “Co-Adaptive Networking and Visual Analysis Systems” or CANVAS project led by Intel Labs’ Omesh Tickoo and Srinivasa Somayazulu. We showed some early CANVAS results at the 2019 Computer Vision and Pattern Recognition Conference (CVPR) in June.

The Right Tool for a Different Job

In my previous blog, “Sharing the Video Edge”, I described our work on smart city video analytics at the Intel Science and Technology Center for Visual Cloud Systems (ISTC-VCS) at Carnegie Mellon University. There, we’ve built out an urban testbed to demonstrate camera-to-edge-to-cloud distributed video analytics use cases. In those use cases, the processing workflow looks something like the figure below.


[Figure: camera-to-edge-to-cloud video analytics processing workflow]



In these applications, computer vision cameras typically stream H.264-encoded video to an edge node and a cloud-based application that performs a pipeline of operations to produce some set of analytics results. In the example above, the application is responsible for:

  • Detecting and recognizing license plate numbers in the field of view
  • Producing a track of the license plate through the field of view
  • Re-encoding a video snippet of the license plate moving through the scene

To accomplish this, the edge node first decodes the video into a sequence of individual frame images. Those images are fed into a neural network that detects objects and sends re-encoded video segments containing the objects to the cloud. The cloud decodes the arriving video and runs a plate recognizer and tracker to produce its results.
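For reference, here is a minimal sketch of that conventional decode-detect-re-encode loop in Python. It assumes OpenCV for decoding and re-encoding, and `detect_plates()` is a hypothetical stand-in for the plate detection network; this is an illustration, not the actual ISTC-VCS implementation.

```python
# Minimal sketch of the conventional decode -> detect -> re-encode edge pipeline.
import cv2

def detect_plates(frame):
    """Hypothetical stand-in for the license plate detector network.
    Returns a list of (x, y, w, h) bounding boxes."""
    raise NotImplementedError

def edge_pipeline(stream_url, out_path, fps=30.0):
    cap = cv2.VideoCapture(stream_url)        # decode the incoming H.264 stream
    writer = None
    while True:
        ok, frame = cap.read()                # one fully decoded frame image
        if not ok:
            break
        boxes = detect_plates(frame)          # neural-network object detection
        if boxes:
            if writer is None:
                h, w = frame.shape[:2]
                fourcc = cv2.VideoWriter_fourcc(*"avc1")  # H.264 if the build supports it
                writer = cv2.VideoWriter(out_path, fourcc, fps, (w, h))
            writer.write(frame)               # re-encoded snippet sent on to the cloud
    cap.release()
    if writer is not None:
        writer.release()
```

Note that every frame is fully decoded and re-encoded even though only the detector output matters downstream; that overhead is exactly what CANVAS targets.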

This approach is common and expedient because it leverages the many years spent creating high quality, efficient and standardized video codecs. However, as the number of cameras and volume of data from each camera increases, the network and computing infrastructure can be overwhelmed. The CANVAS team believes that, when the task is constrained to analytics, there are better ways to do it.

Let’s go a little deeper on the two approaches: compression aware analytics and analytics aware compression.

Empowering Tomorrow’s Droids with Today’s Codecs

In CANVAS, the Compression Aware Analytics project exploits the information already contained in the encoded camera stream to remove unnecessary processing in the subsequent stages. For example, the decode stage can be eliminated by training a plate recognizer to use the encoded bitstream directly. The object tracker can use the recognizer outputs and the motion vectors in the encoded bitstream to produce the plate tracks. Motion vectors needn’t be recomputed. The video snippet can be extracted from the encoded video using the track timestamps and, if necessary, further compressed for transmission to the data center.
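To make this concrete, below is a minimal sketch of one compression aware step: propagating a recognized plate's bounding box using only the motion vectors already carried in the bitstream (assumed here to have been parsed out beforehand, for example with FFmpeg's `-flags2 +export_mvs` option). The vector format and helper are illustrative assumptions, not the CANVAS implementation.

```python
# Sketch: propagate a detection box using bitstream motion vectors instead of
# re-running the detector or recomputing motion.
import numpy as np

def propagate_box(box, motion_vectors):
    """box: (x, y, w, h) in the previous frame.
    motion_vectors: iterable of (src_x, src_y, dst_x, dst_y) per macroblock,
    as exported by the decoder. Returns the box shifted by the median motion."""
    x, y, w, h = box
    dxs, dys = [], []
    for src_x, src_y, dst_x, dst_y in motion_vectors:
        # keep only vectors whose source block lies inside the current box
        if x <= src_x < x + w and y <= src_y < y + h:
            dxs.append(dst_x - src_x)
            dys.append(dst_y - src_y)
    if not dxs:
        return box                      # no motion info for this region: hold position
    dx = float(np.median(dxs))          # median is robust to outlier vectors
    dy = float(np.median(dys))
    return (x + dx, y + dy, w, h)
```

A plate track then becomes the sequence of boxes produced by applying this propagation between detector hits, with no extra motion estimation.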

Designing the Perfect Droid Codec

[Figure: pedestrian detection example with high-quality regions of interest and heavily compressed background]

The other CANVAS project, Analytics Aware Compression, starts from the premise that compression rates and analytics performance can both be improved by designing a codec that optimizes for the elements of the camera stream that matter to analytics. In general, analytics applications don’t require high perceptual quality. They need high-resolution images of objects of interest and good object tracking through the field of view. Analytics aware compression adapts the encoder to emphasize high-quality encoding of important frame regions (e.g., license plates) while de-emphasizing the quality of, or even dropping, the background. For example, in the figure above, the detected pedestrian regions can be encoded at high resolution while, say, the grass can be transmitted at much lower resolution. Similarly, analytics aware compression can reduce framerates or frame quality when there is little movement between frames. The same technique is used in current video codecs, but an analytics aware codec can take it to a new extreme.
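One way to express this idea is a per-macroblock quality map that an encoder's rate control can consume: strong negative quantizer offsets inside ROIs and large positive offsets elsewhere. The sketch below is a simplified illustration under that assumption; real encoders expose region-of-interest control differently (x264, for example, accepts per-macroblock quantizer offsets).

```python
# Sketch: build a per-macroblock quantizer-offset map from analytics ROIs.
import numpy as np

MB = 16  # H.264 macroblock size in pixels

def roi_qp_map(frame_w, frame_h, rois, roi_offset=-8.0, bg_offset=12.0):
    """rois: list of (x, y, w, h) pixel boxes. Returns a (mb_rows, mb_cols)
    map of quantizer offsets: negative = higher quality, positive = lower."""
    mb_cols = (frame_w + MB - 1) // MB
    mb_rows = (frame_h + MB - 1) // MB
    qp_map = np.full((mb_rows, mb_cols), bg_offset, dtype=np.float32)
    for x, y, w, h in rois:
        c0, c1 = x // MB, (x + w + MB - 1) // MB
        r0, r1 = y // MB, (y + h + MB - 1) // MB
        qp_map[r0:r1, c0:c1] = roi_offset   # spend bits where the analytics look
    return qp_map

# Example: one detected pedestrian in a 1920x1080 frame
pedestrian_rois = [(640, 300, 180, 420)]
print(roi_qp_map(1920, 1080, pedestrian_rois).shape)  # (68, 120) macroblocks
```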


Toward a CANVAS Codec

To validate our ideas, we ran an initial CANVAS experiment combining compression aware object tracking with analytics aware region of interest (ROI) compression in a pedestrian detection application. Our goal was to see whether CANVAS techniques could appreciably reduce transmission and computation resource requirements while retaining object classification accuracy. Our approach is shown in the figure below. In a typical edge-to-cloud environment, we created a new “edge analytics encoder” that identified objects in a camera feed, encoded the corresponding ROIs as high-fidelity I-frames, and combined them with the original motion vectors to create an analytics-optimized H.264 stream. Background information and P-frames were not transmitted. At the cloud, our decoder extracted and reconstructed a sequence of ROI frames. These were run through a Fast R-CNN object classifier to find the pedestrians.

[Figure: CANVAS edge analytics encoder and cloud decoder pipeline]

The basic flow of a video through the system is (a skeleton code sketch follows the list):

  1. The edge decodes an incoming camera stream into an I-frame sequence
  2. These I-frames run through a simple object detector to identify ROIs
  3. A bounding-box tracker computes the ROI paths through the video
  4. The CANVAS encoder compresses the motion vectors and ROIs and streams them to the cloud
  5. At the cloud, CANVAS decodes the stream into I-frame images
  6. The decoded images run through an object classifier, and an object track is produced from the bitstream motion vectors
  7. The cloud feeds the classification results back to the edge to inform the object tracker
  8. The cloud classifier outputs the object class and track
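The skeleton below maps those eight steps onto code. Every helper function is a hypothetical placeholder for a CANVAS component, shown only to make the edge/cloud split explicit; none of this is a published API.

```python
# Skeleton of the CANVAS flow above. Every helper is a hypothetical placeholder
# for a CANVAS component; only the structure of the eight steps is real here.

def decode_to_iframes(stream):                raise NotImplementedError  # step 1
def detect_rois(frame):                       raise NotImplementedError  # step 2
def track_rois(rois_per_frame):               raise NotImplementedError  # step 3
def extract_motion_vectors(stream):           raise NotImplementedError  # reuse camera MVs
def encode_roi_iframes(frames, tracks, mvs):  raise NotImplementedError  # step 4
def decode_canvas(stream):                    raise NotImplementedError  # step 5
def classify_rois(frame):                     raise NotImplementedError  # step 6
def build_track(labels, mvs):                 raise NotImplementedError  # step 6
def send_feedback_to_edge(labels):            raise NotImplementedError  # step 7

def edge_side(camera_stream):
    frames = decode_to_iframes(camera_stream)           # step 1
    rois = [detect_rois(f) for f in frames]              # step 2
    tracks = track_rois(rois)                             # step 3
    mvs = extract_motion_vectors(camera_stream)
    return encode_roi_iframes(frames, tracks, mvs)       # step 4: analytics-optimized stream

def cloud_side(canvas_stream):
    roi_frames, mvs = decode_canvas(canvas_stream)       # step 5
    labels = [classify_rois(f) for f in roi_frames]       # step 6
    track = build_track(labels, mvs)                       # step 6: track from bitstream MVs
    send_feedback_to_edge(labels)                          # step 7
    return labels, track                                   # step 8: object class and track
```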

We ran this codec against a set of pedestrian videos on an Intel® Core™ i7-6770HQ processor, using Intel® Quick Sync Video for video decode and the Intel® Movidius™ Myriad™ X VPU and Intel® Distribution of OpenVINO™ toolkit for object detection and classification. Compared with a baseline R-CNN object detector running on a high-fidelity video sequence, we saw several orders of magnitude reduction in bitrate and computational complexity with only minor impact on detection accuracy. Further details will be published soon. In some cases, such as videos with fast-moving objects, we actually saw accuracy increases.

Conclusion

These are early results but we’re very encouraged. We believe that an analytics optimized codec can lead to improved application performance at the expense of human viewability. We continue research in this area while we explore whether there is an industry need for such a technology. We’re interested in connecting with industry and academic technical leaders who have ideas in compression aware analytics and analytics aware compression. Please reach out to Dr. Omesh Tickoo if you’d like to collaborate.

References

Check out other blogs from my visual cloud series:


  • Sharing the Video Edge
  • Overcoming Visual Analysis Paralysis
  • Seeing Further Down the Visual Cloud Road