Enhancement of Object Detection with Depth Estimation
Sensor Data Collected by Autonomous Vehicle, Source: https://waymo.com/open/data/

Today, the growing challenges of the 21st century are increasingly being met with faster, more efficient and, above all, automated processes. An important task in this context is object detection with deep neural networks, which is already used in many fields such as medical technology, document analysis and automated production. With steadily increasing computing power over the last few years, object detection has also found its way into the automotive industry, where it supports, for example, the detection of vehicles and other objects such as pedestrians. It is of utmost importance that object detection works consistently and precisely even under difficult conditions: different weather conditions, fast driving manoeuvres or optical illusions caused by reflections must not impair detection.

In this article I would like to present a method for improving object detection with Multi-Task Learning under specific difficult conditions. What does this mean in concrete terms? Multi-Task Learning generally describes training several tasks in a single training process. In our case, in addition to object detection, a depth estimation based on the camera data is trained, supervised with LiDAR data during training. This means that LiDAR data is only needed during training, not for prediction.
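
The training/inference asymmetry can be made concrete with a minimal sketch. Everything here is illustrative (the real model is the extended Faster R-CNN described below, not this toy backbone): the point is only that the training step consumes a LiDAR depth map while prediction takes the camera image alone.

```python
import numpy as np

def backbone_features(image):
    """Stand-in for a shared CNN backbone (purely illustrative)."""
    return image.mean(axis=-1, keepdims=True)  # (H, W, 3) -> (H, W, 1)

def train_step(image, det_target, lidar_depth_map):
    """Training consumes BOTH supervision signals and returns a joint loss."""
    feats = backbone_features(image)
    det_loss = float(np.abs(feats.mean() - det_target))        # placeholder detection loss
    depth_loss = float(np.abs(feats - lidar_depth_map).mean()) # placeholder depth loss
    return det_loss + depth_loss  # single joint objective for backpropagation

def predict(image):
    """Inference needs only the camera image -- no LiDAR sensor required."""
    return backbone_features(image)
```

Once trained, `predict` never touches LiDAR data, which is what makes the approach attractive for camera-only deployment.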

[Image: no alt text provided]

A bit of theory: the basis for the camera-based object detection is the classic Faster R-CNN architecture, which consists of three basic modules. In the first module, the backbone network, so-called features are extracted from the image data. In the second module, the Region Proposal Network, the search space for finding objects is narrowed down: potential object boxes are generated around anchor points. These candidate boxes are then classified in the last module, the Box Head. Finally, the losses are summed and backpropagation is initiated.
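
The anchor mechanism of the Region Proposal Network can be sketched as follows. This is a generic reconstruction of the classic Faster R-CNN anchor scheme, not code from this project; the scales, ratios and stride are the commonly used defaults, not values confirmed by the article.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate (x1, y1, x2, y2) anchor boxes for every feature-map cell.

    Each anchor point on the feature map spawns len(scales) * len(ratios)
    candidate boxes in image coordinates (9 per cell with the defaults).
    The ratio is interpreted as height/width at constant box area.
    """
    anchors = []
    for cy in range(feat_h):
        for cx in range(feat_w):
            # centre of this cell, projected back into image coordinates
            x, y = (cx + 0.5) * stride, (cy + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w, h = s / np.sqrt(r), s * np.sqrt(r)  # w * h == s**2
                    anchors.append([x - w / 2, y - h / 2,
                                    x + w / 2, y + h / 2])
    return np.array(anchors)  # shape: (feat_h * feat_w * 9, 4)
```

The RPN then scores each of these candidates and keeps only the most promising ones for the Box Head.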

[Image: no alt text provided]

To carry out an additional depth estimation on the basis of this architecture, Faster R-CNN is extended by the green-colored areas shown above. The first important step is to create a dense depth map from the sparse LiDAR point cloud. In general, interpolation methods are well suited for this; however, they cause isolated problems in the peripheral areas of the image, which is why a specially developed interpolation algorithm is used here.
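
A minimal sketch of the sparse-to-dense step, assuming the LiDAR points have already been projected into pixel coordinates: simple nearest-neighbour interpolation stands in for the article's custom algorithm, and a binary validity mask records which pixels lie close enough to a real measurement. The function name and the `max_radius` parameter are hypothetical.

```python
import numpy as np

def lidar_to_depth_map(points_uv, depths, height, width, max_radius=2.0):
    """Rasterise projected LiDAR points into a dense depth map + validity mask.

    points_uv: (N, 2) pixel (x, y) coordinates of the projected point cloud
    depths:    (N,)   metric depth per point
    Pixels farther than `max_radius` from every point are marked invalid.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)  # (H*W, 2)
    # squared distance from every pixel to every LiDAR point
    # (brute force -- fine for a small example, not for full frames)
    d2 = ((grid[:, None, :] - points_uv[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)
    depth_map = depths[nearest].reshape(height, width)
    mask = (np.sqrt(d2.min(axis=1)) <= max_radius).reshape(height, width)
    return depth_map, mask.astype(np.float32)
```

The mask is exactly what the LiDAR module below needs: it marks the pixels whose interpolated depth is trustworthy, so image borders without LiDAR coverage are excluded from the loss.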

With the help of this depth map, the LiDAR module can now be built. It takes the features created by the backbone and reduces their channels to one by means of convolutions. The resulting depth prediction is then multiplied element-wise by the binary interpolation mask, so that only pixels with valid interpolated values contribute to the loss. Finally, the loss for the depth estimation is added to the loss for the object detection. This is the crucial step in which the single-task optimisation problem becomes a multi-task optimisation problem.
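
The masked loss and the joint objective can be written down compactly. This is a sketch of the principle, assuming an L1 depth loss and an unweighted sum; the actual loss function and any task weighting used in the project are not specified in the article, so `w_depth` is a hypothetical knob.

```python
import numpy as np

def masked_depth_loss(pred_depth, target_depth, valid_mask):
    """L1 depth loss evaluated only where the interpolation mask is valid."""
    diff = np.abs(pred_depth - target_depth) * valid_mask
    return diff.sum() / max(valid_mask.sum(), 1.0)  # average over valid pixels

def multi_task_loss(det_loss, pred_depth, target_depth, valid_mask, w_depth=1.0):
    """Joint objective: detection loss plus the weighted, masked depth loss.

    Backpropagating this single scalar turns the single-task problem
    into a multi-task one, since gradients from both tasks reach the
    shared backbone.
    """
    return det_loss + w_depth * masked_depth_loss(pred_depth, target_depth, valid_mask)
```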

So what effect does Multi-Task Learning have on object detection? The answer, as so often, is: it depends. LiDAR has a rather limited range of approximately 75 m, so objects further than 75 m away cannot benefit from Multi-Task Learning. In addition, objects that are very close are usually easy to classify anyway. Nevertheless, there are situations in which Multi-Task Learning gives object detection an advantage.

[Image: no alt text provided]

These are sequences in which visibility is restricted and all objects are within LiDAR range. The picture above shows this clearly: while the plain Faster R-CNN architecture identifies very few objects, the Multi-Task Learning architecture benefits significantly from the depth estimation and is able to identify even very small objects, such as the person behind the vehicle at the right-hand edge of the image.

So what is the takeaway? Multi-Task Learning is by no means a guarantee that the individual tasks will improve. What is decisive is a close examination of the problem at hand. In this case, Multi-Task Learning does not provide a general improvement, but it does improve corner cases such as dark sequences with close objects.

More articles by Yannick Klose