Collection of LiDAR data
Shashank V Raghavan
Artificial Intelligence | Autonomous Systems | Resident Robot Geek | Quantum Computing | Product and Program Management
When comparing LiDAR data with other modalities like images, videos, audio, or text, we see that very few LiDAR datasets are publicly available. The reason is the difficulty of both acquiring and annotating LiDAR data. Because even high-end LiDAR systems produce sparse, colorless point clouds, sensors of other modalities (usually cameras) are typically added when collecting LiDAR data.
This sparsity can make very different objects look nearly identical when they are far from the sensor. For example, a person standing near the edge of a LiDAR system's sensing radius can be indistinguishable from a small tree to a human annotator. High-quality images from the cameras accompanying the LiDAR system resolve this ambiguity.
The described solution sounds simple, but in practice, calibrating these sensors together is a tedious task. So-called sensor fusion is achieved by fixing the sensors in precise, known positions and then, using those positions together with sensor-specific parameters called intrinsics, running alignment routines that typically involve large printed checkerboard patterns placed at various angles relative to the sensors.
As a result, one obtains an extrinsic matrix, which is used in combination with the camera intrinsic matrices to project points from the point cloud into pixel space and vice versa. This two-way connection between sensors not only makes annotation easier but also enables the use of state-of-the-art image understanding models. These models detect or segment objects captured by the camera and project the labels onto the 3D point clouds captured by the LiDAR system, either fully automating the LiDAR labeling job or greatly accelerating it.
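To make the projection step concrete, here is a minimal NumPy sketch. The function name, the single 4x4 LiDAR-to-camera extrinsic, and the 3x3 intrinsic matrix K are illustrative assumptions rather than any particular dataset's convention:

```python
import numpy as np

def project_lidar_to_image(points, extrinsic, intrinsic):
    """Project (N, 3) LiDAR points into pixel coordinates.

    points    : (N, 3) XYZ in the LiDAR frame
    extrinsic : (4, 4) rigid transform from the LiDAR frame to the camera frame
    intrinsic : (3, 3) camera matrix K
    Returns (N, 2) pixel coordinates and a boolean mask of points
    that lie in front of the camera.
    """
    # Homogeneous coordinates: (N, 4)
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    # Transform into the camera frame and drop the homogeneous row: (N, 3)
    cam = (extrinsic @ pts_h.T).T[:, :3]
    in_front = cam[:, 2] > 0              # keep only points in front of the camera
    # Perspective projection with K, then divide by depth
    pix = (intrinsic @ cam.T).T
    pix = pix[:, :2] / pix[:, 2:3]
    return pix, in_front
```

The reverse direction (pixels to points) uses the inverse of the same matrices plus a depth estimate, which is why the calibration has to be precise in both directions.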
Uses of deep learning with LiDAR data
Given the type of output that LiDAR systems generate, combining them with neural networks seems like a natural fit, and indeed neural networks operating on point clouds have proven effective. Deep neural networks can be applied to LiDAR data for understanding tasks such as classification and semantic segmentation. U-Net-like architectures operating directly on point clouds have demonstrated results superior to those obtained with image-based models.
This progress has extended to increasingly complex tasks in the domain, such as instance segmentation, object detection, object completion, and pose estimation. Eventually, it enabled generative 3D models such as Point-E, released by OpenAI in December 2022.
Challenges of neural networks with LiDAR
Four interesting families of architectures have been proposed to deal with LiDAR data:
1) Point cloud-based methods: These networks operate directly on the point clouds using different approaches. One such approach learns spatial features of each point directly via shared MLPs and accumulates them via max-pooling (see the PointNet-style sketch after this list).
2) Voxel-based methods: The 3D data is divided into a 3D grid of voxels (essentially a grid of cubes), and 3D convolution and pooling are applied in a CNN-like architecture (see the voxelization sketch after this list).
3) Graph-based methods: These methods use the inherent geometry of point clouds to construct graphs and then apply common GNN architectures such as graph CNNs and graph attention networks (which also satisfy the permutation invariance that unordered point sets require).
4) View-based methods: These methods create 2D projections of the point clouds and apply tried-and-tested architectures from 2D computer vision. Here, a tactic that can improve model performance is to create multiple projections from different angles and vote on the final prediction.
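As a sketch of approach 1), the following PyTorch snippet learns per-point features with a shared MLP and aggregates them with max-pooling, which is exactly what makes the network invariant to point ordering. The layer widths and class count are illustrative choices, not from any published model:

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Minimal PointNet-style classifier: shared per-point MLP + max-pool."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Shared MLP applied to every point independently (via 1x1 convolutions)
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 256, 1), nn.ReLU(),
        )
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, num_points, 3) -> (batch, 3, num_points)
        x = self.point_mlp(points.transpose(1, 2))
        # Max-pool over points: a global feature independent of point order
        x = x.max(dim=2).values          # (batch, 256)
        return self.classifier(x)

# Usage: classify a batch of 2 clouds with 1024 points each
logits = TinyPointNet()(torch.randn(2, 1024, 3))
```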
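And as a sketch of approach 2), this NumPy helper converts a raw point cloud into a binary occupancy grid that a 3D CNN could consume. The voxel size and grid dimensions are illustrative; real pipelines tune them to the sensor range and the target network:

```python
import numpy as np

def voxelize(points, voxel_size=0.2, grid_dims=(64, 64, 16)):
    """Convert an (N, 3) point cloud to a binary occupancy grid."""
    grid = np.zeros(grid_dims, dtype=np.float32)
    # Shift points so the cloud's minimum corner maps to voxel (0, 0, 0)
    idx = ((points - points.min(axis=0)) / voxel_size).astype(int)
    # Discard points that fall outside the fixed grid
    keep = (idx < np.array(grid_dims)).all(axis=1)
    idx = idx[keep]
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid

# Usage: voxelize a random cloud of 5000 points in a 10 m cube
occupancy = voxelize(np.random.rand(5000, 3) * 10)
```

The trade-off between the two families is visible even at this scale: the voxel grid gives up fine geometric detail to gain a regular structure that standard convolutions can exploit, while the point-based model keeps every point but must earn its invariances through pooling.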