Automatic LiDAR Classification Made Easy with CloudCompare (No Coding)
Toronto 3D dataset rendered in CloudCompare

My thoughts are with those affected by the disastrous earthquake that struck Morocco on 08 September 2023.


From the 3D GeoInfo conference in Munich, here is my 9th newsletter article. In this article about Random Forest classification in CloudCompare, you can expect to read more about:

  1. An introduction to supervised machine learning.
  2. Data preparation and feature extraction.
  3. How the 3DMASC plugin works.
  4. Model training, evaluation, and prediction on unseen data.


The latest version of CloudCompare includes a new plugin called 3DMASC that lets you train and classify your point clouds using Random Forests. Rather than reproduce the documentation, I provide all the links to it; I also provide my videos so that you can follow along in practice. So, to put things into context and ensure a smooth read, I'll start with the definitions: scene understanding, semantic segmentation, classification, machine learning, random forest...

1. Why Are We Interested in Scene Understanding?

In the realm of data analysis, scene understanding is a pivotal endeavor, particularly within the context of point clouds. This pursuit holds immense value in both remote sensing (RS) and computer vision (CV). By comprehending scenes at a granular point level, we empower machines to discern objects, contexts, and relationships in detail. Semantic segmentation (for the CV community) and classification (for the RS community) become essential tools in this endeavor.

LiDAR data classification involves categorizing individual LiDAR points into distinct classes based on their characteristics. These classes could represent terrain, buildings, vegetation, and more. Machine learning enhances this process by automating the classification based on learned patterns.

In the realm of 3D point cloud semantic segmentation, three distinctive approaches have emerged: model-driven, data-driven, and knowledge-based methods (video 01). Model-driven techniques rely on predefined models or templates to classify objects based on their fit to known shapes (planes, spheres, tori), while data-driven methods use machine learning to autonomously learn patterns from the point cloud's features. Knowledge-based approaches incorporate domain-specific expertise and rule-based systems for segmentation. I have tried to summarize these methods in the figure below.

Point cloud classification overview

The choice of method hinges on the specific task and data complexity; model-driven excels with well-defined shapes, data-driven handles diverse scenes but needs ample training data, and knowledge-based methods leverage expert insights. These approaches collectively empower scene understanding in the intricate world of 3D point clouds (video 02).

Supervised learning serves as a fundamental approach (as a data-driven method), relying on labeled data to train algorithms for pattern recognition within the point cloud (video 03). A powerful ally within supervised learning is ensemble learning. This ensemble method harnesses the collective wisdom of numerous decision trees, each trained on different data subsets, to improve classification accuracy and stability (video 04).
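To make the ensemble idea concrete, here is a minimal scikit-learn sketch (separate from CloudCompare) that trains a random forest on synthetic per-point features; the feature names and the labeling rule are purely illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic per-point features: [height, intensity, planarity] (illustrative names)
n = 1000
X = rng.random((n, 3))
# Synthetic labels: call a point "ground" (0) when it is low and planar, else "other" (1)
y = ((X[:, 0] > 0.5) | (X[:, 2] < 0.5)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ensemble of 100 decision trees, each fit on a bootstrap sample of the training set
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```

Because each tree sees a different bootstrap sample and a random subset of features at each split, the averaged vote is more stable than any single tree.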

Main types of machine learning

2. Classification Using Radiometric, Spectral, and Geometric Features

In line with the principles of machine learning classification, the process of classifying 3D point cloud data hinges on the extraction of relevant features surrounding individual points within their respective neighborhoods (overview of the workflow below). This methodology bears similarities to general classification tasks, wherein the selection of pertinent features plays a pivotal role in achieving accurate results. While taking inspiration from research efforts that focus on associating class labels with 3D points, it's essential to note that the quality of classification outcomes can be influenced by several factors.

Neighborhood recovery → feature extraction → classification (after M. Weinmann, B. Jutzi, C. Mallet, and M. Weinmann)

One critical factor relates to the definition of neighborhoods for each point in a 3D point cloud. Various strategies, including spherical and cylindrical neighborhoods, as well as combinations of these, have been employed. The choice of neighborhood type and the determination of the scale parameter, whether through prior knowledge or a data-driven approach, can significantly impact classification results.
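As a rough illustration of neighborhood recovery (outside CloudCompare, assuming SciPy is available), a k-d tree radius query implements a spherical neighborhood around a point:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((500, 3)) * 10.0  # synthetic cloud in a 10 x 10 x 10 m box

tree = cKDTree(points)
radius = 1.0  # the scale parameter: radius of the spherical neighborhood

# Indices of all points within `radius` of the first point (includes the point itself)
neighbors = tree.query_ball_point(points[0], r=radius)
print(f"{len(neighbors)} neighbors within {radius} m")
```

Shrinking or growing `radius` is exactly the scale choice discussed above: a small radius captures fine detail, a large one captures context.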

Once neighborhoods are established, the extraction of geometric features becomes the next step. These features capture the spatial arrangements of neighboring points and provide critical information for classification. These features range from local 3D shape features, based on eigenvalues, to height-related properties, curvature measures, and more. The choice of features is crucial, as it directly affects the classifier's ability to differentiate between classes.
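As an illustration, the classical dimensionality features from this literature can be derived from the eigenvalues λ1 ≥ λ2 ≥ λ3 of a neighborhood's covariance matrix; this sketch uses a synthetic, nearly flat patch:

```python
import numpy as np

def dimensionality_features(neighborhood):
    """Linearity, planarity, sphericity from the covariance eigenvalues
    of an (N, 3) neighborhood (Weinmann et al.-style features)."""
    cov = np.cov(neighborhood.T)
    # Eigenvalues sorted descending: l1 >= l2 >= l3 >= 0
    l1, l2, l3 = np.sort(np.linalg.eigvalsh(cov))[::-1]
    linearity = (l1 - l2) / l1
    planarity = (l2 - l3) / l1
    sphericity = l3 / l1
    return linearity, planarity, sphericity

# A flat neighborhood (almost no vertical extent) should score high on planarity
rng = np.random.default_rng(1)
flat = rng.random((200, 3))
flat[:, 2] *= 0.001
lin, pla, sph = dimensionality_features(flat)
print(f"linearity={lin:.2f} planarity={pla:.2f} sphericity={sph:.2f}")
```

A roof or road patch behaves like `flat` here, while a pole would score high on linearity and foliage on sphericity.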

Subsequently, the extracted features are fed into a classifier. While various classifiers are available, the selection often depends on the specific problem and its computational demands. Random Forest classifiers, known for their balance between accuracy and efficiency, are commonly employed in point cloud classification tasks.

However, the relevance of individual features cannot be overlooked. Not all features are equally valuable for classification. The Hughes phenomenon underscores the importance of feature selection, as high-dimensional data can lead to a decrease in classification accuracy. Various feature relevance assessment methods, including filter-based techniques, help identify the most pertinent features for the task. These methods evaluate correlations, statistical dispersion, information gain, and other criteria to rank features by their relevance.
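As a small illustration of feature relevance (here via the random forest's own embedded importance scores, rather than one of the filter-based methods mentioned above), a single informative feature should dominate the ranking over pure noise:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000
informative = rng.random(n)   # this feature fully determines the label
noise = rng.random((n, 3))    # three irrelevant features

X = np.column_stack([informative, noise])  # feature 0 is the informative one
y = (informative > 0.5).astype(int)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
ranking = np.argsort(clf.feature_importances_)[::-1]
print("most important feature index:", ranking[0])
```

Dropping the low-ranked features before retraining is one practical way to sidestep the Hughes phenomenon.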

In summary, the process of classifying 3D point cloud data shares fundamental principles with general machine learning classification tasks. The selection of appropriate neighborhoods, the extraction of relevant geometric features, the choice of a suitable classifier, and the assessment of feature relevance all play crucial roles in achieving accurate and meaningful classification results.

To read more, I recommend this paper: Geometric Features and Their Relevance for 3D Point Cloud Classification (2017), by M. Weinmann, B. Jutzi, C. Mallet, and M. Weinmann.

3. How 3DMASC Works

CloudCompare, as versatile open-source software, helps users harness machine learning for LiDAR data classification without delving into code. The new 3DMASC plugin within CloudCompare plays a pivotal role in this endeavor (video 05).

In the context of the 3DMASC plugin, random forests shine by handling multi-dimensional data and diverse feature sets (RGB, intensity, number of returns, return number, planarity, linearity, sphericity, etc.), making them the linchpin for precise and reliable 3D point cloud classification.

3DMASC is an advanced plugin for 3D point cloud classification using multiple attributes, scales and clouds. It is possible to use it with the CloudCompare GUI, but also with the command line.

The 3DMASC plugin offers a versatile approach to scene understanding. It accommodates multiple point clouds simultaneously, supporting bi-temporal and bi-spectral survey processing and handling various 3D point cloud types, including those from Structure from Motion (SfM) or LiDAR sources, be it airborne, terrestrial, or mobile.

The workflow begins with the computation of descriptive features extracted from the input point cloud(s). These features (radiometric, geometric, or spectral) serve as the basis for training a random forest model, a pivotal aspect of the plugin. The user retains control, defining the features, point clouds to process, neighborhood scale, and target classes.

3DMASC produces classified point clouds enriched with class prediction confidence information, allowing for the filtering of misclassified points. Additionally, the graphical user interface (GUI) provides a tool to visualize contributive features (feature importance) and select them interactively for refined training. Upon training completion, the plugin generates a confusion matrix for analysis (with F1 score, precision, and overall accuracy).
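For reference, the same metrics can be reproduced outside the plugin; this sketch uses scikit-learn on toy labels (the class meanings are illustrative):

```python
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score

y_true = [0, 0, 1, 1, 2, 2, 2]  # reference classes (e.g. ground, building, vegetation)
y_pred = [0, 1, 1, 1, 2, 2, 0]  # model predictions

# Rows are reference classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred))
print("per-class F1:", f1_score(y_true, y_pred, average=None))
print("overall accuracy:", accuracy_score(y_true, y_pred))
```

Reading the off-diagonal cells tells you which pairs of classes the model confuses, which is often more actionable than the overall accuracy alone.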

For large datasets and command-line use, 3DMASC proves to be a robust tool for multiscale feature computation. Features can be processed and managed with custom scripts or the Python tools available in the lidar platform. The supported feature set encompasses classical geometric attributes, dimensionality-based features, height-related, lidar echo-based, and spectral attributes (see the documentation for details). These features are further enhanced through statistical operators and contextual comparisons across multiple surveys.


For more on the theory, please refer to the original paper: 3DMASC: Accessible, explainable 3D point clouds classification. Application to Bi-Spectral Topo-Bathymetric lidar data.

4. Step-by-Step Guide

With that reminder of the theory, let's move on to the practical part, the part that will be most useful to you. Below, I've broken down the whole process into 10 steps to simplify things (I can write other articles if needed), and I also provide videos for demonstrations. Before you can put your model into production, you need to train it with the right features, test it, and save it. So you'll need annotated data with the classes you're interested in. In my case, I'm using the Toronto 3D dataset, which is an MMS (mobile mapping system) dataset. But there are others, like SUM or SensatUrban, both acquired by photogrammetry, or Paris-Lille-3D, etc.

1. Data Import: Start by importing your LiDAR data into CloudCompare (you can skip this step, but it's important to see your data). This could be from aerial surveys, terrestrial scanning, or mobile LiDAR systems. For this first training stage, you need a labeled point cloud with the attribute field "classification". There are two options: either two separate point clouds, one for training and one for testing (so you obtain the metrics), or a single point cloud with a percentage used for training and the rest for testing.

2. Data Preprocessing: Clean your LiDAR data to remove noise and outliers, ensuring high-quality input for the classification process.

3. Feature Extraction: Identify the attributes or features that are relevant for classification. These could include elevation, intensity, and more intricate features. As a tip, to find out which features matter most, you can compute them in CloudCompare and inspect the visual result, which gives you an idea of whether or not a feature is useful. The other important parameter is the choice of scales. To choose them, you need to know the density of your point cloud, but above all the dimensions of the elements you are interested in. For example, if you're interested in small objects, it's best to choose a small scale. The scale is the radius of the sphere that defines the neighborhood of points used to compute the geometric features.

4. 3DMASC Plugin: Access the 3DMASC plugin (you need to have the latest version installed). This plugin seamlessly integrates machine learning capabilities into CloudCompare's interface. You will have two options, train and classify. The first is for training and testing your model, and the second is for classification.

5. Parameter Setup: Define the core points, scales, and features in the parameter file (here is an example provided by the authors). This setup guides the classification process (video 06). Your configuration file (.txt) needs to follow this structure:

# NAMES: the labels, which are references to point clouds,

cloud: PC1=

cloud: PCX=

# CORE POINTS: the points on which the attributes will be calculated,

core_points: PCX

# SCALES: the scales at which the attributes will be calculated,

scales: 1;2;3; # scales as a list (or use "scales: 1:0.5:5" for a range)

# FEATURES: the features which will be computed.

feature: INT_SC0_PCX
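Putting those fragments together, a complete minimal parameter file might look like the sketch below; the file name and the scale values are hypothetical, and only the syntax shown above is taken from the documentation:

```text
# NAMES: labels referencing the point clouds
cloud: PC1=training_cloud.laz   # hypothetical file name
# CORE POINTS: the cloud on which the attributes will be calculated
core_points: PC1
# SCALES: neighborhood radii, in cloud units
scales: 0.5;1;2;
# FEATURES: the features to compute (intensity at scale 0, on PC1)
feature: INT_SC0_PC1
```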

6. Core Point Assignment: Assign core points that will be used as the foundation for attribute calculation and classification.

7. Scale Specification: Specify the scales for feature computation. This includes determining the granularity of analysis across different levels.

8. Feature Selection: Choose the features that will contribute to the classification process, whether they're point features, neighborhood features, or context-based features.

After loading the configuration file, here is the window you will see. The first parameters concern the random trees, then the ratio of test data. If you want to keep the calculated attributes and traces, check the corresponding box. Finally, you have the features to be calculated and the corresponding scales.

3D MASC Interface for Training and Testing

9. Random Forest Training: Let the 3DMASC plugin employ the power of random forest algorithms for automated classification based on the selected features (video 07). This step takes time, just be patient. Once finished, you'll have the confusion matrix window, with F1 score, class accuracy and overall accuracy.

10. Result Evaluation: Once the classification is complete, assess the results and evaluate the accuracy of the classification using visualization tools within CloudCompare.

If you're satisfied with the results, whether quantitative (metrics and confusion matrices) or qualitative (visual inspection), then your trained model will be saved. In addition to the model, there's a file containing the features used in training, which 3DMASC will need to compute for every point cloud to be classified. For example, if you've used intensity, make sure the point cloud to be classified also contains it; otherwise you'll get an error.
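As a trivial illustration of that consistency check (this helper is hypothetical, not part of 3DMASC), you can compare the field lists before launching a classification:

```python
def missing_fields(required, available):
    """Return the model's required scalar fields that the new cloud lacks.
    Both arguments are sets of field names (hypothetical names below)."""
    return set(required) - set(available)

model_features = {"Intensity", "Planarity", "Z"}  # saved alongside the trained model
new_cloud_fields = {"Intensity", "Z"}             # fields present in the cloud to classify

print(missing_fields(model_features, new_cloud_fields))  # -> {'Planarity'}
```

An empty result means the cloud carries everything the model expects; anything else would trigger the error described above.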

Please refer to the videos of each step and to the paper if you didn't understand something, or ask in the comments section.

Benefits of Code-Free Approach:

  • Accessibility: The no-code approach allows professionals from various backgrounds to engage in LiDAR data classification without the barrier of coding expertise.
  • Efficiency: CloudCompare's intuitive interface and the 3DMASC plugin streamline the process, enabling efficient machine learning-based classification.
  • Iterative Refinement: Visualizing classification results empowers users to iteratively refine features and enhance accuracy.

Practically speaking, the advantage of this tool for classification is that it eliminates the need for repetitive code manipulation or frequent retraining: you can establish a robust model through a one-time training process and then apply it consistently whenever required. A critical challenge, however, emerges when generalizing the model to data that deviates significantly from the training dataset, for example in density or context; there is no magic solution for adapting seamlessly to vastly different contexts. Another intricate aspect is the selection of pertinent features for classification. To address this, you have to train, then assess the importance of the features that contribute most effectively to class differentiation.

5. Conclusion

LiDAR data classification powered by machine learning is no longer confined to the realm of programmers. With CloudCompare's 3DMASC plugin, the landscape of LiDAR data analysis is democratized, making it accessible to experts from diverse domains. Embrace the code-free approach, and propel your LiDAR data classification to new heights of accuracy and efficiency.

Enjoy and share.

References

  1. Original paper
  2. Website
  3. My videos
