A detailed understanding of Crowd Counting using CNNs
Shashank V Raghavan
Engineer | Product and Program Management | Resident Robot Geek | Autonomous systems | Quantum Computing
Crowd counting refers to techniques for estimating the number of people in an image or video. It has applications across many industries and settings, including hospitals and large gatherings, as well as automated public monitoring such as surveillance and traffic control. Unlike typical object detection, crowd counting must handle targets of arbitrary size and must work in both sparse and heavily cluttered scenes, often within the same image.
Crowd counting tasks can be broadly categorized into two types: sparse crowd counting and dense crowd counting. Sparse crowds are relatively easy to count, whereas dense crowds require more sophisticated algorithms.
Techniques
Several techniques have been developed to address the challenges of crowd counting. Early work relied on classical machine learning and computer vision: detection-, regression-, and density-based approaches that estimate crowd density or density maps. These methods struggled with scale and perspective variation, occlusion, non-uniform density, and other issues. Researchers therefore turned to Convolutional Neural Networks (CNNs), which had proven effective across many computer vision tasks, to build more capable crowd counting algorithms.
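A density map assigns each pixel a (usually fractional) number of people, so the crowd count is simply the sum of the map. One widely used way to build ground-truth maps is to place a normalized Gaussian kernel at every annotated head position. The pure-Python sketch below illustrates that idea under stated assumptions: the fixed kernel width `sigma`, the small grid, and the head coordinates are hypothetical, and this is not presented as the author's method.

```python
import math


def gaussian_density_map(heads, height, width, sigma=1.5):
    """Build a ground-truth density map from head annotations.

    heads:  list of (row, col) head positions.
    sigma:  fixed Gaussian width (an assumption; some methods adapt it per head).
    Each kernel is renormalized to sum to 1 over the grid, even when clipped
    at the border, so the whole map sums to len(heads).
    """
    density = [[0.0] * width for _ in range(height)]
    for r0, c0 in heads:
        kernel = [
            [math.exp(-((r - r0) ** 2 + (c - c0) ** 2) / (2 * sigma ** 2))
             for c in range(width)]
            for r in range(height)
        ]
        total = sum(sum(row) for row in kernel)
        for r in range(height):
            for c in range(width):
                density[r][c] += kernel[r][c] / total
    return density


def count_from_density(density):
    """The estimated count is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in density)


if __name__ == "__main__":
    heads = [(2, 3), (5, 5), (6, 1)]  # hypothetical head annotations
    dmap = gaussian_density_map(heads, height=8, width=8)
    print(round(count_from_density(dmap), 6))  # 3.0
```

Renormalizing each kernel keeps the total equal to the number of annotated heads even when a head sits near the image border, which is what makes "count = sum of the map" hold for the ground truth.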
Counting by Detection
This approach focuses on counting by detecting individual objects, specifically people. There are three types of crowd counting by detection, based on the features used to identify crowds in images and videos:
Disadvantage: Counting by detection is not very accurate for dense crowds or for scenes with significant background clutter.
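To make the idea concrete, here is a minimal sketch of the counting step only, assuming some person or head detector has already produced scored bounding boxes (the detector itself is out of scope). The count is the number of boxes that survive a score threshold and greedy non-maximum suppression (NMS). The function names, thresholds, and sample boxes are illustrative assumptions, not part of any specific library.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_by_detection(detections, score_thresh=0.5, iou_thresh=0.5):
    """Count people as the number of confident, non-duplicate detections.

    detections: list of ((x1, y1, x2, y2), score) pairs from a detector.
    Greedy NMS keeps the highest-scoring box and drops any later box that
    overlaps an already kept box by more than iou_thresh.
    """
    candidates = sorted(
        (d for d in detections if d[1] >= score_thresh),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        if all(iou(box, k) <= iou_thresh for k, _ in kept):
            kept.append((box, score))
    return len(kept)


if __name__ == "__main__":
    detections = [
        ((0, 0, 10, 20), 0.9),
        ((1, 1, 11, 21), 0.8),   # near-duplicate of the first person
        ((30, 0, 40, 20), 0.7),
        ((60, 0, 70, 20), 0.3),  # below the score threshold
    ]
    print(count_by_detection(detections))  # 2
```

The sketch also hints at why detection struggles in dense scenes: heavily occluded people tend to produce overlapping boxes that NMS may discard as duplicates, or low scores that fall below the threshold, so the count is underestimated.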
Counting by Regression
Counting by regression does not segment or track individuals. Instead, it learns a mapping from image features to the number of individuals, which yields better performance, especially with dense crowds. Depending on the regression goal, this crowd counting method can be divided into two groups:
Disadvantage: Although these techniques perform better in both sparse and dense scenes and reduce the dependency on a detector, they still rely heavily on handcrafted features. As a result, the feature extraction algorithm becomes a key limitation of regression-based methods.
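A minimal sketch of regression-based counting, assuming a single hypothetical handcrafted feature (the number of foreground pixels in a binary mask, as might come from background subtraction) and ordinary least squares. Real systems combine several features and may use other regressors; the feature, function names, and toy data here are illustrative assumptions.

```python
def foreground_area(mask):
    """Hypothetical handcrafted feature: number of foreground pixels in a
    binary mask (e.g., produced by background subtraction)."""
    return sum(sum(row) for row in mask)


def fit_linear(xs, ys):
    """Ordinary least squares for count = w * feature + b (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


def predict_from_feature(w, b, feature):
    """Apply the learned mapping to a new image's feature value."""
    return w * feature + b


if __name__ == "__main__":
    # Toy training data: foreground areas and their annotated counts.
    areas = [1000, 2000, 3000, 4000]
    counts = [5, 10, 15, 20]
    w, b = fit_linear(areas, counts)
    mask = [[1] * 50 for _ in range(50)]  # 2500 foreground pixels
    print(round(predict_from_feature(w, b, foreground_area(mask)), 2))  # 12.5
```

No individual is ever detected here; the model only maps a global feature to a number. That is also the weakness described above: if the handcrafted feature is a poor proxy (for example, under strong perspective, where distant people cover fewer pixels), no regressor can fully compensate.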
Counting Using CNNs
Thanks to their powerful feature extraction capabilities, CNNs can learn features automatically, so an end-to-end network can be trained to count individuals directly from images. These methods adapt to changes in many factors, predict the number of individuals more accurately, and achieve state-of-the-art results on many popular evaluation benchmarks. CNN-based methods outperform other approaches in scenes with a wide range of human head scales, non-uniform density distributions, and significant variations in perspective and scene.
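The end-to-end idea can be sketched as a forward pass: the image goes through convolution layers that output a density map, and the count is the sum of that map. The toy below is a hand-written, single-channel forward pass in pure Python, not a trained network and not any published architecture. Real models use many channels, pooling or dilation, and weights learned from density-map targets; the kernels here are hypothetical placeholders.

```python
def conv2d_same(image, kernel, bias=0.0):
    """Single-channel 2D cross-correlation (what CNN "convolution" layers
    compute) with zero padding, so output size equals input size.
    The kernel must be square with an odd side length."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = bias
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[i][j] * image[rr][cc]
            out[r][c] = acc
    return out


def relu(fmap):
    """Element-wise ReLU non-linearity."""
    return [[max(0.0, v) for v in row] for row in fmap]


def predict_density(image, layers):
    """Apply (kernel, bias) conv layers, each followed by ReLU.
    The final ReLU keeps the predicted density non-negative."""
    x = image
    for kernel, bias in layers:
        x = relu(conv2d_same(x, kernel, bias))
    return x


def predict_count(image, layers):
    """End-to-end estimate: image -> density map -> sum."""
    density = predict_density(image, layers)
    return sum(sum(row) for row in density)


if __name__ == "__main__":
    # Hand-set placeholder weights; a real model learns these from data.
    layers = [
        ([[1 / 9] * 3 for _ in range(3)], 0.0),  # 3x3 smoothing
        ([[1.0]], 0.0),                          # 1x1 "density head"
    ]
    image = [[0.0] * 5 for _ in range(5)]
    image[2][2] = 1.0  # one "person" blob in the middle
    print(predict_count(image, layers))
```

Running it prints a value of approximately 1.0 (up to floating-point rounding). The design point is that the network never outputs a count directly: it outputs a per-pixel density, and summing it gives the count, which is why such models can be trained against the Gaussian density maps used as ground truth.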