A detailed understanding about Crowd Counting using CNN.

Shashank V Raghavan

Engineer | Product and Program Management | Resident Robot Geek | Autonomous systems | Quantum Computing

Published: August 4, 2024

Crowd counting refers to the technique used to estimate the number of people in an image or video. It finds applications in various industries, hospitals, crowd gatherings, as well as automated public monitoring like surveillance and traffic control. Unlike object detection, crowd counting focuses on identifying arbitrarily sized targets in different scenarios, including sparse and cluttered scenes simultaneously.

Crowd counting tasks can be broadly categorized into two types:

Dense crowds: A large number of people are packed closely together in a specific area.
Sparse crowds: People are scattered, with significant gaps between them.

Sparse crowds are easier to count than dense crowds, which require more sophisticated algorithms.

Techniques

Several techniques have been developed to address the challenges of crowd counting. Initially, computer scientists employed basic machine learning and computer vision algorithms such as detection, regression, and density-based approaches to predict crowd density and density maps. However, these methods faced challenges such as scale and perspective variations, occlusions, non-uniform density, and more. Subsequently, researchers turned their attention to Convolutional Neural Networks (CNNs) due to their effectiveness in various computer vision tasks, aiming to leverage their capabilities in developing crowd counting algorithms.

Counting by Detection

This approach focuses on counting by detecting individual objects, specifically people. There are three types of crowd counting by detection, based on the features used to identify crowds in images and videos:

Integral-based Detection: This method uses the full-body appearance of people, extracting features such as edges, shapelets, textures, Haar wavelets, and histograms of oriented gradients (HOG). Learning approaches such as SVMs, boosting, random forests, or clustering are then used to detect or classify objects and thereby count people.
Part-based Detection: Instead of considering the entire human body, this technique focuses on specific parts, such as the head or shoulders, and applies classifiers to those parts. Estimating the presence of a person solely based on the head is not reliable, so combining the head and shoulders provides better results, particularly for dense crowds.
Shape Matching: This method uses ellipses to draw boundaries around humans and then employs a stochastic process to estimate the number and configuration of shapes.

Disadvantage: Counting by detection is not highly accurate in dense crowds or scenes with significant background clutter, because occluded or overlapping people are hard to detect individually.
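To make the detection pipeline concrete, here is a minimal sketch of its final step: turning raw detector output into a count. It assumes a hypothetical detector that returns boxes as (x1, y1, x2, y2, score) tuples, and it uses score thresholding plus non-maximum suppression to merge duplicates. Neither the box format nor the thresholds come from the article; they are illustrative choices.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_by_detection(detections, score_thresh=0.5, iou_thresh=0.5):
    """Count people as the number of boxes that survive thresholding and NMS."""
    # Keep confident boxes only, strongest first.
    candidates = sorted(
        (d for d in detections if d[4] >= score_thresh),
        key=lambda d: d[4],
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Greedy NMS: drop a box that overlaps an already kept box too much.
        if all(iou(det[:4], k[:4]) < iou_thresh for k in kept):
            kept.append(det)
    return len(kept)


# Hypothetical detector output (assumed for illustration).
detections = [
    (10, 10, 50, 90, 0.95),     # person A
    (12, 12, 52, 92, 0.80),     # duplicate box on person A
    (100, 20, 140, 100, 0.90),  # person B
    (200, 30, 240, 110, 0.30),  # low-confidence box, rejected
]
people = count_by_detection(detections)
print(people)
```

The same suppression step hints at why dense scenes are hard: two real people whose boxes overlap heavily would be merged into one detection and undercounted.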

Counting by Regression

Counting by regression does not involve segmenting or tracking individuals. Instead, it learns a mapping from image features to the number of individuals, which results in better performance, especially with dense crowds. Depending on the regression target, this crowd counting method can be divided into two groups:

Individual based regression: This technique extracts low-level features such as edge details and foreground pixels, and applies regression modeling to map these features to the count.
Density based regression: This approach estimates density by learning a mapping between local features and object density maps, which incorporates spatial information. Because images are mapped directly to density maps, it avoids dependence on a detector. Instead of handling each individual separately, this technique treats groups of individuals together. The mapping can be linear or nonlinear.
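A density map is easier to picture with a small example. The sketch below builds a ground-truth style density map by placing one normalized Gaussian at each annotated head position, so that summing the map recovers the count. The fixed kernel size and sigma, the (row, col) annotation format, and the border renormalization are assumptions made for illustration, not details from the article.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    half = size // 2
    kernel = [
        [math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def density_map(height, width, heads, size=7, sigma=1.5):
    """Place one unit-mass Gaussian at each in-bounds head position (row, col)."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    dmap = [[0.0] * width for _ in range(height)]
    for r, c in heads:
        # Collect the kernel cells that fall inside the image, then
        # renormalize so a head near the border still contributes exactly 1.
        cells = [
            (r + dy - half, c + dx - half, kernel[dy][dx])
            for dy in range(size)
            for dx in range(size)
            if 0 <= r + dy - half < height and 0 <= c + dx - half < width
        ]
        mass = sum(w for _, _, w in cells)
        for rr, cc, w in cells:
            dmap[rr][cc] += w / mass
    return dmap


# Hypothetical head annotations on a 32 x 48 image.
heads = [(5, 5), (6, 7), (20, 30), (0, 0)]
dmap = density_map(32, 48, heads)
count = sum(sum(row) for row in dmap)
print(round(count, 6))
```

Integrating the map gives the number of annotated heads, while the map itself shows where the crowd is concentrated. That spatial information is what a single regressed number lacks.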

Disadvantage: Although these techniques get better performance in both sparse and dense scenarios and alleviate the dependency on the detector, they still rely heavily on handcrafted features. As a result, the feature extraction algorithm became an essential limitation for regression-based methods.
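To illustrate how much such a method depends on its handcrafted feature, here is a minimal sketch of individual based regression with a single feature: the number of foreground pixels. It fits a straight line by ordinary least squares. The training pairs are made-up values and the choice of a linear model is an assumption; real systems combine several features and often use nonlinear regressors.

```python
def foreground_pixels(mask):
    """Handcrafted feature: number of foreground pixels in a binary mask."""
    return sum(sum(row) for row in mask)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training data: foreground pixel count and annotated head count.
train_features = [1200, 2500, 3700, 5100, 6300]
train_counts = [10, 21, 30, 42, 52]

slope, intercept = fit_line(train_features, train_counts)


def predict_count(feature):
    """Map a feature value to an estimated number of people."""
    return slope * feature + intercept


print(round(predict_count(4000), 1))
```

If the foreground mask is wrong, for example because of shadows or perspective distortion, the prediction is wrong too. This is the limitation the paragraph above describes.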

Counting using CNN

Thanks to their powerful feature extraction capabilities, CNNs can extract features automatically and be trained end to end to count individuals. These methods adapt to changes in various factors, predict the number of individuals more accurately, and achieve state-of-the-art results on many popular evaluation benchmarks. CNN-based methods outperform other approaches in scenarios involving a wide range of human head scales, non-uniform density distributions, and significant variations in perspective and scene.
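Benchmark results for counting models are usually reported as errors between predicted and annotated counts per image. The article does not name specific metrics; the sketch below assumes the two most commonly reported ones, mean absolute error (MAE) and root mean squared error (RMSE), and uses made-up counts.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; penalizes large misses more than MAE."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical per-image counts for four test images.
predicted = [102.0, 48.5, 310.0, 12.0]
actual = [100, 50, 300, 15]

print(mae(predicted, actual))
print(round(rmse(predicted, actual), 3))
```

MAE reflects average accuracy, while RMSE is more sensitive to occasional large errors, so the two are typically read together.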

Because the features are learned rather than handcrafted, CNN-based methods remove the main limitation of the regression-based methods described above. They can be grouped by the strategy they use:

Multi-scale fusion: Methods such as MCNN, CrowdNet, and SaCNN focus on fusing features of different scales to handle varying head scales and crowd sizes. They use multi-column convolutional networks or scale-adaptive convolutional neural networks for feature extraction and fusion.
Attention-based: Approaches like MSAN, SCAR, and SFANet utilize attention mechanisms to address challenges such as changes in head scales and complex crowd scenes. Attention is used to guide the network to focus on important regions and improve counting accuracy.
Patch-based: Hydra-CNN, Switching CNN, and IG-CNN divide images into patches and count them separately, addressing uneven crowd density. They employ various techniques such as adaptive patch response, selective network branching, and incremental learning.
Multi-density map fusion: Methods like ASD and DecideNet fuse density maps of multiple scales or levels to handle varying conditions. They use weight information or adaptive calibration to combine density maps and improve counting accuracy.
GAN-based: GAN-based approaches, such as MS-GAN, leverage adversarial networks to generate more accurate density maps. Generative and discriminative models compete to understand the distribution of crowd data, leading to improved counting performance.
Context-based: CP-CNN and other context-based methods utilize contextual and semantic information to constrain density maps. They integrate global and local context information to generate high-quality density maps.
Coarse-to-fine: Coarse-to-fine approaches, including DRSAN and ic-CNN, initially generate a coarse density map and then refine it for finer counting results. They use recurrent spatial-aware networks or multi-stage fusion to enhance the density map quality.
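The patch-based idea from the list above can be sketched in a few lines: split the input into patches, count each patch separately, and add the results. In the sketch below, count_patch is a hypothetical stand-in that simply sums a density patch; in a real method such as Switching CNN, a network chosen per patch would produce that estimate. The grid values and patch size are assumptions for illustration.

```python
def split_into_patches(grid, patch_h, patch_w):
    """Yield (top, left, patch) for non-overlapping patches covering the grid."""
    height, width = len(grid), len(grid[0])
    for top in range(0, height, patch_h):
        for left in range(0, width, patch_w):
            patch = [row[left:left + patch_w] for row in grid[top:top + patch_h]]
            yield top, left, patch


def count_patch(patch):
    """Stand-in for a per-patch counting network: sums a density patch."""
    return sum(sum(row) for row in patch)


def count_by_patches(grid, patch_h, patch_w):
    """Return per-patch counts keyed by (top, left) and their total."""
    per_patch = {
        (top, left): count_patch(patch)
        for top, left, patch in split_into_patches(grid, patch_h, patch_w)
    }
    return per_patch, sum(per_patch.values())


# Hypothetical 4 x 6 density map: dense on the left, sparse on the right.
density = [
    [0.5, 0.5, 0.25, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.25, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.25, 0.25],
    [0.5, 0.5, 0.0, 0.0, 0.25, 0.25],
]
per_patch, total = count_by_patches(density, 2, 3)
print(per_patch)
print(total)
```

The per-patch counts expose the uneven density within one image, which is what lets patch-based methods treat crowded and empty regions differently.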
