How to Do Face Detection with dlib (HOG and CNN)
Tejas Shastrakar
Masters Student Mechatronics and Robotics | ADAS and Computer Vision Enthusiast | Ex TCSer | Machine Learning | Deep Learning
Dlib is an open-source software library primarily written in C++, with Python bindings available. It provides a wide range of tools and algorithms for various machine learning, computer vision, and image processing tasks. Developed by Davis King, Dlib is widely known for its efficiency, portability, and ease of use. Face detection with dlib involves the use of two main methods: Histogram of Oriented Gradients (HOG) and Convolutional Neural Networks (CNN).
Introduction to HOG and CNN:
1. Histogram of Oriented Gradients (HOG): HOG is a feature descriptor technique used for object detection in computer vision. It works by calculating the distribution of gradient orientations in localized portions of an image. HOG breaks down an image into small, overlapping cells, computes histograms of gradient orientations within each cell, and then normalizes these histograms. The resulting feature vector represents the distribution of gradient orientations in the image, providing meaningful information about the local object shape and texture. HOG has been widely used in pedestrian detection, face detection, and other object recognition tasks.
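To make the idea concrete, the following pure-Python sketch (no dlib involved, and simplified relative to real HOG implementations) computes a 9-bin histogram of gradient orientations for a single cell of a small grayscale patch; a full HOG descriptor would repeat this over many cells and add block normalization:

```python
import math

def cell_orientation_histogram(patch, bins=9):
    """Unsigned-gradient orientation histogram (0-180 degrees) for one HOG cell.

    `patch` is a 2D list of grayscale values; border pixels are skipped
    for simplicity.
    """
    hist = [0.0] * bins
    h, w = len(patch), len(patch[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]          # horizontal gradient
            gy = patch[y + 1][x] - patch[y - 1][x]          # vertical gradient
            mag = math.hypot(gx, gy)                        # gradient magnitude
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(ang / (180.0 / bins)) % bins] += mag   # magnitude-weighted vote
    return hist

# A patch whose intensity increases left to right: every gradient points
# along the x-axis, so all votes land in the bin containing 0 degrees.
patch = [[x * 10 for x in range(8)] for _ in range(8)]
print(cell_orientation_histogram(patch))
```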
2. Convolutional Neural Networks (CNN): CNNs are deep learning models specifically designed to process structured grids of data, such as images. They consist of multiple layers, including convolutional layers, pooling layers, and fully connected layers. CNNs automatically learn hierarchical patterns and features from raw pixel data, enabling them to effectively extract relevant features for various tasks, including face detection. CNNs have achieved remarkable success in computer vision tasks, surpassing traditional feature-based approaches in many cases.
Working Principle for Face Detection:
1. HOG-based Face Detection:
- Feature Extraction: Initially, the image is divided into small, overlapping cells, and gradient magnitudes and orientations are computed for each pixel within these cells.
- Histogram Calculation: Histograms of gradient orientations are constructed for each cell.
- Normalization: The histograms are normalized within blocks to ensure invariance to changes in lighting and contrast.
- Sliding Window Detection: A sliding window technique is employed to scan the entire image, applying the HOG descriptor to each window. At each position, a classifier (typically a linear SVM) is used to determine whether the window contains a face or not.
- Post-processing: Detected face regions may undergo additional refinement steps, such as non-maximum suppression to merge overlapping detections.
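The sliding-window stage of the pipeline above can be sketched as follows; `score_window` here is a hypothetical stand-in for the linear SVM applied to each window's HOG features, not dlib's actual classifier:

```python
def sliding_windows(img_w, img_h, win=64, stride=32):
    """Yield the top-left corner of every win x win window over the image."""
    for top in range(0, img_h - win + 1, stride):
        for left in range(0, img_w - win + 1, stride):
            yield left, top

def detect(img_w, img_h, score_window, win=64, stride=32, threshold=0.5):
    """Keep every window whose classifier score exceeds the threshold."""
    return [(l, t, l + win, t + win)
            for l, t in sliding_windows(img_w, img_h, win, stride)
            if score_window(l, t) > threshold]

# Hypothetical scorer: pretend only the window at (32, 32) contains a face.
boxes = detect(128, 128, lambda l, t: 0.9 if (l, t) == (32, 32) else 0.1)
print(boxes)  # [(32, 32, 96, 96)]
```

In practice the scan is repeated over an image pyramid so that faces larger or smaller than the window size are also found.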
2. CNN-based Face Detection:
- Training: A CNN model is trained on a large dataset of labeled face images. During training, the CNN learns to automatically extract relevant features for face detection from raw pixel data.
- Feature Extraction: The input image is fed into the CNN, and feature maps are computed through convolutional and pooling layers.
- Detection: The final layers of the CNN typically consist of fully connected layers followed by softmax or sigmoid activation functions, which output the probability of the presence of a face in various regions of the image.
- Non-maximum Suppression: Similar to HOG-based detection, post-processing steps like non-maximum suppression may be applied to refine the detected face regions and eliminate overlapping detections.
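Non-maximum suppression itself is straightforward to implement. A minimal greedy version over `(box, score)` pairs, with boxes as `(left, top, right, bottom)` tuples, might look like this (the standard algorithm, not dlib's internal routine):

```python
def iou(a, b):
    """Intersection-over-union of two (l, t, r, b) boxes."""
    il, it = max(a[0], b[0]), max(a[1], b[1])
    ir, ib = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ir - il) * max(0, ib - it)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def nms(detections, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap
    a kept box by more than iou_thresh, repeat."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k) < iou_thresh for k, _ in kept):
            kept.append((box, score))
    return kept

# Two near-duplicate detections of one face plus one separate face:
dets = [((10, 10, 50, 50), 0.9), ((12, 12, 52, 52), 0.8), ((100, 10, 140, 50), 0.7)]
print(nms(dets))  # the 0.8 duplicate is suppressed
```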
Code:
import dlib
import cv2

## Detecting faces with HOG (Histogram of Oriented Gradients)
image = cv2.imread('Images/istockphoto-1362120018-612x612.jpg')

# Display the image if it loaded successfully
if image is not None:
    cv2.imshow('image', image)
    cv2.waitKey(0)  # wait for a key press
    cv2.destroyAllWindows()
else:
    print("Error: Unable to read the image.")
# Create the HOG-based frontal face detector and run it,
# upsampling the image once (second argument) to find smaller faces
face_detector_hog = dlib.get_frontal_face_detector()
detections = face_detector_hog(image, 1)

# Draw a green rectangle around each detected face
for face in detections:
    l, t, r, b = face.left(), face.top(), face.right(), face.bottom()
    cv2.rectangle(image, (l, t), (r, b), (0, 255, 0), 2)
# Display the annotated image
if image is not None:
    cv2.imshow('image', image)
    cv2.waitKey(0)  # wait for a key press
    cv2.destroyAllWindows()
else:
    print("Error: Unable to read the image.")
## Detecting faces with CNN (Convolutional Neural Networks)
image = cv2.imread('Images/istockphoto-1362120018-612x612.jpg')

# Load the pretrained MMOD CNN face detector and run it
cnn_detector = dlib.cnn_face_detection_model_v1('Weights/mmod_human_face_detector.dat')
detections = cnn_detector(image, 1)

# Each CNN detection carries a rectangle and a confidence score
for face in detections:
    l, t, r, b, c = face.rect.left(), face.rect.top(), face.rect.right(), face.rect.bottom(), face.confidence
    print(c)
    cv2.rectangle(image, (l, t), (r, b), (255, 255, 0), 2)
# Display the annotated image
if image is not None:
    cv2.imshow('image', image)
    cv2.waitKey(0)  # wait for a key press
    cv2.destroyAllWindows()
else:
    print("Error: Unable to read the image.")
In summary, both HOG and CNN approaches aim to detect faces in images, but they differ in their feature extraction and classification methodologies. HOG relies on handcrafted features and traditional machine learning classifiers, while CNNs automatically learn features from data, making them more adaptable to different datasets and potentially achieving higher accuracy. Dlib provides implementations for both methods, allowing users to choose the one that best suits their requirements and constraints.