Our New Paper: End-to-End Multitask Learning for Driver Gaze and Head Pose Estimation
Ibrahim Sobh - PhD
Senior Expert of Artificial Intelligence, Valeo Group
80% of crashes involve driver distraction
Paper: "End-to-End Multitask Learning for Driver Gaze and Head Pose Estimation"; Electronic Imaging (EI 2020), Society for Imaging Science and Technology.
Abstract
Most modern automobile accidents occur due to drivers' inattentive behavior, which is why driver gaze estimation is becoming a critical component in the automotive industry. Gaze estimation introduces many challenges arising from the surrounding environment, such as changes in illumination, driver head motion, partial face occlusion, or eye decorations.
Most previous work in this field either explicitly extracts hand-crafted features such as eye corners and the pupil center to estimate gaze, or uses appearance-based methods such as Convolutional Neural Networks, which implicitly extract features from an image and directly map them to the corresponding gaze angle.
In this work, a multitask Convolutional Neural Network architecture is proposed to predict the subject's gaze yaw and pitch angles, along with the head pose as an auxiliary task, making the model robust to head pose variations without needing any complex preprocessing or hand-crafted feature extraction.
The model achieves 78.2% accuracy in cross-subject testing (subjects never seen during training, in any head pose), demonstrating the model's generalization capability and robustness to head pose variation.
Challenges facing driver gaze estimation
- Person independence: The ability to generalize on any subject.
- Variation in head pose: The ability to accurately detect gaze regardless of the orientation of the head.
- Eye decorations: Subjects wearing glasses or similar accessories.
An End-to-End solution to driver’s gaze estimation using a single Convolutional Neural Network (CNN).
- End-to-End Multitask learning Network for driver Gaze and Head pose detection.
- No need for explicit feature extraction or multiple networks.
- Train a deep network to predict the subject’s head pose angle as an auxiliary task.
- The network is regression-based, i.e. it outputs gaze angles as continuous values, which enables it to learn the spatial relation between gaze points, something a classification approach would fail to do.
- We cluster the predicted gaze values into classes, which is relevant in the driving scenario.
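The single-network design described above (a shared feature extractor feeding one head that regresses gaze yaw/pitch and a second head that regresses head pose) can be illustrated with a minimal numpy sketch. The layer sizes, weights, and `backbone` placeholder below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder for the shared CNN backbone: any function mapping an
# image to a feature vector would do here.
def backbone(image, feat_dim=128):
    flat = image.reshape(-1)
    W = rng.standard_normal((feat_dim, flat.size)) * 0.01
    return np.tanh(W @ flat)

# Two task-specific regression heads on top of the SAME shared features.
W_gaze = rng.standard_normal((2, 128)) * 0.01   # -> (yaw, pitch)
W_pose = rng.standard_normal((1, 128)) * 0.01   # -> head pose angle (auxiliary)

def forward(image):
    feats = backbone(image)
    gaze = W_gaze @ feats        # main task: continuous gaze angles
    pose = W_pose @ feats        # auxiliary task: head pose angle
    return gaze, pose

gaze, pose = forward(rng.standard_normal((64, 64)))
print(gaze.shape, pose.shape)   # (2,) (1,)
```

Because both heads share the backbone, gradients from the auxiliary pose task shape the same features the gaze head uses, which is the mechanism behind the robustness to head pose variation claimed above.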
Dataset
- Columbia Gaze: Consists of 5880 high-resolution images (total 56 subjects)
- 5 horizontal head poses (0°, ±15°, ±30°), 7 horizontal gaze directions (0°, ±5°, ±10°, ±15°) and 3 vertical gaze directions (0°, ±10°)
- The dataset thus covers 21 gaze classes (7 horizontal × 3 vertical) and 5 head poses, yielding 21 × 5 = 105 images per subject.
Setup and Pre-Processing
- Excluded 6 subjects from the dataset to be used for cross-subject testing.
- Initialized the weights of the first 4 layers in our architecture using the pre-trained weights of VGG face descriptor model.
- Data augmentation: random contrast, brightness, Gaussian noise, etc.; no affine transformations.
- Clustered the 21 gaze points into 9 classes for practical reasons and to simplify the task.
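One plausible way to cluster the 7 × 3 gaze grid into 9 classes is a 3 × 3 binning of yaw and pitch. The paper summary does not specify the exact cluster boundaries, so the thresholds below are illustrative assumptions:

```python
def gaze_class(yaw_deg, pitch_deg):
    """Map continuous (yaw, pitch) gaze angles to one of 9 regions.

    Columns: left / centre / right; rows: down / centre / up.
    Threshold values are assumptions, not taken from the paper.
    """
    col = 0 if yaw_deg < -2.5 else (1 if yaw_deg <= 2.5 else 2)
    row = 0 if pitch_deg < -5.0 else (1 if pitch_deg <= 5.0 else 2)
    return row * 3 + col   # class id in 0..8

# The dataset's 7 horizontal x 3 vertical gaze grid collapses to 9 classes:
classes = {gaze_class(y, p) for y in (0, 5, -5, 10, -10, 15, -15)
                            for p in (0, 10, -10)}
print(sorted(classes))     # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

The same binning can be applied to the network's continuous regression outputs at test time, which is how a regression model can still be evaluated as a 9-class classifier.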
Experiments
Exp.1: Simple Classification
- Classified gaze into one of 9 regions/classes.
- No head pose aux. task
Exp.2: Simple Regression
- Predicted gaze Yaw & Pitch angles.
- Clustered the predicted values into 9 classes.
- No head pose aux. task
Exp.3: Feature Fusion (Regression)
- Pretrained a separate network to predict head pose.
- Used as a feature extractor and concatenated its resulting feature vectors with the feature vectors of the gaze network during training.
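The fusion step in Exp. 3 amounts to concatenating the pretrained head-pose network's feature vector with the gaze network's features before the final regression layer. A minimal sketch with assumed feature dimensions (128 and 64 are placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)

gaze_feats = rng.standard_normal(128)   # from the gaze network (trainable)
pose_feats = rng.standard_normal(64)    # from the pretrained pose network (frozen)

# Concatenate the two feature vectors, then regress yaw and pitch
# from the fused representation.
fused = np.concatenate([gaze_feats, pose_feats])   # shape (192,)
W_out = rng.standard_normal((2, fused.size)) * 0.01
yaw_pitch = W_out @ fused                          # shape (2,)
print(fused.shape, yaw_pitch.shape)
```

Unlike the multitask setup of Exp. 4, this variant needs two networks and only shares information in one direction, from the pose extractor into the gaze regressor.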
Exp.4: Multitask Learning (Regression)
- Predicted gaze Yaw & Pitch angles + head pose as an auxiliary task.
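A common way to train such a multitask regressor is a weighted sum of per-task mean squared errors. The weighting factor `lam` below is a hypothetical hyperparameter, since the exact loss weighting is not given in this summary:

```python
import numpy as np

def multitask_loss(gaze_pred, gaze_true, pose_pred, pose_true, lam=0.5):
    """Weighted sum of MSE losses: main gaze task + auxiliary pose task.

    lam is a hypothetical task-weighting hyperparameter (assumption).
    """
    mse = lambda a, b: float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))
    return mse(gaze_pred, gaze_true) + lam * mse(pose_pred, pose_true)

# Perfect predictions give zero loss; pose errors contribute at weight lam.
print(multitask_loss([0.0, 0.0], [0.0, 0.0], [0.0], [0.0]))   # 0.0
print(multitask_loss([1.0, 1.0], [0.0, 0.0], [2.0], [0.0]))   # 1.0 + 0.5 * 4.0 = 3.0
```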
Results
The multitask learning network achieved the best results, even on subjects it had never seen before.
Saliency Maps
Visualizing what the network has learned using saliency maps shows that the gaze head focuses on the eye pupils, while the head pose head focuses on the face contour and the eyes.
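A saliency map of this kind is the magnitude of the output's gradient with respect to each input pixel. For any differentiable model it can be approximated by central finite differences; the sketch below is a generic illustration under that definition, not the paper's visualization code, and the toy `f` is a made-up stand-in for a network head:

```python
import numpy as np

def saliency_map(f, image, eps=1e-4):
    """Approximate |d f / d pixel| for every pixel via central differences."""
    image = np.asarray(image, dtype=float)
    grad = np.zeros_like(image)
    for i in range(image.size):
        hi, lo = image.copy(), image.copy()
        hi.flat[i] += eps
        lo.flat[i] -= eps
        grad.flat[i] = (f(hi) - f(lo)) / (2 * eps)
    return np.abs(grad)

# Toy "gaze head": only the first pixel influences the output, so the
# saliency map should light up there and stay ~0 elsewhere.
f = lambda img: 3.0 * img.flat[0]
sal = saliency_map(f, np.zeros((2, 2)))
# sal is ~3.0 at pixel (0, 0) and ~0.0 everywhere else
```

In practice such maps are computed with backpropagation on the trained network; the finite-difference version above merely makes the definition concrete.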
Conclusions
- We propose an End-to-End Multitask learning solution to gaze estimation using a single CNN.
Two techniques are used to enhance the accuracy of our method:
- First, we use a regression rather than a classification approach. This comes from the fact that there is an underlying spatial correlation between gaze regions that a pure classification approach would fail to capture.
- Second, we use Multitask Learning where the network is trained to predict the subject's head pose angle as an auxiliary task along with its main task of predicting gaze.
Since the appearance of the eye varies with the head pose, training one network on both gaze and head pose estimation tasks simultaneously has proven to enhance the results.
Best Regards