Precision, Recall, F1-Score for Object Detection - Back to the ML Basics

Felix Friedmann

NVIDIA DriveOS, embedded LLM/VLM, DriveWorks

Published: November 19, 2020

There are some topics that we come across again and again. As Christoph Petzinger, a fellow (fantastic) software engineer, and I realized when we were back at camera interfacing for the n-th time, it may feel like walking in a full circle, although each time you'll be faster and better prepared.

Another such topic, as mundane as it may seem, is Precision and Recall. They may seem trivial, an oversimplification even, and yet they are the very foundation of assessing object detection performance. Let's have a closer look at them.

Why Precision and Recall?

There is a plethora of metrics for classification performance: besides Precision and Recall there are specificity, negative predictive value, fall-out, etc. Many of those depend on True Negatives, however, and in a detection task an unlimited number of True Negatives exists (every location where nothing is present and nothing is predicted), hence they are not applicable.

Definitions

FP - False Positives: "Ghosts", number of issued predictions that were incorrect.
FN - False Negatives: "Misses", number of targets that should have been predicted, but were missed.
TP - True Positives: "Hits", number of correct predictions.
TN - True Negatives: correct predictions of the absence of a class; not useful for detection tasks, as an unlimited number of TNs exists in this context.
Precision: "Prediction Reliability", share of the issued predictions that were correct, i.e. TP / (TP + FP).
Recall: "Hit Rate", share of the targets that were predicted correctly, i.e. TP / (TP + FN).
F1 Score: The harmonic mean of Precision and Recall, i.e. 2 * Precision * Recall / (Precision + Recall), hence a metric reflecting both perspectives.
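
The definitions above translate directly into a few lines of Python. This is only a minimal sketch: the function name `detection_metrics` and the convention of reporting 0.0 whenever a denominator is zero are assumptions made for illustration, and the counts in the example are hypothetical.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall, f1) computed from raw detection counts.

    Assumption: a metric whose denominator is zero is reported as 0.0.
    """
    # Precision: share of issued predictions that were correct.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # Recall: share of targets that were predicted correctly.
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    # F1: harmonic mean of Precision and Recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 6 hits, 2 ghosts, 4 misses.
precision, recall, f1 = detection_metrics(tp=6, fp=2, fn=4)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Note that TN never appears in any of the three formulas, which is why these metrics remain usable for detection.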

A closer look at some scenarios

The chart above shows Precision and Recall values for various scenarios to illustrate their respective characteristics.

Scenario A is a "Mixed Bag": Since it contains TPs, FNs and FPs, both Precision and Recall are "somewhere between" 0 and 100%, and the F1 score provides a value between them (note how it differs from the arithmetic mean) - overall a standard case.
Scenario B is an "Epic Win": There are only TPs, hence Precision and Recall are at 100%.
Scenario C is a "Catastrophic Failure": There are no TPs, hence Precision and Recall are zero.
Scenario D is a "Ghost Town": There are TPs and FPs but no FNs, hence Precision is low while Recall is high.
Scenario E is "Prediction Scarcity": There are TPs and FNs but no FPs, hence Precision is high and Recall is low.
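
The five scenarios can be reproduced numerically. The sketch below uses made-up (TP, FP, FN) counts chosen only to mimic each scenario; they are not the values behind the original chart, and the helper `precision_recall_f1` with its zero-denominator convention is likewise an assumption for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    # Assumption: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical (tp, fp, fn) counts, one entry per scenario.
scenarios = {
    "A mixed bag": (6, 2, 4),
    "B epic win": (10, 0, 0),
    "C catastrophic failure": (0, 5, 10),
    "D ghost town": (10, 30, 0),
    "E prediction scarcity": (2, 0, 8),
}

results = {name: precision_recall_f1(*counts) for name, counts in scenarios.items()}

for name, (p, r, f1) in results.items():
    # The arithmetic mean is printed only for comparison with F1.
    arithmetic_mean = (p + r) / 2
    print(f"{name:24s} P={p:.2f} R={r:.2f} F1={f1:.2f} mean={arithmetic_mean:.2f}")
```

With these counts, the "Ghost Town" case has a Precision of 0.25 and a Recall of 1.0: the arithmetic mean is 0.625, while the F1 score is only 0.40, because the harmonic mean is pulled towards the weaker of the two values.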

Scenarios D and E nicely illustrate that:

Precision is a measure of efficiency - if few predictions are wasted, Precision is high, even if that means that only a few targets are hit. It favors a targeted approach and may drive a model into frugality - better no prediction than a wrong one.
Recall is a measure of effectiveness - the more targets are hit, the better, regardless of the number of wrong predictions made. It favors a spamming approach and may drive a model into prodigality - issuing a flood of predictions in the hope that some may hit the targets.
F1 Score reflects both Precision and Recall, making it a good aggregated indicator; but it abstracts away the source of the problem; for that we have to go back to Precision and Recall...

This is just the beginning

Admittedly, there is a degree of repetition in the above, but hopefully this clarifies the meaning of the two metrics once and for all; and demonstrates the value of the F1 score.

Precision and Recall are really the absolute basics; from this point there are many avenues to explore: the Confusion Matrix, Receiver Operating Characteristic (ROC) Curves, etc.

--- Previous Articles ---

Sample Size Determination for Data Quality Checks

How much data needs to be checked to reliably estimate the quality level of a dataset?

Fixing (parts of) your Labeled Dataset

How much of a dataset needs to be fixed to reach a target quality level?

--- Incenda AI ---

Want to discuss Data Quality? At Incenda AI we obsess about it - reach out!

Thanks David Stengel and Max Ronecker for contributing.

