Precision, Recall, F1-Score for Object Detection - Back to the ML Basics
There are some topics that we come across again and again. As Christoph Petzinger, a fellow (fantastic) software engineer, and I realized when we were back at camera interfacing for the n-th time, it may feel like walking in a full circle, although every time you'll be faster and better prepared.
Another such topic, as mundane as it may seem, is Precision and Recall. They may seem trivial, even an oversimplification, and yet they are the very foundation of assessing object detection performance. Let's have a closer look at them.
Why Precision and Recall?
There is a plethora of classification performance metrics: besides Precision and Recall there are specificity, negative predictive value, fall-out, etc. Many of those, however, depend on True Negatives, of which an infinite number exists in a detection task (every location where nothing is present and nothing was predicted would count), hence they are not applicable.
Definitions
- FP - False Positives: "Ghosts", number of issued predictions that were incorrect.
- FN - False Negatives: "Misses", number of targets that should have been predicted, but were missed.
- TP - True Positives: "Hits", number of issued predictions that were correct.
- TN - True Negatives: correct predictions of the absence of a class; not useful for detection tasks since, in this context, an infinite number of TNs exists.
- Precision: "Prediction Reliability", share of the issued predictions that were correct, i.e. TP / (TP + FP).
- Recall: "Hit Rate", share of the targets that were predicted correctly, i.e. TP / (TP + FN).
- F1 Score: the harmonic mean of Precision and Recall, i.e. 2 * Precision * Recall / (Precision + Recall), hence a metric reflecting both perspectives.
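The definitions above translate directly into a few lines of Python. This is a minimal sketch: the function names are made up for illustration, and returning 0.0 when a denominator is zero is an assumption (some tools report such cases as undefined instead).

```python
def precision(tp: int, fp: int) -> float:
    """Share of issued predictions that were correct: TP / (TP + FP)."""
    issued = tp + fp
    # Assumption: report 0.0 if no predictions were issued at all.
    return tp / issued if issued else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of targets that were predicted correctly: TP / (TP + FN)."""
    targets = tp + fn
    # Assumption: report 0.0 if there were no targets at all.
    return tp / targets if targets else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of Precision and Recall, written in terms of counts."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Example with made-up counts: 8 hits, 2 ghosts, 8 misses.
print(precision(tp=8, fp=2))       # 8 of 10 predictions were correct
print(recall(tp=8, fn=8))          # 8 of 16 targets were hit
print(f1_score(tp=8, fp=2, fn=8))  # lies between the two values above
```

Note that TN appears in none of the three functions, which is exactly why these metrics remain usable for detection tasks.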
A closer look at some scenarios
The chart above shows Precision and Recall values for various scenarios to illustrate their respective characteristics.
- Scenario A is a "Mixed Bag": Since it contains TPs, FNs and FPs, both Precision and Recall are "somewhere between" 0 and 100%, and the F1 score provides a value between them (note how it differs from the arithmetic mean) - overall a standard case.
- Scenario B is an "Epic Win": There are only TPs, hence Precision and Recall are both at 100%.
- Scenario C is a "Catastrophic Failure": There are no TPs, hence Precision and Recall are both zero.
- Scenario D is a "Ghost Town": There are TPs and FPs but no FNs, hence Recall is at 100% while Precision is low.
- Scenario E is "Prediction Scarcity": There are TPs and FNs but no FPs, hence Precision is at 100% while Recall is low.
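The five scenarios can be reproduced with a short Python sketch. The TP/FP/FN counts below are hypothetical, chosen only to match the descriptions; they are not the values behind the chart.

```python
def metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, f1); 0.0 is assumed for empty denominators."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Hypothetical counts as (TP, FP, FN) for each scenario.
scenarios = {
    "A: Mixed Bag": (6, 2, 4),
    "B: Epic Win": (10, 0, 0),
    "C: Catastrophic Failure": (0, 5, 5),
    "D: Ghost Town": (10, 30, 0),
    "E: Prediction Scarcity": (2, 0, 8),
}

results = {name: metrics(*counts) for name, counts in scenarios.items()}

for name, (p, r, f1) in results.items():
    print(f"{name:26s} P={p:.2f}  R={r:.2f}  F1={f1:.2f}")
```

With these counts, Scenario A yields a Precision of 0.75 and a Recall of 0.60; the F1 score comes out slightly below their arithmetic mean of 0.675, since the harmonic mean leans towards the weaker of the two values.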
Scenarios D and E nicely illustrate that:
- Precision is a measure of efficiency - if few predictions are wasted, Precision is high, even if that means that only a few targets are hit. It favors a targeted approach and may drive a model into frugality - better no predictions than wrong ones.
- Recall is a measure of effectiveness - the more targets are hit, the better, independent of the number of wrong predictions made. It favors a spamming approach and may drive a model into prodigality - issuing a flood of predictions in the hope that some may hit the targets.
- F1 Score reflects both Precision and Recall, making it a good aggregated indicator; but it abstracts away the source of the problem; for that we have to go back to Precision and Recall...
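A small Python sketch of the last point, using two hypothetical models with made-up Precision and Recall values: one overly frugal, one overly prodigal. Their F1 scores are identical, so the F1 score alone cannot tell which problem we are looking at.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of Precision and Recall; 0.0 is assumed if both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical (precision, recall) pairs for two very different models.
frugal_model = (0.9, 0.3)    # few, reliable predictions; many misses
prodigal_model = (0.3, 0.9)  # a flood of predictions; many ghosts

f1_frugal = f1(*frugal_model)
f1_prodigal = f1(*prodigal_model)

print(f"frugal:   F1={f1_frugal:.2f}")
print(f"prodigal: F1={f1_prodigal:.2f}")
```

Both models score an F1 of roughly 0.45, which is also well below the arithmetic mean of 0.6: the harmonic mean penalizes the imbalance, but only Precision and Recall reveal its direction.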
This is just the beginning
Admittedly, there is a degree of repetition in the above, but hopefully this clarifies the meaning of the two metrics once and for all; and demonstrates the value of the F1 score.
Precision and Recall are really the absolute basics; from this point there are many avenues to explore: the Confusion Matrix, Receiver Operating Characteristic (ROC) Curves, etc.
Thanks David Stengel and Max Ronecker for contributing.