Two Issues About Object Detection Accuracy
Daniel Morton PhD
Senior Data Scientist | Builder | MS Analytics | PhD Mathematics | Machine Learning | Data Science | Deep Learning | Ad Tech
Object detection answers two questions simultaneously. What is it? And where is it? For the most part, computer vision models answer these questions independently. A typical CNN detection model predicts the likelihood that there is an object, and the dimensions of its bounding box, independently of its prediction of what the object is. Mostly independently, anyway. Both predictions use the same image and, for the most part, the outputs of the same convolutions. But they make up separate terms of the loss function, and neither the bounding box coordinates nor the object classification is an input for predicting the other. Classification and localization are correlated only to the extent that they are derived from the same features.
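To make that structure concrete, here is a rough schematic of a one-stage detector's loss. This is not YOLO11's actual formulation (which uses IoU-based and distribution-focal terms); the only point is that the localization and classification terms are computed separately and simply summed, and the weights below are arbitrary.

```python
# Schematic only: not YOLO11's real loss, just the typical structure of a
# one-stage detector loss. Box, objectness, and class terms are computed
# separately and summed; neither head's output feeds into the other's.
import torch.nn.functional as F

def detection_loss(pred_box, pred_obj, pred_cls, true_box, true_obj, true_cls,
                   w_box=7.5, w_obj=1.0, w_cls=0.5):   # arbitrary weights
    box_loss = F.l1_loss(pred_box, true_box)                            # where is it?
    obj_loss = F.binary_cross_entropy_with_logits(pred_obj, true_obj)   # is it anything?
    cls_loss = F.binary_cross_entropy_with_logits(pred_cls, true_cls)   # what is it?
    # The only coupling: all three heads share the same backbone features.
    return w_box * box_loss + w_obj * obj_loss + w_cls * cls_loss
```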
In practice this means that most of the difficulty in object detection is on the classification side. Determining bounding box dimensions is the easy part. To demonstrate, we can run the same object detection model twice: once treating each class separately, and once treating all objects as a single class.
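One way to set up the single-class run is to collapse the labels before training. The sketch below rewrites YOLO-format label files so every object gets class id 0; the directory paths are placeholders for however the dataset happens to be laid out. (Ultralytics also has a single_cls training flag that, as I understand it, does the same thing without touching the files.)

```python
# Collapse all classes to a single class by rewriting YOLO-format label files.
# Each line of a label file is: class_id x_center y_center width height
from pathlib import Path

LABEL_DIR = Path("larch_casebearer/labels")        # placeholder path
OUT_DIR = Path("larch_casebearer/labels_1class")   # placeholder path
OUT_DIR.mkdir(parents=True, exist_ok=True)

for label_file in LABEL_DIR.glob("*.txt"):
    merged = []
    for line in label_file.read_text().splitlines():
        parts = line.split()
        if parts:
            parts[0] = "0"                         # every object becomes class 0
            merged.append(" ".join(parts))
    (OUT_DIR / label_file.name).write_text("\n".join(merged) + "\n")
```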
I'll use the Larch Case-bearer dataset I've been working with for a while. As a reminder, this is a collection of drone images of Swedish larch forests, many of which are unwilling hosts to a type of case-bearer moth larva. There are four classes: healthy larch trees, lightly damaged larch trees, heavily damaged larch trees, and other tree species. Most of the trees are lightly damaged larch trees.
To emphasize how good object detection is when object class is irrelevant, I'll use the smallest of the YOLO11 models, YOLO11n. I keep the default image size of 1500x1500, which, even with this small model, requires a batch size of 8. Augmentation consists of horizontal and vertical flips plus the few default Albumentations that I can't turn off (none of which, I think, help accuracy, but the model still does well enough). Mixup, in which two images are blended together, is applied with probability 0.3. Running the relevant notebook in Kaggle took about an hour and a half. The train/val/test split is 70/15/15, stratified across the different locations in Sweden.
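For reference, the training call looks roughly like this. The image size, batch size, and mixup probability are the settings described above; the dataset yaml name and epoch count are placeholders, and everything else is left at the Ultralytics defaults.

```python
# A minimal sketch of the single-class training run with Ultralytics YOLO11.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")          # smallest YOLO11 variant
model.train(
    data="larch_1class.yaml",       # placeholder dataset config (nc: 1)
    imgsz=1500,                     # keep the 1500x1500 images
    batch=8,
    epochs=100,                     # placeholder; not specified above
    fliplr=0.5,                     # horizontal flips
    flipud=0.5,                     # vertical flips
    mixup=0.3,                      # merge pairs of images with probability 0.3
)
```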
The result on the holdout test set is an mAP@50 of 96.3 and an mAP@50-95 of 64.6, which is probably about as close to perfect as I could reasonably expect, especially with a dataset averaging 68 objects per image.
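Those numbers come from validating the trained model on the test split, along the lines of the sketch below. The weights path and yaml name are placeholders, and split="test" assumes the dataset yaml defines a test set; in recent Ultralytics versions metrics.box.map50 and metrics.box.map correspond to mAP@50 and mAP@50-95.

```python
# Evaluate the trained single-class model on the held-out test split (sketch).
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")   # placeholder weights path
metrics = model.val(data="larch_1class.yaml", split="test", imgsz=1500)
print(f"mAP@50:    {metrics.box.map50:.3f}")
print(f"mAP@50-95: {metrics.box.map:.3f}")
```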
A typical scene with ground truth boxes is below.
And these are the model predictions.
The detections look very similar. The model output may even be an improvement. The annotator(s) (about whom I know nothing) regularly missed trees on the edge of the image, along with the occasional small tree in the interior. Of course, all the trees missed by the annotator and caught by the model count against mAP. A reminder, if you need it, that model accuracy metrics are guidelines, not gospel.
A couple more images illustrate the same point. Note the tree caught by the model on the other side of the road, as well as several trees missed by the annotator at the bottom of the scene.
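For anyone following along, the prediction overlays in these images are produced more or less like this; the image and weights paths are placeholders, and save=True writes annotated copies to a runs/ directory.

```python
# Generate annotated prediction images for a scene (sketch).
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")    # placeholder weights path
results = model.predict(source="scene_example.jpg",  # placeholder image path
                        imgsz=1500, conf=0.25, save=True)
print(f"{len(results[0].boxes)} trees detected")
```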
Both of the above images show a mix of healthy and lightly damaged trees. If we include heavily damaged trees and other species, the result is the same. Notice that, once again, the model picks out some trees (not larch, something else) that the annotator missed.
If anything, the mAP@50-95 of 64.6 is probably an understatement.
Now, what happens if we train the same model, YOLO11n, on the same dataset but keep the class labels?
The dominant Low Damage class has mAP numbers only slightly lower than the one-class model's. Precision drops for the other three classes, although it mostly remains in respectable territory. The only real weak spot is the Healthy category, many of which the model mislabels as Low Damage. Since this is by far the smallest category, that is to be expected.
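The per-class numbers behind that summary can be read straight off the validation results, roughly as below. The weights path and yaml name are placeholders; metrics.box.maps holds per-class mAP@50-95 in recent Ultralytics versions, and model.names maps class ids to labels.

```python
# Per-class mAP@50-95 for the four-class run (sketch).
from ultralytics import YOLO

model = YOLO("runs/detect/train2/weights/best.pt")  # placeholder weights path
metrics = model.val(data="larch_4class.yaml", split="test", imgsz=1500)
for class_id, class_map in enumerate(metrics.box.maps):
    print(f"{model.names[class_id]:>12}: mAP@50-95 = {class_map:.3f}")
```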
As with the single-class case, it's possible that the metrics aren't telling the whole story. Compare the "ground truth" to the predicted output here. Blue is healthy and cyan is low damage. (Not my choice; those are the YOLO defaults.)
I'm no expert on larch trees or tree diseases, but it is obvious that, as the larva numbers increase, more and more needles go brown. Some of the trees labeled low damage, especially those at the top of the image, look perfectly healthy to me. They look healthy to the model as well. This could be another case of the model improving on the "ground truth". Even in Sweden I expect this sort of annotation work is underfunded; the ground truth could be an overworked grad student's best guess. It seems possible that the mAP score undersells the model's performance.