Two Issues About Object Detection Accuracy

Object detection answers two questions simultaneously: What is it? And where is it? For the most part, computer vision models answer these questions independently. A typical CNN detection model predicts the likelihood that there is an object, and the dimensions of its bounding box, independently of the prediction of what the object is. Mostly independently. Both predictions use the same image and, for the most part, the outputs of the same convolutions. But they make up separate parts of the loss function, and neither the bounding box coordinates nor the object classification is an input for predicting the other. The only sense in which classification and localization are correlated is that they are derived from the same input.
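
To make the structure concrete, here is a toy sketch of such a loss in PyTorch. It is not YOLO's actual loss (which uses IoU-based box terms and a distribution focal loss), and the weights here are arbitrary, but the shape is the same: two terms that share upstream features yet never feed into each other.

```python
import torch
import torch.nn.functional as F

def detection_loss(pred_cls, pred_box, true_cls, true_box,
                   cls_weight=1.0, box_weight=1.0):
    """Toy detection loss: classification and localization are computed
    independently and only meet in the weighted sum."""
    # The classification term never sees the predicted box coordinates.
    cls_loss = F.binary_cross_entropy_with_logits(pred_cls, true_cls)
    # The localization term never sees the predicted class scores.
    box_loss = F.smooth_l1_loss(pred_box, true_box)
    return cls_weight * cls_loss + box_weight * box_loss
```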

In practice this means that most of the difficulty in object detection models is on the classification side. Determining bounding box dimensions is the easy part. To demonstrate, we can run the same object detection model twice: once treating all classes separately and once treating all objects as the same class.
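
One way to set up the class-agnostic run is to rewrite the YOLO label files so every object gets the same class id. A minimal sketch, with hypothetical directory names:

```python
from pathlib import Path

# Hypothetical paths: make a one-class copy of a YOLO-format label tree
# by rewriting every class id to 0, so all objects become the same class.
src, dst = Path("labels_multiclass"), Path("labels_oneclass")
for src_file in src.rglob("*.txt"):
    out = dst / src_file.relative_to(src)
    out.parent.mkdir(parents=True, exist_ok=True)
    lines = []
    for line in src_file.read_text().splitlines():
        parts = line.split()            # class_id x_center y_center w h
        if parts:
            parts[0] = "0"
            lines.append(" ".join(parts))
    out.write_text("\n".join(lines) + "\n")
```

Ultralytics also exposes a single_cls training argument that does the same collapse at train time, which is what the training call further down relies on.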

I'll use the Larch Case-bearer dataset I've been working with for a while. As a reminder, this is a collection of drone images of Swedish larch forests, many of which are unwilling hosts to a type of case-bearer moth larva. There are four classes: healthy larch trees, lightly damaged larch trees, heavily damaged larch trees, and other tree species. Most of the trees are lightly damaged larch trees.

To emphasize how good object detection is when object class is irrelevant, I'll use the smallest of the YOLO11 models, YOLO11n. I keep the default image size of 1500x1500, which, even with this small model, requires a batch size of 8. Augmentation consists of horizontal and vertical flips and the few default Albumentations that I can't turn off. (None of which, I think, helps accuracy, but the model still does well enough.) Mixup, where two images are merged together, is set with probability 0.3. Running the relevant notebook in Kaggle took about an hour and a half. Train/Val/Test is 70/15/15, stratified across the different locations in Sweden.
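
For reference, a sketch of the training call with the settings described above. The dataset YAML name and epoch count are placeholders, and the residual Albumentations defaults are applied internally by Ultralytics rather than set here.

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")              # smallest YOLO11 variant
model.train(
    data="larch_casebearer.yaml",       # placeholder: dataset config with the 70/15/15 splits
    imgsz=1500,                         # keep the 1500x1500 images at full size
    batch=8,                            # what fits at this resolution
    flipud=0.5,                         # vertical flips
    fliplr=0.5,                         # horizontal flips
    mixup=0.3,                          # merge two images with probability 0.3
    single_cls=True,                    # the class-agnostic run; False keeps the four classes
    epochs=100,                         # placeholder epoch count
)
```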

The result on the holdout test set is an mAP@50 of 96.3 and an mAP@50-95 of 64.6, which is probably about as close to perfect as I could reasonably expect, especially for a dataset averaging 68 objects per image.
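
A sketch of how those test-split numbers can be read back, assuming the dataset YAML defines a test split and the weights sit in the default Ultralytics run directory:

```python
from ultralytics import YOLO

# Load the trained weights (path assumes the default Ultralytics save location).
model = YOLO("runs/detect/train/weights/best.pt")

# Evaluate on the held-out test split and read the two summary metrics.
metrics = model.val(data="larch_casebearer.yaml", split="test")
print(f"mAP@50:    {metrics.box.map50:.3f}")
print(f"mAP@50-95: {metrics.box.map:.3f}")
```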

A typical scene with ground truth boxes is below.


Ground Truth

And these are the model predictions.


Prediction

The detections look very similar. The model output may even be an improvement. The annotator(s) (about whom I know nothing) regularly missed trees on the edge of the image and the occasional small tree in the interior. Of course, all the trees missed by the annotator and caught by the model count against mAP. A reminder, if you need it, that model accuracy metrics are guidelines, not gospel.

A couple more images illustrate the same point. Note the tree caught by the model on the other side of the road, as well as several trees missed by the annotator at the bottom of the scene.


Ground Truth


Model Prediction

Both of the above images show a mix of healthy and lightly damaged trees. If we include heavily damaged trees and other species, the result is the same. Notice that, once again, the model picks out some trees (not larch, something else) that the annotator missed.


Ground Truth


Prediction

If anything, the mAP@50-95 of 64.6 is probably an understatement.

Now, what happens if we train the same model, YOLO11n, on the same dataset but keep the class labels?


The dominant Low Damage class has mAP numbers only slightly lower than the one-class model's. Precision drops for the other three classes, although it mostly remains in respectable territory. The only real weak spot is the Healthy category, many of which are inaccurately labeled as Low Damage. Since this is, by far, the smallest category, that is to be expected.
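
The per-class numbers can be read off the same kind of results object. A sketch, with a hypothetical weights path and the class names taken from the model rather than hard-coded:

```python
from ultralytics import YOLO

# Trained multi-class weights (hypothetical path).
model = YOLO("runs/detect/train_multiclass/weights/best.pt")
metrics = model.val(data="larch_casebearer.yaml", split="test")

# maps holds per-class mAP@50-95, indexed by class id.
for class_id, name in model.names.items():
    print(f"{name:15s} mAP@50-95: {metrics.box.maps[class_id]:.3f}")
```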

As in the single-class case, it may be that the metrics aren't telling the whole story. Compare the "ground truth" to the predicted output here. Blue is healthy and cyan is low damage. (Not my choice, YOLO defaults.)


Ground Truth


Predicted

I'm no expert on larch trees or tree diseases, but it is obvious that, as the larva numbers increase, more and more needles go brown. Some of the trees labeled low damage, especially those at the top of the image, look perfectly healthy to me. They look healthy to the model as well. This could be another case of the model improving on the "ground truth." Even in Sweden, I expect this sort of annotation work is underfunded; the ground truth could be an overworked grad student's best guess. It seems possible that the mAP score undersells the model's performance.
