Into the Forest I Go (Again) - Part 1
Daniel Morton PhD
Senior Data Scientist | Builder | MS Analytics | PhD Mathematics | Machine Learning | Data Science | Deep Learning | Ad Tech
This is an update of something I worked on a few years back. At the time Colab's pricing was still reasonable and EfficientDet object detection models were still hot. This time around I used the (almost) current YOLOv10 model (YOLO11 just came out, but it looks like it should have been called YOLO10.1) and ClaudeAI to format the input files appropriately. Since Colab is now about as reliable as one of the late Tom Magliozzi's cars, I've switched to Kaggle Notebooks.
The problem is simple enough. Sweden has extensive larch forests in the province of Västergötland. The larch trees themselves were introduced from the Alps nearly 300 years ago, but have now started to fall victim to an invasive pest of their own, a small moth whose larval stage likes to bore into their needles. After a few years of this indelicate treatment, the tree's growth is stunted and it becomes vulnerable to other, even less pleasant, infections.
The above should be enough to convince you that detecting infection is of more than academic interest. Like most insect pests, the larva is invisible but its effects are not. The data consists of aerial, presumably drone, photos of several sections of forest. Trees are annotated as healthy (H), low damage (LD), high damage (HD), or other (O) if it's a different species. From a human perspective the problem reduces to detecting changes in color. Healthy trees are a light green. Trees with low damage have some brown but usually still have green in the central crown. Heavily damaged trees are almost all brown. Other species are usually a deeper green. Each image contains a lot of trees, about 65 on average and often over a hundred. The images are good sized too, at 1500x1500 pixels. About 60% of the trees are low damage larch, 16% are badly damaged larch, 20% something else, and just under 5% are healthy larch trees.
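Those class percentages can be tallied straight from the annotation files. Here's a minimal sketch, assuming PASCAL VOC-style XML annotations and the class names above (H, LD, HD, O); the folder path is hypothetical.

```python
# Minimal sketch: tally class frequencies across PASCAL VOC-style XML files.
# The folder path is hypothetical; class names follow the annotations (H, LD, HD, O).
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET

def count_classes(annotation_root: str) -> Counter:
    counts = Counter()
    for xml_path in Path(annotation_root).rglob("*.xml"):
        for obj in ET.parse(xml_path).getroot().iter("object"):
            counts[obj.findtext("name")] += 1
    return counts

counts = count_classes("larch_data/annotations")  # hypothetical path
total = sum(counts.values())
for name, n in counts.most_common():
    print(f"{name}: {n} ({100 * n / total:.1f}%)")
```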
A fuller description of the problem is in the original writeup.
Along with the images there are XML files that provide the annotations in a reasonably good attempt at PASCAL VOC format. VOC is a common enough format, but YOLO can't consume it directly; it wants its own plain-text label files. Given the way the data was structured, one folder for each location, services like Roboflow for file conversion weren't entirely practical. Nor were they necessary when I could just tell Claude to convert VOC to COCO and then COCO to YOLO, and have it do the train/val/test split at the same time. For those interested, Claude's work can be found here.
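Claude's pipeline went VOC to COCO to YOLO; the sketch below skips the COCO step and goes straight from VOC XML to YOLO label files, just to show what the conversion boils down to. This is not the actual script, and the class order and paths are assumptions.

```python
# Sketch of a direct VOC -> YOLO label conversion (not the actual script,
# which went VOC -> COCO -> YOLO). Class order and paths are assumptions.
from pathlib import Path
import xml.etree.ElementTree as ET

CLASSES = ["H", "LD", "HD", "O"]  # assumed class order

def voc_to_yolo(xml_path: Path, out_dir: Path) -> None:
    root = ET.parse(xml_path).getroot()
    w = float(root.findtext("size/width"))
    h = float(root.findtext("size/height"))
    lines = []
    for obj in root.iter("object"):
        cls = CLASSES.index(obj.findtext("name"))
        box = obj.find("bndbox")
        xmin, ymin = float(box.findtext("xmin")), float(box.findtext("ymin"))
        xmax, ymax = float(box.findtext("xmax")), float(box.findtext("ymax"))
        # YOLO expects: class x_center y_center width height, normalized to [0, 1]
        lines.append(
            f"{cls} {(xmin + xmax) / 2 / w:.6f} {(ymin + ymax) / 2 / h:.6f} "
            f"{(xmax - xmin) / w:.6f} {(ymax - ymin) / h:.6f}"
        )
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / xml_path.with_suffix(".txt").name).write_text("\n".join(lines))
```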
With those preliminaries out of the way (and isn't it nice that Claude can do the parsing grunt work now) I could start training the model. YOLOv10 comes in six sizes, although once you get past 10b (for balanced) the performance gains may not be worth the extra time and memory. I've focused on the smallest model, 10n (for nano), since it's fast enough for quick experiments. When I did go from 10n to 10b the improvement was modest considering the 10-fold increase in parameters (+2 mAP@50 and +2.3 mAP@50-95 after 100 epochs with identical training parameters).
What did I use as training parameters? I stuck with the default learning rate and learning rate scheduler; all else being equal, I assumed Ultralytics knew what they were doing. There are some default Albumentations they don't let me turn off (Blur and ToGray) that are probably not helpful for this problem but don't seem to do any damage either. I didn't mess with the default image size, although since 640x640 is close to a quarter of a 1500x1500 image it might be worth trying to cut the originals into four. I left the mosaic and, somewhat reluctantly, erasing settings at their defaults. I've never believed in color augmentations, with the possible exception of brightness, especially when the problem really reduces to detecting color changes, so all the hue, saturation, and brightness randomizers get set to 0. Flips stay, since there's nothing special about the image orientation, as do scaling and the default translations. I eventually added mixup (i.e. averaging two images together) with probability 0.3, based on a paper I found, which did improve overall mAP. The resulting training call looks roughly like the sketch below.
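This is a sketch rather than the exact command: the dataset YAML name is hypothetical, and anything not listed is left at the Ultralytics defaults, as described above.

```python
# Sketch of the training setup described above (Ultralytics API).
# The dataset YAML is hypothetical; unspecified arguments keep their defaults.
from ultralytics import YOLO

model = YOLO("yolov10n.pt")  # nano variant; swap in yolov10b.pt for the balanced model
model.train(
    data="larch.yaml",  # hypothetical dataset config from the train/val/test split
    epochs=100,
    imgsz=640,          # default image size, left unchanged
    hsv_h=0.0,          # hue, saturation, and brightness randomizers off
    hsv_s=0.0,
    hsv_v=0.0,
    mixup=0.3,          # average pairs of images with probability 0.3
    # mosaic, erasing, flips, scale, translate, and the lr schedule stay at defaults
)
```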
That's enough ramble. What about results? First off, I should say something about the bounding boxes: accuracy for the boxes themselves is fantastic. Running a model with just a single class (i.e. just a tree detector) got me a mAP@50 of 94.5 and mAP@50-95 of 62.6. Almost all the error is in classification.
Most of the metrics track instance frequency. Given their unfortunate rarity, healthy trees have the lowest mAP. The majority low damage class performs about as well as the generic tree detector, and high damage also performs well.
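If you want to pull those per-class numbers out yourself, a validation run exposes them. A minimal sketch, assuming a finished training run; the checkpoint and dataset paths are hypothetical and the attribute names are those in recent Ultralytics releases.

```python
# Sketch: per-class metrics from an Ultralytics validation run.
# Paths are hypothetical; attributes follow recent Ultralytics releases.
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # hypothetical run directory
metrics = model.val(data="larch.yaml", split="test")

print(f"overall mAP@50:    {metrics.box.map50:.3f}")
print(f"overall mAP@50-95: {metrics.box.map:.3f}")
# metrics.box.maps holds per-class mAP@50-95, indexed by class id
for class_id, name in metrics.names.items():
    print(f"{name}: mAP@50-95 = {metrics.box.maps[class_id]:.3f}")
```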
To improve the accuracy of the less frequent classes I tried adjusting the focal loss weight, but the default value of 1.5 worked best; Ultralytics does seem to know what they're doing. Adding mixup did improve accuracy; mAP for healthy trees had been 64.2 without it.
I am actually happy with this result. Despite being just over a third the size of EfficientDet-D1, YOLOv10n slightly outperformed the D1 models I built three years ago. And it took a lot less work.
Come back later for part 2, where I provide the results of YOLOv10b and a deeper dive into what I think the models are, and are not, doing well.