Using and Testing an Animal Classifier

In my previous post, I explored the process of using animal classifiers, focusing on pipelines that detect animals, crop their images, and then classify them. Today, I’ll share results from experimenting with a European animal classifier model, and some insights on its performance.


A Quick Foreword

"The model is only as good as your data."

Models perform best when the input data closely resembles what they were trained on. This means your results may differ from mine, depending on your dataset.


The Test Setup

For this demo, I used EcoAssist, an open-source tool that I’ve highlighted in previous posts. It’s one of the best options available, offering nine animal classification models covering different regions.

Here’s the setup I used:

  • Detection Model: MegaDetector 5a
  • Classification Model: European – DeepFaune v1.2
  • Confidence Thresholds: Default settings, including the 0.35 animal-detection threshold
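To make the pipeline concrete, here is a minimal sketch of the detect-crop-classify flow that EcoAssist automates behind its interface. The detector and classifier objects and their detect/predict methods are hypothetical stand-ins, not the actual MegaDetector or DeepFaune APIs; only the 0.35 detection threshold comes from the settings above.

  from PIL import Image

  DETECTION_THRESHOLD = 0.35  # default detection threshold used in this test

  def classify_image(image_path, detector, classifier):
      """Detect animals, crop each detection, and classify the crops."""
      image = Image.open(image_path)
      labels = []
      for det in detector.detect(image):  # hypothetical detector API
          if det.confidence < DETECTION_THRESHOLD:
              continue  # weak detections are ignored
          x, y, w, h = det.bbox  # normalized [0, 1] box, MegaDetector-style
          crop = image.crop((
              int(x * image.width),
              int(y * image.height),
              int((x + w) * image.width),
              int((y + h) * image.height),
          ))
          labels.append(classifier.predict(crop))  # hypothetical classifier API
      return labels  # an empty list means the image is labeled "empty"

Each animal gets its own crop and label, which is why a single image can yield several classifications.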



Dataset Overview

I tested the classifier on 430 images from Denmark, a challenging dataset with many dark images and limited animal diversity. The dataset included:

  • Foxes
  • Mustelids
  • Raccoon dogs (not among the classifier’s classes)
  • Birds
  • Roe deer


Results

  1. Empty Images Filtered by MegaDetector: Out of 430 images, 124 were labeled as empty (animal detection confidence below the 0.35 threshold); a code sketch of this step appears after the list.


  2. Classification Results for 323 Animal Images: The distribution of the detected animals is shown in the chart below.
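As a side note on step 1: if you run MegaDetector directly rather than through EcoAssist, the empty-image filter amounts to checking each image’s detections against the detection threshold. Here is a minimal sketch, assuming the standard MegaDetector batch-output JSON format (a top-level "images" list with per-image "detections"); the file name is a placeholder.

  import json

  THRESHOLD = 0.35
  ANIMAL = "1"  # MegaDetector category convention: 1=animal, 2=person, 3=vehicle

  with open("megadetector_output.json") as f:  # placeholder file name
      results = json.load(f)

  empty, with_animals = [], []
  for image in results["images"]:
      has_animal = any(
          d["category"] == ANIMAL and d["conf"] >= THRESHOLD
          for d in image.get("detections", [])
      )
      (with_animals if has_animal else empty).append(image["file"])

  print(f"{len(empty)} empty, {len(with_animals)} with animals")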


Some example output:

Summary and Reflections

Nearly all foxes, mustelids, and birds in the dataset were correctly labeled, with only a few misclassifications. Notably, the classifier also reported badgers, cats, cows, and raccoons, none of which were actually present in the dataset. This points to an important pattern: many misclassifications were predictions of species the dataset doesn’t contain. For example, several raccoon dogs were labeled as "badgers," likely because the two species share similar shapes and sizes and raccoon dogs aren’t among the classifier’s classes.

On this somewhat challenging dataset, the classifier achieved approximately 70% accuracy. When working with video data, however, I noticed a more dynamic pattern: classification results varied from frame to frame. While this could be seen as a limitation, it also opens up an opportunity. By implementing logic to analyze dominant classifications across multiple frames (e.g., majority voting), it’s possible to improve the overall accuracy. This approach could be especially useful in scenarios where multiple sequential frames are available, such as footage from wildlife trail cameras.
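To sketch that idea, the snippet below applies a simple majority vote to per-frame labels and falls back to "uncertain" when no species dominates. The function and the agreement cutoff are my own illustration, not something EcoAssist or DeepFaune provides.

  from collections import Counter

  def video_label(frame_labels, min_agreement=0.5):
      """Majority vote over per-frame classifier labels for one clip."""
      if not frame_labels:
          return "empty"
      label, count = Counter(frame_labels).most_common(1)[0]
      # Commit only if the winning label covers enough of the frames;
      # otherwise flag the clip for manual review.
      return label if count / len(frame_labels) >= min_agreement else "uncertain"

  # Example: a fox clip where two frames were misread
  frames = ["fox", "fox", "badger", "fox", "mustelid", "fox"]
  print(video_label(frames))  # -> "fox" (4 of 6 frames agree)

Raising min_agreement trades coverage for precision: more clips get flagged for review, but the labels you keep are more trustworthy.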

This experience underscores the importance of context when interpreting results:

  1. Existing classes performed well: For known categories like foxes and mustelids, the classifier achieved high accuracy, even under suboptimal lighting.
  2. Unknown classes posed challenges: For animals like raccoon dogs that weren’t part of the classifier’s training data, misclassification rates were naturally higher.
  3. Dataset similarity matters: Since the training data for the model may differ significantly from mine, results could improve or worsen depending on how similar the test images are to the model’s training dataset.

Despite these challenges, the classifier delivered results quickly and autonomously, without human intervention at any stage (except me clicking some buttons). This rapid, automated approach could be invaluable for projects where immediate or large-scale classification is needed, even if perfection isn’t guaranteed. For example, the results might be sufficient for gaining a general understanding of animal distributions or for flagging specific images for further review.

Looking forward, there are opportunities to further refine such tools. For instance, incorporating feedback loops or additional layers of logic (e.g., to reconcile frame-level inconsistencies in video data) could significantly enhance performance. Additionally, retraining or fine-tuning the model with localized datasets could help bridge the gap between classifier expectations and real-world conditions.

In conclusion, while the results may not be perfect for all applications, the current level of performance shows the potential of these tools to support wildlife monitoring at scale. The true value lies in how we interpret and adapt these results to fit specific project needs.
