How to Improve Small Object Detection Accuracy Without Increasing Latency

Small object detection is the task of identifying and locating objects that occupy only a small portion of an image or video frame. While it is challenging, there are several ways to improve small object detection accuracy without increasing latency – from choosing more suitable evaluation metrics to tailoring your architecture to the task.

Why is small object detection challenging?

Objects that are small relative to the overall image size typically occupy only a handful of pixels, which makes them difficult to detect with modern convolutional neural networks or even transformer models, since these architectures are not designed specifically for this kind of task.

While small object detection is now being used in a number of applications, including crop health monitoring, wildlife monitoring, damage assessment, and infrastructure mapping, a few challenges need to be addressed:

  • Limited detail
  • Sensitivity of the IoU metric
  • Suboptimality of the COCO mAP metric
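To see why the IoU metric is so sensitive for small objects, consider the same localization error applied to a large and a small box. The sketch below (plain Python, no library dependencies) shows that a 4-pixel shift barely moves the IoU of a 100×100 box but pushes a 10×10 box below the common 0.5 matching threshold:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A 4-pixel shift on a 100x100 box: IoU stays high.
print(round(iou((0, 0, 100, 100), (4, 0, 104, 100)), 3))  # 0.923

# The same 4-pixel shift on a 10x10 box: IoU falls below 0.5,
# so the detection would be counted as a miss at IoU@0.5.
print(round(iou((0, 0, 10, 10), (4, 0, 14, 10)), 3))  # 0.429
```

In other words, with a strict IoU threshold the model is penalized for localization noise that is a fraction of a pixel at the feature-map scale.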

Ways to improve small object detection accuracy without increasing latency

Use higher resolution images

While it seems like a straightforward solution, this is often impractical due to resource limitations, increased memory and computational requirements, and potential latency issues. The challenges with feature extraction and scale mismatch may also persist.
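The cost of raising resolution is easy to quantify: for a fully-convolutional detector, FLOPs and activation memory grow roughly linearly with the number of input pixels, i.e. quadratically with the side length. A back-of-the-envelope sketch (assuming a fully-convolutional model, which most YOLO-family detectors are):

```python
# Rough compute/memory scaling of a fully-convolutional detector
# as a function of input resolution, relative to a 640x640 baseline.
base_side = 640
for side in (640, 960, 1280):
    factor = (side / base_side) ** 2
    print(f"{side}x{side}: ~{factor:.2f}x the FLOPs/activation memory of 640x640")
```

Doubling the input side to 1280 therefore costs roughly 4x the compute, which is why this route usually does increase latency.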

Use optimal evaluation metrics

Here are alternatives to mAP@0.5 that are more appropriate for small object detection and won’t punish the model for a single-pixel misalignment:

  • Distance-based metrics (DetectionMetricsDistanceBased)
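The idea behind distance-based metrics is to match predictions to ground truths by the distance between box centers rather than by IoU overlap. The following is a minimal independent sketch of that matching rule, not the actual `DetectionMetricsDistanceBased` implementation (the function name and the greedy matching strategy are illustrative assumptions):

```python
import math

def match_by_center_distance(preds, gts, max_dist=5.0):
    """Greedily match predicted boxes to ground-truth boxes (x1, y1, x2, y2)
    by Euclidean distance between box centers; each ground truth is used once.
    Returns the number of true positives under the pixel-distance threshold."""
    def center(b):
        return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)

    unmatched = list(gts)
    tp = 0
    for p in preds:
        px, py = center(p)
        best, best_d = None, max_dist
        for g in unmatched:
            gx, gy = center(g)
            d = math.hypot(px - gx, py - gy)
            if d <= best_d:
                best, best_d = g, d
        if best is not None:
            unmatched.remove(best)
            tp += 1
    return tp

# A 4-pixel shift that fails an IoU@0.5 check on a 10x10 box
# still counts as a hit under a 5-pixel center-distance threshold.
print(match_by_center_distance([(4, 0, 14, 10)], [(0, 0, 10, 10)]))  # 1
```

Because the criterion depends only on center distance, a small box and a large box are judged by the same localization tolerance, which is exactly the property IoU lacks for small objects.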

Tailor the architecture to small object detection

But the real boost in accuracy comes from tailoring your architecture to small object detection. Several modifications are particularly promising:

  • Optimizing the receptive field for small objects
  • Using less coarse feature maps (though this increases latency)
  • Tailored detection heads

While adopting this approach can result in significant improvement in accuracy without added computational cost, it does require both a high level of expertise and, if done manually, a lot of trial and error to find an optimal combination of model components.
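To make the receptive-field point concrete, here is the standard recurrence for the receptive field of a stack of convolutions, applied to a toy stride-32 stem (the layer configuration is illustrative, not YOLO-NAS-Sat's actual backbone):

```python
def receptive_field(layers):
    """Receptive field of a conv stack, input to output.
    layers: list of (kernel_size, stride) tuples.
    Uses the recurrence rf += (k - 1) * jump; jump *= stride."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# Five stride-2 3x3 convs (a typical stride-32 stem): the receptive field
# already spans 63 pixels, far more than a 10-pixel object, so most of
# what each output cell "sees" is background context.
print(receptive_field([(3, 2)] * 5))  # 63
```

Optimizing the receptive field for small objects means trimming this kind of over-reach so that more of the network's capacity is spent on the object itself rather than on surrounding context.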

YOLO-NAS-Sat in action

Deci’s AutoNAC and YOLO-NAS-Sat, a new frontier for small object detection

The AutoNAC engine includes a set of algorithms that can predict the accuracy of a neural network model without having to train it, enabling a very fast and powerful search strategy.

In a nutshell, it takes as input the task, the performance targets, the data characteristics, and the target hardware. The engine then runs its algorithmic search and produces a new architecture that fits those constraints while delivering the requested accuracy.

One of the models AutoNAC generated is YOLO-NAS, which is renowned for its robust performance in standard object detection tasks. YOLO-NAS-Sat, a new model built specifically for small object detection, is based on it. The macro-level architecture remains consistent with YOLO-NAS, but a few strategic modifications were made to better address small object detection challenges:

  • Backbone modifications. The number of layers in the backbone has been adjusted to optimize the processing of small objects, enhancing the model’s ability to discern minute details.

  • Revamped neck design. A newly designed neck, inspired by a U-Net-style decoder, focuses on retaining more low-level detail. This adaptation is crucial for preserving the fine feature maps that are vital for detecting small objects.

  • Context module adjustment. The original “context” module in YOLO-NAS, intended to capture global context, has been replaced. We discovered that for tasks like processing large satellite images, a local receptive window is more beneficial, improving both accuracy and network latency.

These architectural innovations ensure that YOLO-NAS-Sat is uniquely equipped to handle the intricacies of small object detection, offering an unparalleled accuracy-speed trade-off.

YOLO-NAS-Sat sets itself apart by delivering an exceptional accuracy-latency trade-off, outperforming established models like YOLOv8 in small object detection. For instance, when evaluated on the DOTA 2.0 dataset on an NVIDIA Jetson AGX Orin with FP16 precision, YOLO-NAS-Sat L achieves 2.02x lower latency and a 6.99-point higher mAP than YOLOv8.

To learn more about YOLO-NAS-Sat, read the blog.

If you’re curious about how you can leverage AutoNAC for your computer vision and generative AI projects, talk with our experts.


Get ahead with the latest AI content

  • Databricks releases DBRX, a family of open-source LLMs. The data lakehouse provider claims that DBRX surpasses OpenAI's GPT-3.5 as well as open models like Mixtral, Llama 2, and Grok-1 in typical benchmark assessments. DBRX is available for free download on GitHub and Hugging Face for both research and commercial purposes.

  • GitHub debuts AI-powered autofix tool . Merging the real-time capabilities of GitHub’s Copilot with CodeQL, it detects and fixes security vulnerabilities during the coding process.

  • NVIDIA shares a couple of robotics announcements at GTC. First, Project GR00T, a foundation model for humanoid robots. Second, a pledge to become the inaugural platinum member of the Open Source Robotics Alliance, demonstrating its commitment to and support of open source robotics.

  • Apple details its efforts in training MLLMs. The paper highlights the importance of various architectural elements and data selection. It finds that a well-rounded combination of image-caption, interleaved image-text, and text-only data is crucial for attaining SOTA outcomes in large multimodal pre-training.


Save the date

[Live Webinar] How to Evaluate LLMs: Benchmarks, Vibe Checks, Judges, and Beyond | March 14

Learn how to use the Qualcomm Snapdragon NPE to implement advanced INT8 quantization techniques, enabling faster model performance without losing accuracy. Save your spot!


Quick Deci updates

We released our Generative AI Platform, featuring a new series of LLMs, an inference engine, and an AI inference cluster management solution. Deci-Nano, the first model in the series to be released, is an LLM that offers an exceptional balance of quality, speed, and affordability.


Enjoyed these deep learning tips? Help us make our newsletter bigger and better by sharing it with your colleagues and friends!
