Choosing the best Detection Network for your production

Today, I would like to discuss which detection network is the best fit for you.

And it's not about specific names; it's about the approach to choosing the network that is best for production. Is it really embarrassing to use MobileNet SSD today? Which of the dozens of YOLOs should you choose?

I have never tried writing articles on LinkedIn, so let's test it. You can also check this out on YouTube.

I have a lot of experience with startups, and it seems that most young teams look at neural networks as follows:

A few samples of comparisons

They open a graph and pick the network with the best values. And, of course, this never works once the prototype hits production.

Talking about which network to choose for production should start with a simple question: who will be affected by it? Here's a short list of actors. Sure, startups may not have some of these roles, but it's important to realize that a network is not just about accuracy and speed. It also brings support complexity and legal risks:

Accuracy

But let's start with accuracy.

Can I look at the official accuracy numbers to compare networks and choose one?

  • Yes, if you work with the COCO dataset or COCO classes.
  • No, if you have your own dataset. Expect training to give different results on different datasets.

Network A runs in 100 ms and network B in 200 ms on a T4. Will the 2x speed boost remain the same on other GPU boards?

  • Usually not, especially on edge devices. Inference speed is a complex subject.
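The only reliable way to settle this is to measure on your own target hardware. Here is a minimal timing sketch in Python; the `benchmark` helper and its warmup/iteration counts are my own assumptions, not from any specific framework. Swap the dummy workload for your actual inference call:

```python
import statistics
import time

def benchmark(fn, warmup=10, iters=100):
    """Time fn() after warmup runs; return latency stats in milliseconds."""
    for _ in range(warmup):  # warmup: let caches, clock scaling, JIT settle
        fn()
    times_ms = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - t0) * 1000.0)
    return {
        "median_ms": statistics.median(times_ms),
        "p95_ms": sorted(times_ms)[int(0.95 * len(times_ms))],
    }

# Dummy workload; replace with model inference on a real input tensor.
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
```

Report the median and p95 rather than a single run; on edge devices, also watch for thermal throttling over longer runs.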

I have 1-3 months. I need to improve accuracy/speed performance. What do you think I should do?

Improving accuracy = improving speed (a faster pipeline lets you choose a slower network with better accuracy). Start with speed:

  • Quantization
  • Export
  • Batching
  • NMS on GPU
  • Change the image size
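To make the "NMS on GPU" point concrete: NMS itself is a simple greedy loop, sketched below in plain Python (the `(x1, y1, x2, y2)` box format and the 0.5 IoU threshold are assumptions). Keeping this loop on the GPU as part of the exported graph avoids a costly device-to-host copy of raw predictions:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS: keep the highest-scoring box, drop heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thr]
    return keep
```

In practice you would not run this Python loop: TensorRT plugins and the ONNX NonMaxSuppression operator let the same post-processing stay on the device.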

Then look at the dataset and try to improve the problematic cases:

  • Clean the dataset
  • Fix corner cases with labeling / other algorithms
  • Augmentation
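As a tiny augmentation example: for detection, geometric transforms must be applied to the boxes as well as the image, or the labels silently go stale. A horizontal-flip sketch in plain Python (the pixel-coordinate `(x1, y1, x2, y2)` box format is an assumption):

```python
def hflip_boxes(boxes, img_w):
    """Mirror boxes for a horizontally flipped image of width img_w.

    Each x maps to img_w - x; x1 and x2 are swapped so x1 < x2 still holds.
    """
    return [(img_w - x2, y1, img_w - x1, y2) for (x1, y1, x2, y2) in boxes]
```

In real pipelines, libraries such as Albumentations handle this bookkeeping for you alongside the image transform.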


So: for networks with similar speeds and similar release years, the accuracy difference is small.

In my humble opinion, you only need to choose your case:

  • Super fast networks
  • Regular networks
  • Super accurate networks

And within each group, the accuracy will be about the same. Here is a rough description of the groups:

Super fast - Regular - Accurate

Engineering

There is usually a trade-off: the higher the quality of the code and the more features it has, the more difficult it is to modify the network. The picture in my head looks like this:

But this picture is based on gut feeling. For example, PaddlePaddle may be a good framework, but I have never met any experts in it in Europe or America. In my opinion, here is a good example of this problem. Compare these two and choose the code that is easier to understand:

YoloX
Yolov8


An essential criterion for choosing a network is how well it exports to different inference frameworks: the more complex the network, the more complex the export. Exporting to Intel and Nvidia targets is usually the easiest. With a RockChip or a microcontroller, things are not so simple. Want to know more? Check my article.

Here is a super simple comparison:

Let's get to more global issues: licenses. Is your project open source? Can you share the training part? Do you use existing code for inference, or do you write your own?

Discuss this with your product owner:

It's also important to discuss the data question with your product owner. Do you have a lot of it? Do you detect standard objects, or do you have unique frames? Do you have many GPUs? Do you need fast retraining?

Architecture

Is the list of questions above complete?

Of course not. There are many smaller ones, both about support and about training. I've listed a few sample questions here. For example:

  • The TF Object Detection API is already dead.
  • YoloX and DAMO-YOLO have no active support.
  • For PaddlePaddle, it is hard to find specialists.
  • Yolov5/Yolov8: training will almost always go fine.
  • Yolov4/Yolov7: higher accuracy can be achieved, but it may not work out of the box.

Survey

In my channel, I ran a little survey about what people use in production.

I normalized the answers by grouping them into the interesting classes.

I wonder if all companies understand what GPL-3 means...
It's okay to use MobileNetSSD!
Transformers are usually used for server-based solutions where accuracy matters most.
I think this situation will change shortly.


Do you think this format is okay? Don't be shy in the comments.
