Choosing the best Detection Network for your production
Today, I would like to discuss which detection network is the best for you.
This is less about specific names and more about how to choose what works best in production. Let's talk about how embarrassing it is to use MobileNet SSD today, and which of the dozens of YOLOs to choose.
I have never tried writing articles on LinkedIn, so let's test it. You can also watch this on YouTube.
I have a lot of experience with startups, and it seems that most young teams look at neural networks as follows:
They open a graph and pick the network with the better values. And, of course, it never works when the prototype hits prod.
Any discussion of which network to choose for production should start with a simple question: who will be affected by it? Here's a small list of actors. Sure, startups may not have some of these roles. But it's important to realize that a network is not just about accuracy and speed. It is also about the complexity of support and legal risks:
Accuracy
But let's start with accuracy.
Can I rely on the officially reported accuracy to compare networks and choose one?
- Yes, if you work with the COCO dataset or COCO classes.
- No, if you have your own dataset. Training is expected to give different results on different datasets.
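To make the "No" case concrete, here is a minimal sketch of scoring a candidate network on your own labeled data instead of trusting the published numbers. It assumes a single class, a single image, boxes given as (x1, y1, x2, y2), and an IoU threshold of 0.5; the function names are hypothetical and not taken from any framework. A full evaluation would aggregate over the whole validation set and compute mAP, which this does not do.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Greedy matching for one class on one image.

    predictions: list of (score, box); ground_truth: list of boxes.
    Each ground-truth box can be matched by at most one prediction.
    """
    matched = set()
    true_positives = 0
    # Process the most confident predictions first.
    for _score, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt_box in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

Running the same function over the outputs of two candidate networks on the same validation images gives a comparison that reflects your data rather than COCO.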
Network A runs in 100 ms and network B in 200 ms on a T4. Will the 2x speed boost remain the same on different GPU boards?
- Usually not, especially on edge devices. Inference speed is a complex subject.
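Since the ratio does not transfer between boards, the practical answer is to time each candidate on the target device. Below is a minimal timing sketch using only the standard library. The name `measure_latency_ms` and the `run_once` callable are assumptions for illustration: `run_once` stands for whatever executes a single inference in your stack. On a GPU it has to block until the result is ready, because GPU work is typically launched asynchronously and would otherwise look faster than it is.

```python
import statistics
import time


def measure_latency_ms(run_once, warmup=10, repeats=50):
    """Time a zero-argument callable and return latency statistics in ms.

    run_once is a hypothetical wrapper around one inference call; it must
    return only after the result is actually available.
    """
    # Warm-up runs are excluded: first calls often include lazy
    # initialization, caching, or clock ramp-up.
    for _ in range(warmup):
        run_once()

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        timings.append((time.perf_counter() - start) * 1000.0)

    timings.sort()
    p95_index = min(len(timings) - 1, int(0.95 * len(timings)))
    return {
        "median_ms": statistics.median(timings),
        "p95_ms": timings[p95_index],
    }
```

Reporting the median together with a high percentile is a design choice: on edge devices the tail latency often matters more than the average.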
I have 1-3 months and need to improve accuracy/speed performance. What do you think I should do?
Improving speed is equivalent to improving accuracy: once the pipeline is faster, you can afford a slower network with better accuracy. Look at the speed:
- Quantization
- Export
- Batching
- NMS on GPU
- Change image size
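For readers unfamiliar with the NMS item in the list above, here is a plain-Python reference of what greedy non-maximum suppression computes. It is only an illustration of the logic, assuming boxes as (x1, y1, x2, y2) and an IoU threshold of 0.5. It is not a GPU implementation; in production this step would normally be replaced by the batched operator your inference framework provides, which is the point of moving it to the GPU.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS for a single class.

    detections: list of (score, box). Returns the kept detections,
    highest score first.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[0], reverse=True):
        # Keep a box only if it does not overlap too much with any
        # higher-scoring box that was already kept.
        if all(iou(det[1], k[1]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept
```

The nested comparison is quadratic in the number of candidate boxes, which is one reason this step can become a bottleneck when it runs on the CPU after a fast network.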
Look at the dataset and try to improve problematic cases:
- Clean dataset
- Fix corner cases with labeling / other algorithms.
- Augmentation
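As a small illustration of the augmentation item, here is a sketch of a horizontal flip for detection data. The detail that is easy to get wrong is that the boxes must be flipped together with the image. The helper names are hypothetical, the image is represented as a plain list of pixel rows, and boxes are assumed to be (x1, y1, x2, y2) in continuous pixel coordinates where x ranges from 0 to the image width.

```python
def hflip_image(image):
    """Mirror an image given as a list of rows (each row a list of pixels)."""
    return [row[::-1] for row in image]


def hflip_boxes(boxes, image_width):
    """Mirror boxes (x1, y1, x2, y2) around the vertical center line.

    The left and right edges swap roles, so the new x1 comes from the
    old x2 and vice versa; y coordinates are unchanged.
    """
    return [
        (image_width - x2, y1, image_width - x1, y2)
        for (x1, y1, x2, y2) in boxes
    ]
```

For example, a box (10, 20, 30, 40) in an image 100 pixels wide becomes (70, 20, 90, 40), and flipping twice returns the original box.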
So, the accuracy difference is small between networks of similar speed released around the same time.
In my humble opinion, you only need to choose your case:
- Super fast networks
- Regular networks
- Super accurate networks
Within each group, the accuracy will be about the same. Here is a rough description of the groups:
Engineering
There is usually a trade-off: “The higher the quality of the code and the more features it has, the more difficult it will be to modify the network”. The picture I have in my head looks like this:
But this picture is based on gut feeling. For example, PaddlePaddle may be a good framework, but I have never met any experts in it in Europe or America. In my opinion, here is a good example of this problem. Choose the code that is easiest to understand:
An essential criterion for choosing a network is how well it exports to different inference frameworks: the more complex the network, the more complex the export. Exporting to Intel and Nvidia is usually the easiest. If you have a RockChip or a microcontroller, everything is not so simple. Want to know more? Check my article.
Here is a super simple comparison:
Let's get to more global issues. Look at the licenses. Is your project open source? Can you share the training part? Do you use existing code for inference, or do you write your own?
Discuss this with your product owner:
Also, it's important to discuss the data issue with your product owner. Do you have a lot of data? Do you detect standard objects, or do you have unique frames? Do you have a lot of GPUs? Do you need fast retraining?
Architecture
Is the list of questions above complete?
Of course not. There are a lot of smaller ones, about both support and training. I've listed a few sample questions here. For example:
- TF detection API is already dead.
- YoloX, DAMO-YOLO: no support.
- For PaddlePaddle, it is hard to find a specialist.
- Yolov5/Yolov8 - almost always, training will be fine
- Yolov4/Yolov7 - more accuracy can be achieved, but may not work out of the box
Survey
In my channel, I ran a small survey about what people use in production.
I normalized the answers by the classes of interest.
Do you think that this format is ok? Don't be shy in the comments.
Computer Vision engineer at PTF-Lab
1 year ago: Anton, it’s a great format for knowledge sharing! Thanks
Data scientist with 4+ years of professional experience | Looking for a job as a Senior CV/NLP Engineer with relocation or full remote worldwide
1 year ago: proDuction
Director, Multi-Domain Systems @ Boston Fusion | Recovering Rock Researcher | Buckling Buccaneer
1 year ago: Dig this format, Anton Maltsev!
Co-founder at Proxima Ultra | Proxima Vision Process Mining
1 year ago: Thanks, really helpful. Especially for us newcomers, and architecture planning. For some comparisons I just wish this article was published earlier.