YOLO-NAS: 7 Factors to Success
Deci AI (Acquired by NVIDIA)
Deci enables deep learning to live up to its true potential by using AI to build better AI.
YOLO-NAS is a foundation object detection model built for real-time applications and production deployment. It delivers a state-of-the-art accuracy-speed trade-off, outperforming other models such as YOLOv5, YOLOv6, YOLOv7, and YOLOv8.
Deci’s proprietary Neural Architecture Search (NAS) technology, AutoNAC, generated the YOLO-NAS model. The AutoNAC engine finds the architecture with the best balance between accuracy and inference speed for specific data characteristics, task, performance targets, and inference environment. In addition to being data- and hardware-aware, it considers other components in the inference stack, including compilers and quantization.
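AutoNAC itself is proprietary, so the sketch below is only a hypothetical illustration of what a hardware-aware search objective can look like: candidate architectures are rewarded for accuracy and penalized for exceeding a latency budget on the target hardware. The function name and numbers are invented for illustration.

```python
# Hypothetical hardware-aware NAS objective; NOT Deci's actual AutoNAC
# scoring. Candidates earn reward for accuracy and lose it for
# exceeding a latency budget measured on the target hardware.
def candidate_score(accuracy: float, latency_ms: float,
                    latency_budget_ms: float, penalty: float = 0.1) -> float:
    overshoot = max(0.0, latency_ms - latency_budget_ms)
    return accuracy - penalty * overshoot

# A search procedure keeps the highest-scoring architectures:
candidates = [(0.47, 5.2), (0.49, 9.8), (0.51, 14.0)]  # (mAP, latency in ms)
best = max(candidates,
           key=lambda c: candidate_score(c[0], c[1], latency_budget_ms=10.0))
```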
The use of NAS is one of the keys to YOLO-NAS’s success: none of the previous models in the YOLO family was generated this way. But what else makes YOLO-NAS the fastest object detection model to date? Here are seven factors:
1. The YOLO-NAS architecture space was inspired by the recent YOLOv6 and YOLOv8.
Developing YOLO-NAS meant distilling the best-of-breed, common knowledge on how to build YOLO models into the architecture search space. With all the possibilities encoded in that space, it was a matter of choosing the right one.
2. A new quantization-friendly basic block that generalized previous offerings.
YOLO-NAS features blocks that are quantization-friendly, so they can be converted to 8-bit precision with minimal accuracy loss.
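As a refresher on what "going to 8-bit" means, here is a minimal sketch of symmetric per-tensor INT8 quantization in PyTorch. It is illustrative only; it is not the exact scheme YOLO-NAS ships with.

```python
import torch

# Minimal symmetric per-tensor INT8 quantization sketch. Illustrative
# only; not the quantization scheme YOLO-NAS actually uses.
def quantize_int8(x: torch.Tensor):
    scale = x.abs().max() / 127.0          # map observed range to [-127, 127]
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale     # recover an approximation of x

x = torch.randn(4, 4)
q, scale = quantize_int8(x)
print((x - dequantize(q, scale)).abs().max())  # small quantization error
```

Quantization-friendly blocks are designed so that activation and weight ranges survive this kind of rounding with little accuracy degradation.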
3. AutoNAC was applied on the ~10^14 sized architecture space to obtain 3 final networks (S, M, L).
The AutoNAC engine was applied to a search space of roughly 10^14 candidate architectures. Running a full search loop over a space that size is impractical because it’s simply too large. To work with it, the space was sliced into several buckets and down-sampled. The process ultimately yielded three networks: YOLO-NAS small, medium, and large.
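A toy illustration of the bucket-and-downsample idea is below; the function name and bucketing criterion are invented here, not taken from Deci’s implementation.

```python
import random

# Hypothetical sketch of taming a huge search space: partition candidates
# into buckets (e.g. by some architecture property) and sample a handful
# from each, rather than evaluating all ~10^14 architectures.
def sample_candidates(candidates, bucket_key, per_bucket=3, seed=0):
    random.seed(seed)
    buckets = {}
    for c in candidates:
        buckets.setdefault(bucket_key(c), []).append(c)
    return [c for b in buckets.values()
            for c in random.sample(b, min(per_bucket, len(b)))]

# Example: bucket toy architectures by depth, keep 3 per bucket.
archs = [{"depth": d, "width": w} for d in range(10) for w in range(10)]
subset = sample_candidates(archs, bucket_key=lambda a: a["depth"] // 2)
```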
4. The AutoNAC stage took GPU time equivalent to fully training 5 networks.
The AutoNAC stage consumed roughly the compute needed to train five networks. That is a modest budget, particularly compared with earlier NAS efforts such as what Google used for NASNet or EfficientNet.
5. YOLO-NAS was trained with SuperGradients, Deci’s open source library.
YOLO-NAS was trained with SuperGradients, an open source library available on GitHub that you can use to train various object detection models, including YOLOX, YOLO-NAS, PP-YOLO, and more.
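For example, loading a pretrained YOLO-NAS checkpoint takes a few lines (assuming `pip install super-gradients`; API per the SuperGradients docs at the time of writing, and the image path is a placeholder):

```python
from super_gradients.training import models

# Load YOLO-NAS L with COCO-pretrained weights and run inference.
model = models.get("yolo_nas_l", pretrained_weights="coco")
model.predict("path/to/image.jpg").show()
```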
6. There’s a new and advanced training scheme.
Because it’s a foundation model, the creators of YOLO-NAS didn’t limit training to Pascal VOC, COCO, or similar datasets. To enrich training with more data and advanced techniques, Objects365 was used as an additional training dataset, knowledge distillation from a pre-trained teacher model was incorporated, and YOLO-NAS was also trained on pseudo-labeled data, among other techniques that improve the training scheme.
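Deci hasn’t published every detail of the recipe, but the knowledge-distillation component follows a standard pattern: the student is trained to match the teacher’s softened outputs as well as the ground-truth labels. Below is a generic (Hinton-style) distillation loss sketch, not YOLO-NAS’s exact objective.

```python
import torch
import torch.nn.functional as F

# Generic knowledge-distillation loss sketch, illustrating the
# teacher-student idea; not YOLO-NAS's exact training objective.
def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 4.0, alpha: float = 0.5):
    # Soft targets: match the teacher's softened class distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```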
7. Quantization-aware building blocks.
After training, post-training quantization (PTQ) was applied and the network was converted to INT8 with minimal loss of accuracy.
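To make the PTQ step concrete, here is a minimal post-training static quantization sketch using PyTorch’s eager-mode API. Deci’s actual pipeline (targeting inference engines such as TensorRT) differs, so treat this as an illustration of the calibrate-then-convert workflow, not their implementation.

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq

# Minimal PTQ sketch with PyTorch's eager-mode quantization API;
# illustrative only, not Deci's production pipeline.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # float -> int8 at the input
        self.conv = nn.Conv2d(3, 16, 3)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()  # int8 -> float at the output

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = TinyNet().eval()
model.qconfig = tq.get_default_qconfig("fbgemm")
prepared = tq.prepare(model)                 # insert range observers

with torch.no_grad():                        # calibration pass on sample data
    for _ in range(8):
        prepared(torch.randn(1, 3, 64, 64))

int8_model = tq.convert(prepared)            # swap in INT8 kernels
print(int8_model(torch.randn(1, 3, 64, 64)).shape)
```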
The result of all of these factors? The YOLO-NAS M model delivers a 50% (1.5x) increase in throughput and 1 mAP better accuracy compared to other SOTA YOLO models on the NVIDIA T4 GPU.
Interested in learning more about YOLO-NAS? Watch this video for a more in-depth discussion on YOLO-NAS and how to get started.
Get ahead with the latest deep learning content
Save the date: Learn more about DL inference on edge devices and open source LLM ops!
[Live Webinar] How to Accelerate DL Inference on Edge Devices | August 30th
Learn how to optimize your deep learning models for maximum speed and efficiency. From quantization to multi-stream inference, Ran Zilberstein, VP of Engineering at Deci, will share practical tips and best practices to help you leverage the full potential of your edge devices. Save your spot!
[Live Webinar] RAG with Llama 2 and LangChain: Building with Open-Source LLM Ops | August 31st
Discover how to combine Llama 2 with LangChain to build a Retrieval Augmented Generation (RAG) or Retrieval Augmented Question Answering (RAQA) system to analyze the most popular internet phenomenon on the planet: Barbie + Oppenheimer = Barbenheimer. Save your spot!
Enjoyed these deep learning tips? Help us make our newsletter bigger and better by sharing it with your colleagues and friends!