YOLO-NAS: 7 Factors to Success
Deci AI (Acquired by NVIDIA)
Deci enables deep learning to live up to its true potential by using AI to build better AI.
YOLO-NAS is a foundation object detection model built for real-time applications and production deployment. It delivers a state-of-the-art accuracy-speed trade-off, outperforming other models such as YOLOv5, YOLOv6, YOLOv7, and YOLOv8.
Deci’s proprietary Neural Architecture Search (NAS) technology, AutoNAC, generated the YOLO-NAS model. The AutoNAC engine finds the architecture with the best balance between accuracy and inference speed for specific data characteristics, task, performance targets, and inference environment. In addition to being data- and hardware-aware, it considers other components in the inference stack, including compilers and quantization.
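AutoNAC itself is proprietary, so the sketch below is only a hypothetical illustration of what a hardware-aware search objective can look like: candidate architectures are rewarded for accuracy and penalized for exceeding a latency budget on the target hardware. The function name and numbers are invented for illustration.

```python
# Hypothetical hardware-aware NAS objective; NOT Deci's actual AutoNAC
# scoring. Candidates earn reward for accuracy and lose it for
# exceeding a latency budget measured on the target hardware.
def candidate_score(accuracy: float, latency_ms: float,
                    latency_budget_ms: float, penalty: float = 0.1) -> float:
    overshoot = max(0.0, latency_ms - latency_budget_ms)
    return accuracy - penalty * overshoot

# A search procedure keeps the highest-scoring architectures:
candidates = [(0.47, 5.2), (0.49, 9.8), (0.51, 14.0)]  # (mAP, latency in ms)
best = max(candidates,
           key=lambda c: candidate_score(c[0], c[1], latency_budget_ms=10.0))
```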
The use of NAS is one of the keys to YOLO-NAS’s success: none of the previous models in the YOLO family was generated this way. But what else makes YOLO-NAS the fastest object detection model to date? Here are seven factors:
1. The YOLO-NAS architecture space was inspired by the recent YOLOv6 and YOLOv8.
Developing YOLO-NAS meant distilling the best-of-breed, common knowledge on how to build YOLO models into the architecture search space. With all the possibilities encoded in that space, it was a matter of choosing the right one.
2. A new quantization-friendly basic block that generalized previous offerings.
YOLO-NAS features blocks that are quantization-friendly, so they can be converted to 8-bit precision with minimal accuracy loss.
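As a refresher on what "going to 8-bit" means, here is a minimal sketch of symmetric per-tensor INT8 quantization in PyTorch. It is illustrative only; it is not the exact scheme YOLO-NAS ships with.

```python
import torch

# Minimal symmetric per-tensor INT8 quantization sketch. Illustrative
# only; not the quantization scheme YOLO-NAS actually uses.
def quantize_int8(x: torch.Tensor):
    scale = x.abs().max() / 127.0          # map observed range to [-127, 127]
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale     # recover an approximation of x

x = torch.randn(4, 4)
q, scale = quantize_int8(x)
print((x - dequantize(q, scale)).abs().max())  # small quantization error
```

Quantization-friendly blocks are designed so that activation and weight ranges survive this kind of rounding with little accuracy degradation.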
3. AutoNAC was applied on the ~10^14 sized architecture space to obtain 3 final networks (S, M, L).
The AutoNAC engine was applied to a search space of roughly 10^14 candidate architectures. Running a full search loop over a space that size is impractical because it’s simply too large. To work with it, the space was sliced into several buckets and down-sampled. The process ultimately yielded three networks: YOLO-NAS small, medium, and large.
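A toy illustration of the bucket-and-downsample idea is below; the function name and bucketing criterion are invented here, not taken from Deci’s implementation.

```python
import random

# Hypothetical sketch of taming a huge search space: partition candidates
# into buckets (e.g. by some architecture property) and sample a handful
# from each, rather than evaluating all ~10^14 architectures.
def sample_candidates(candidates, bucket_key, per_bucket=3, seed=0):
    random.seed(seed)
    buckets = {}
    for c in candidates:
        buckets.setdefault(bucket_key(c), []).append(c)
    return [c for b in buckets.values()
            for c in random.sample(b, min(per_bucket, len(b)))]

# Example: bucket toy architectures by depth, keep 3 per bucket.
archs = [{"depth": d, "width": w} for d in range(10) for w in range(10)]
subset = sample_candidates(archs, bucket_key=lambda a: a["depth"] // 2)
```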
4. The AutoNAC stage took GPU time equivalent to fully training 5 networks.
The AutoNAC stage consumed roughly the compute needed to train five networks. That is a modest budget, particularly compared with earlier NAS efforts such as what Google used for NASNet or EfficientNet.
5. YOLO-NAS was trained with SuperGradients, Deci’s open source library.
YOLO-NAS was trained with SuperGradients, an open source library available on GitHub that you can use to train various object detection models, including YOLOX, YOLO-NAS, PP-YOLO, and more.
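For example, loading a pretrained YOLO-NAS checkpoint takes a few lines (assuming `pip install super-gradients`; API per the SuperGradients docs at the time of writing, and the image path is a placeholder):

```python
from super_gradients.training import models

# Load YOLO-NAS L with COCO-pretrained weights and run inference.
model = models.get("yolo_nas_l", pretrained_weights="coco")
model.predict("path/to/image.jpg").show()
```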
6. There’s a new and advanced training scheme.
Because it’s a foundation model, the creators of YOLO-NAS didn’t limit training to Pascal VOC, COCO, or similar datasets. To enrich training with more data and advanced techniques, Objects365 was used as an additional training dataset, knowledge distillation from a pre-trained teacher model was incorporated, and YOLO-NAS was also trained on pseudo-labeled data, among other techniques that improve the training scheme.
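Deci hasn’t published every detail of the recipe, but the knowledge-distillation component follows a standard pattern: the student is trained to match the teacher’s softened outputs as well as the ground-truth labels. Below is a generic (Hinton-style) distillation loss sketch, not YOLO-NAS’s exact objective.

```python
import torch
import torch.nn.functional as F

# Generic knowledge-distillation loss sketch, illustrating the
# teacher-student idea; not YOLO-NAS's exact training objective.
def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 4.0, alpha: float = 0.5):
    # Soft targets: match the teacher's softened class distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```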
7. Quantization-aware building blocks.
After training, post-training quantization (PTQ) was applied and the network was converted to INT8 with minimal loss of accuracy.
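To make the PTQ step concrete, here is a minimal post-training static quantization sketch using PyTorch’s eager-mode API. Deci’s actual pipeline (targeting inference engines such as TensorRT) differs, so treat this as an illustration of the calibrate-then-convert workflow, not their implementation.

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq

# Minimal PTQ sketch with PyTorch's eager-mode quantization API;
# illustrative only, not Deci's production pipeline.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # float -> int8 at the input
        self.conv = nn.Conv2d(3, 16, 3)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()  # int8 -> float at the output

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = TinyNet().eval()
model.qconfig = tq.get_default_qconfig("fbgemm")
prepared = tq.prepare(model)                 # insert range observers

with torch.no_grad():                        # calibration pass on sample data
    for _ in range(8):
        prepared(torch.randn(1, 3, 64, 64))

int8_model = tq.convert(prepared)            # swap in INT8 kernels
print(int8_model(torch.randn(1, 3, 64, 64)).shape)
```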
The result of all of these factors? The YOLO-NAS M model delivers a 50% (1.5x) increase in throughput and 1 mAP better accuracy compared to other SOTA YOLO models on the NVIDIA T4 GPU.
Interested in learning more about YOLO-NAS? Watch this video for a more in-depth discussion on YOLO-NAS and how to get started.
Get ahead with the latest deep learning content
Save the date: Learn more about DL inference on edge devices and open source LLM ops!
[Live Webinar] How to Accelerate DL Inference on Edge Devices | August 30th
Learn how to optimize your deep learning models for maximum speed and efficiency. From quantization to multi-stream inference, Ran Zilberstein, VP of Engineering at Deci, will share practical tips and best practices to help you leverage the full potential of your edge devices. Save your spot!
[Live Webinar] RAG with Llama 2 and LangChain: Building with Open-Source LLM Ops | August 31st
Discover how to combine Llama 2 with LangChain to build a Retrieval Augmented Generation (RAG) or Retrieval Augmented Question Answering (RAQA) system to analyze the most popular internet phenomenon on the planet: Barbie + Oppenheimer = Barbenheimer. Save your spot!
Enjoyed these deep learning tips? Help us make our newsletter bigger and better by sharing it with your colleagues and friends!