PTQ and QAT: Best Practices for Performing Hybrid and Selective Quantization
Deci AI (Acquired by NVIDIA)
Deci enables deep learning to live up to its true potential by using AI to build better AI.
While quantization can address poor runtime performance, memory and model-size constraints, and hardware limitations, it has its own issues. Some architectures are difficult to quantize, accuracy can decrease, and calibration, a required step in INT8 quantization, comes with challenges of its own.
This is where hybrid and selective quantization come in. Unlike naïve quantization, which applies the same quantization method to every layer of the network, hybrid and selective quantization add extra steps that deliver better speed without the usual hit to accuracy. You can apply these approaches during post-training quantization (PTQ) and quantization-aware training (QAT), as in the sketch below.
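To make the idea concrete, here is a minimal sketch of selective PTQ using PyTorch's eager-mode quantization API. The tiny model, the choice of which layer to treat as quantization-sensitive, and the random calibration data are all hypothetical placeholders:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        q = torch.ao.quantization
        self.quant = q.QuantStub()            # FP32 -> INT8 entry point
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.relu1 = nn.ReLU()
        self.dequant_skip = q.DeQuantStub()   # drop back to FP32 before the sensitive layer
        self.conv2 = nn.Conv2d(16, 16, 3, padding=1)  # assumed quantization-sensitive, kept in FP32
        self.relu2 = nn.ReLU()
        self.quant_skip = q.QuantStub()       # re-enter INT8 after it
        self.conv3 = nn.Conv2d(16, 16, 3, padding=1)
        self.dequant = q.DeQuantStub()        # INT8 -> FP32 exit point

    def forward(self, x):
        x = self.quant(x)
        x = self.relu1(self.conv1(x))
        x = self.dequant_skip(x)
        x = self.relu2(self.conv2(x))         # FP32 island inside the INT8 graph
        x = self.quant_skip(x)
        x = self.conv3(x)
        return self.dequant(x)

model = TinyNet().eval()

# Naive quantization would apply one qconfig to every layer.
model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")

# Selective quantization: disable quantization for the sensitive layer
# (and its activation) so it stays in FP32.
model.conv2.qconfig = None
model.relu2.qconfig = None

prepared = torch.ao.quantization.prepare(model)

# Calibration: run representative batches so observers record activation
# ranges. Random tensors here stand in for a real calibration loader.
with torch.no_grad():
    for _ in range(8):
        prepared(torch.randn(1, 3, 32, 32))

quantized = torch.ao.quantization.convert(prepared)
```

The key move is setting `qconfig = None` on the sensitive modules and wrapping them in dequant/quant stubs, so most of the network runs in INT8 while the fragile part keeps FP32 precision.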
PTQ is a quantization technique where the model is quantized after it has been trained. QAT is a fine-tuning stage on top of the PTQ model, where the network is trained further with quantization in the loop, as sketched below.
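Continuing the same hypothetical TinyNet, here is a minimal sketch of the QAT stage with PyTorch's eager-mode API. The fake-quantization ops simulate INT8 rounding in the forward pass while gradients flow in FP32; the loop below is a placeholder for your real task loss and data loader:

```python
import torch

# In practice this starts from the already-trained FP32 weights
# (TinyNet is the hypothetical model from the previous sketch).
model = TinyNet().train()

# The QAT qconfig inserts fake-quantization ops into the forward pass.
model.qconfig = torch.ao.quantization.get_default_qat_qconfig("fbgemm")
model.conv2.qconfig = None   # stay selective: sensitive layer remains FP32
model.relu2.qconfig = None

prepared = torch.ao.quantization.prepare_qat(model)

# Short fine-tune at a low learning rate so the weights adapt to
# quantization noise. Random data and an MSE loss are placeholders.
optimizer = torch.optim.SGD(prepared.parameters(), lr=1e-4)
loss_fn = torch.nn.MSELoss()
for _ in range(10):
    x = torch.randn(4, 3, 32, 32)
    out = prepared(x)
    loss = loss_fn(out, torch.zeros_like(out))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

quantized = torch.ao.quantization.convert(prepared.eval())
```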
Here are rules of thumb for applying hybrid and selective quantization during PTQ and QAT:
Now consider the results for the STDC semantic segmentation model on Pascal VOC. Throughput, model size, and latency are close across naïve quantization, selective PTQ, and selective QAT. But look at the accuracy: with selective PTQ there was only a very small decrease, and with selective QAT the accuracy even improved.
You can easily perform hybrid and selective PTQ and QAT using SuperGradients, our open-source library for training PyTorch-based computer vision models; a rough sketch follows.
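As a sketch only: the class and method names below follow SuperGradients' quantization tutorial at the time of writing, and the module path, argument names, and the skipped layer name are assumptions to verify against the docs for your installed version.

```python
# Assumed SuperGradients API; check module paths and arguments against
# the current documentation before use.
from super_gradients.training.utils.quantization.selective_quantization_utils import SelectiveQuantizer

sq = SelectiveQuantizer(
    default_quant_modules_calibrator_weights="max",
    default_quant_modules_calibrator_inputs="histogram",
    default_per_channel_quant_weights=True,
    default_learn_amax=False,
)
sq.register_skip_quantization(layer_names={"head"})  # "head" is a hypothetical sensitive layer
sq.quantize_module(model)  # 'model' is your trained SuperGradients/PyTorch model
```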
To take a deeper dive into quantization and learn how you can improve your model’s speed without reducing its accuracy, watch the webinar or read the ultimate guide.
Get ahead with the latest deep learning content
Catch Deci at NVIDIA GTC 2023
Join our CEO & Co-Founder, Yonatan Geifman, at his GTC session, “How to Accelerate NLP Performance on GPU with Neural Architecture Search.” He’ll take a deep dive into NLP inference performance optimization, covering the details, the challenges that need to be addressed, and the tools and best practices to adopt to achieve the best possible results without sacrificing the model’s accuracy. Register here.
Can you solve this riddle? Comment your answer below.
Don't forget to catch next month's newsletter to get the answer to the riddle.
Enjoyed these deep learning tips? Help us make our newsletter bigger and better by sharing it with your colleagues and friends!