OmniML reposted
Inference efficiency is the key enabler for the #genai revolution. At the center of it are algorithmic optimizations such as quantizing models to lower precision or pruning model weights with structured sparsity. The newly announced #nvidia TensorRT Model Optimizer is the beginning of a unified platform for algorithmic inference optimization, starting with the best quantization recipes such as AWQ and SmoothQuant. Many years of research and engineering on #modelcompression have gone into this topic, and there is much more to add. We are super excited that our product has finally launched. Check out the blog post for details, and follow our public examples and documentation on GitHub: https://lnkd.in/gwaTDHj7
NVIDIA TensorRT Model Optimizer, the newest member of the #TensorRT ecosystem, is a library of post-training and training-in-the-loop model optimization techniques:
- Post-training quantization
- Quantization-aware training
- Sparsity
Read our blog: https://nvda.ws/3Wt7nUA
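To make the core idea behind post-training quantization concrete, here is a minimal NumPy sketch of symmetric per-tensor INT8 weight quantization. This is an illustration of the general technique only, not the Model Optimizer API; the function names are hypothetical, and real recipes like AWQ and SmoothQuant add calibration and per-channel scaling on top of this.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map float weights
    into the integer range [-127, 127] with a single scale factor."""
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# Quantize a small random weight matrix and check the rounding error,
# which is bounded by half a quantization step (scale / 2).
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(float(np.max(np.abs(w - w_hat))) <= scale / 2 + 1e-6)
```

Storing `q` (1 byte per weight) plus one scale instead of FP32 weights cuts memory roughly 4x, which is the basic lever behind lower-precision inference.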