Unlock the Power of Model Compression with Intel® Neural Compressor

In the rapidly evolving field of machine learning and AI, efficient model deployment is crucial. Intel® Neural Compressor is a versatile tool that offers popular model compression techniques such as quantization, pruning (sparsity), distillation, and neural architecture search. It supports mainstream frameworks like TensorFlow, PyTorch, ONNX Runtime, and MXNet, along with Intel-specific extensions such as Intel Extension for TensorFlow and Intel Extension for PyTorch.
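
To give a taste of the non-quantization features, here is a minimal, hypothetical sketch of magnitude pruning on a toy PyTorch model using Neural Compressor 2.x's training callbacks. The model, sparsity target, and step counts are illustrative assumptions, not values from this article.

import torch
from neural_compressor.config import WeightPruningConfig
from neural_compressor.training import prepare_compression

# Toy model and optimizer; substitute your own network and training loop.
model = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Prune to 80% sparsity with magnitude pruning over the first 100 steps (illustrative values).
prune_conf = WeightPruningConfig(pruning_type="magnitude", target_sparsity=0.8, start_step=0, end_step=100)
compression_manager = prepare_compression(model, prune_conf)
compression_manager.callbacks.on_train_begin()

for step in range(100):
    compression_manager.callbacks.on_step_begin(step)
    x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
    loss = torch.nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    compression_manager.callbacks.on_before_optimizer_step()
    optimizer.step()
    compression_manager.callbacks.on_after_optimizer_step()
    compression_manager.callbacks.on_step_end()

compression_manager.callbacks.on_train_end()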

Key Features and Benefits

Broad Hardware Support

Intel® Neural Compressor supports a wide range of Intel hardware, including:

  • Intel Xeon Scalable Processors
  • Intel Xeon CPU Max Series
  • Intel Data Center GPU Flex Series
  • Intel Data Center GPU Max Series

Additionally, it offers limited support for AMD CPUs, ARM CPUs, and NVIDIA GPUs via ONNX Runtime.
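
To illustrate the ONNX Runtime path, here is a hedged sketch of post-training quantization of an ONNX model. The model path, dummy dataset shape, and framework keys are assumptions made for illustration and are not taken from this article.

from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.data import DataLoader, Datasets

# "model_fp32.onnx" is a placeholder path to an existing FP32 ONNX model.
dataset = Datasets("onnxrt_qlinearops")["dummy"](shape=(1, 3, 224, 224))
calib_dataloader = DataLoader(framework="onnxruntime", dataset=dataset)

conf = PostTrainingQuantConfig()
q_model = quantization.fit(model="model_fp32.onnx", conf=conf, calib_dataloader=calib_dataloader)
q_model.save("model_int8.onnx")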

Extensive Model Validation

The tool validates popular large language models (LLMs) such as Llama 2, Falcon, GPT-J, Bloom, and OPT, as well as more than 10,000 other models such as Stable Diffusion, BERT-Large, and ResNet50. Validation relies on Neural Coder, a zero-code optimization solution, together with automatic accuracy-driven quantization strategies, as sketched below.
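
To show what accuracy-driven tuning looks like, here is a small sketch built on the same config API the later examples use. The dummy dataset and the placeholder eval_func (which should return your real validation accuracy) are assumptions for demonstration only.

from torchvision import models
from neural_compressor.config import AccuracyCriterion, PostTrainingQuantConfig, TuningCriterion
from neural_compressor.data import DataLoader, Datasets
from neural_compressor.quantization import fit

float_model = models.resnet18()
dataset = Datasets("pytorch")["dummy"](shape=(1, 3, 224, 224))
calib_dataloader = DataLoader(framework="pytorch", dataset=dataset)

def eval_func(model):
    # Placeholder: evaluate the candidate model and return its accuracy.
    return 0.76

conf = PostTrainingQuantConfig(
    tuning_criterion=TuningCriterion(max_trials=100, timeout=0),
    accuracy_criterion=AccuracyCriterion(tolerable_loss=0.01),  # allow at most a 1% relative drop
)
quantized_model = fit(model=float_model, conf=conf, calib_dataloader=calib_dataloader, eval_func=eval_func)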

Getting Started with Intel® Neural Compressor

Installation

Install Neural Compressor from PyPI:

pip install neural-compressor        

Setting Up the Environment

Set up the environment with the necessary packages:

pip install "neural-compressor>=2.3" "transformers>=4.34.0" torch torchvision        
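
As an optional sanity check, confirm the packages are importable and print the installed versions:

import neural_compressor
import torch
import transformers

print("neural-compressor:", neural_compressor.__version__)
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)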

Weight-Only Quantization for LLMs

Here’s a demonstration of Weight-Only Quantization for LLMs. This method supports Intel CPUs, Intel Gaudi2 AI Accelerators, and NVIDIA GPUs, automatically selecting the best device.

For Intel Gaudi2, using a Docker image with the Gaudi Software Stack is recommended. Below is the script for environment setup:

docker run -it --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=all \
  -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
  --cap-add=sys_nice --net=host --ipc=host \
  vault.habana.ai/gaudi-docker/1.14.0/ubuntu22.04/habanalabs/pytorch-installer-2.1.1:latest

# Check the container ID
docker ps

# Log into the container
docker exec -it <container_id> bash

# Install optimum-habana
pip install --upgrade-strategy eager optimum[habana]

# Install Intel Neural Compressor and auto-round
pip install neural-compressor auto_round
        

Run the example:

from transformers import AutoModel, AutoTokenizer
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.quantization import fit
from neural_compressor.adaptor.torch_utils.auto_round import get_dataloader

model_name = "EleutherAI/gpt-neo-125m"
float_model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
dataloader = get_dataloader(tokenizer, seqlen=2048)

woq_conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {  # match all ops
            "weight": {
                "dtype": "int",
                "bits": 4,
                "algorithm": "AUTOROUND",
            },
        }
    },
)
quantized_model = fit(model=float_model, conf=woq_conf, calib_dataloader=dataloader)
        

Note: For INT4 model inference, use Intel Extension for Transformers, which leverages Intel Neural Compressor for model quantization.
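
For reference, here is a minimal sketch of that INT4 inference path with Intel Extension for Transformers, assuming the intel-extension-for-transformers package is installed via pip; the model name and prompt are illustrative.

from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "EleutherAI/gpt-neo-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# load_in_4bit applies weight-only INT4 quantization under the hood via Neural Compressor.
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)

inputs = tokenizer("Model compression makes deployment", return_tensors="pt").input_ids
outputs = model.generate(inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))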

Static Quantization for Non-LLMs

Here's an example of Static Quantization using a ResNet18 model:

from torchvision import models
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.data import DataLoader, Datasets
from neural_compressor.quantization import fit

float_model = models.resnet18()
dataset = Datasets("pytorch")["dummy"](shape=(1, 3, 224, 224))
calib_dataloader = DataLoader(framework="pytorch", dataset=dataset)
static_quant_conf = PostTrainingQuantConfig()
quantized_model = fit(model=float_model, conf=static_quant_conf, calib_dataloader=calib_dataloader)
        
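If you want to keep the result, the model object returned by fit can be saved to disk; the directory name below is just an example.

quantized_model.save("./saved_results")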

By leveraging Intel® Neural Compressor, you can compress models efficiently, improving performance and reducing latency across a wide range of hardware. Start optimizing your models today and unlock new potential in AI and machine learning deployment! The project is available on GitHub at https://github.com/intel/neural-compressor.

