NVIDIA Mixed Precision & Power Consumption - Part 1
Andrew Antonopoulos
Senior Solutions Architect at Sony Professional Solutions Europe
Deep learning has enabled progress in many different applications and can be used to develop models for classification and regression tasks.
Larger models usually require more computing and memory resources to train, and modern deep-learning training systems use a single-precision (FP32) format.
The IEEE Standard for Floating-Point Arithmetic is the common convention for representing numbers in binary on computers. In double-precision format, each number takes up 64 bits. Single-precision format uses 32 bits, while half-precision is just 16 bits.
In single-precision, 32-bit format, one bit is used to tell whether the number is positive or negative. Eight bits are reserved for the exponent, which (because it’s binary) is 2 raised to some power. The remaining 23 bits are used to represent the digits that make up the number, called the significand.
Double precision instead reserves 11 bits for the exponent and 52 bits for the significand, dramatically expanding the range and size of numbers it can represent. Half precision takes an even smaller slice of the pie, with just five bits for the exponent and 10 for the significand.
The following image visualises the above information:
and if we want to represent π using these precision levels, it will look like this:
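As a quick sketch (using NumPy, which is not otherwise part of the code in this article), π can be stored at each precision level and printed with ten decimal places to show how the representable digits shrink:
import numpy as np
# Illustrative only: the same value of pi stored at each IEEE precision level
print('%.10f' % np.float64(np.pi))   # 3.1415926536 (double precision, 64-bit)
print('%.10f' % np.float32(np.pi))   # 3.1415927410 (single precision, 32-bit)
print('%.10f' % np.float16(np.pi))   # 3.1406250000 (half precision, 16-bit)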
In the following paper, Nvidia introduces a methodology for training deep neural networks using half-precision floating point numbers without losing model accuracy or modifying hyper-parameters (which is a time-consuming process). The paper suggests that this method nearly halves memory requirements and speeds up arithmetic on recent GPUs. Weights, activations, and gradients are stored in IEEE half-precision format.
Nvidia paper: https://arxiv.org/pdf/1710.03740
According to Nvidia, the performance (speed) of any program, including neural network training and inference, is limited by one of three factors: arithmetic bandwidth, memory bandwidth, or latency.
Reduced precision addresses two of these limiters. Memory bandwidth pressure is lowered by using fewer bits to store the same number of values. Arithmetic time can also be lowered on processors that offer higher throughput for reduced precision math. For example, half-precision math throughput in recent GPUs is 2× to 8× higher than for single-precision. In addition to speed improvements, reduced precision formats also reduce the amount of memory required for training.
Mixed precision uses 16-bit and 32-bit floating-point types in a model during training to make it run faster and use less memory. By keeping certain parts of the model in 32-bit types for numeric stability, the model achieves a lower step time while training equally well in terms of evaluation metrics such as accuracy.
Most models use the float32 dtype, which takes 32 bits of memory. However, there are two lower-precision dtypes, float16 and bfloat16, each taking 16 bits of memory instead. Modern accelerators can run operations faster in the 16-bit dtypes, as they have specialised hardware to run 16-bit computations, and 16-bit dtypes can be read from memory faster.
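A quick way to see the memory saving is to compare the size of the same tensor in both dtypes; the shape below is an arbitrary example:
import numpy as np
# Arbitrary example: roughly one million values in each tensor
activations_fp32 = np.zeros((1024, 1024), dtype=np.float32)
activations_fp16 = np.zeros((1024, 1024), dtype=np.float16)
print(activations_fp32.nbytes // 1024**2, 'MiB')   # 4 MiB
print(activations_fp16.nbytes // 1024**2, 'MiB')   # 2 MiB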
Nvidia GPUs can run operations in float16 faster than in float32, and TPUs can run operations in bfloat16 faster than in float32. However, variables and a few computations should still be in float32 for numeric reasons so that the model trains to the same quality.
Implementation
To implement mixed precision, you will need to use the Keras mixed precision API, which allows you to use a mix of either float16 or bfloat16 with float32 to get the performance benefits from float16/bfloat16 and the numeric stability benefits from float32.
Initially, you will need to import the appropriate libraries:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import mixed_precision
Mixed precision will only speed up models on recent NVIDIA GPUs, Cloud TPUs, and Intel CPUs. NVIDIA GPUs use a mix of float16 and float32, while TPUs and Intel CPUs support a mix of bfloat16 and float32.
Among NVIDIA GPUs, those with compute capability 7.0 or higher will see the greatest performance benefit from mixed precision because they have special hardware units called Tensor Cores to accelerate float16 matrix multiplications and convolutions. Older GPUs offer no math performance benefit for using mixed precision. However, memory and bandwidth savings can enable some speedups.
You can check your GPU type with the following command, which is available if the NVIDIA drivers are installed.
nvidia-smi -L
and the output will be similar to this:
GPU 0: NVIDIA GeForce RTX 4060 Ti (UUID: <UUID number>)
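Alternatively, a minimal sketch using TensorFlow itself (assuming the imports above) can report the detected GPU and its compute capability:
# Query the GPU and its compute capability through TensorFlow
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    details = tf.config.experimental.get_device_details(gpus[0])
    print(details.get('device_name'))         # e.g. NVIDIA GeForce RTX 4060 Ti
    print(details.get('compute_capability'))  # e.g. (8, 9)
else:
    print('No GPU detected')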
To use mixed precision in Keras, you will need to create a tf.keras.mixed_precision.Policy, typically referred to as a dtype policy.
Dtype policies specify the dtypes in which layers will run. You will need to construct a policy from the string 'mixed_float16' and set it as the global policy. This will cause subsequently created layers to use mixed precision with a mix of float16 and float32.
# Policy for mixed precision
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_global_policy(policy)
The policy specifies two important aspects of a layer: the dtype the layer's computations are done in and the dtype of a layer's variables. With this policy, layers use float16 computations and float32 variables. Computations are done in float16 for performance, but variables must be kept in float32 for numeric stability.
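Continuing from the imports and global policy above, a minimal check (the Dense layer here is only an illustrative example) shows both dtypes on a freshly created layer:
# With 'mixed_float16' active, new layers compute in float16 but keep float32 weights
layer = layers.Dense(units=10)
layer.build(input_shape=(None, 20))
print(layer.compute_dtype)    # float16 - used for the forward pass
print(layer.variable_dtype)   # float32 - used for the weights
print(layer.kernel.dtype)     # float32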
Validation
To validate the mixed precision setup, you will need to print out the dtype policy by using this code:
# Print out the dtype policy for compute and variables
print('Compute dtype: %s' % policy.compute_dtype)
print('Variable dtype: %s' % policy.variable_dtype)
and the output will be similar to this:
INFO:tensorflow:Mixed precision compatibility check (mixed_float16): OK
Your GPU will likely run quickly with dtype policy mixed_float16 as it has compute capability of at least 7.0. Your GPU: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9
Compute dtype: float16
Variable dtype: float32
As you can see from the above output, the GPU has a compute capability of 8.9, and the policy has been set up successfully.
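Putting the pieces together, the following is a minimal sketch of a model built under the mixed_float16 policy; the architecture and dataset references are illustrative assumptions, not the configuration used in the experiments below. The only precision-specific detail is that the final softmax layer is kept in float32 for numeric stability:
# Illustrative model only - not the configuration used in the experiments below
inputs = keras.Input(shape=(224, 224, 3))
x = layers.Conv2D(32, 3, activation='relu')(inputs)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(128, activation='relu')(x)
# Keep the output layer in float32 so the softmax remains numerically stable
outputs = layers.Dense(525, activation='softmax', dtype='float32')(x)
model = keras.Model(inputs, outputs)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # hypothetical datasets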
Testing & Power Consumption
After reading the Nvidia paper, the question was raised: Will this provide any benefit during the ML training, and will it reduce the hardware's carbon footprint?
Calculating the carbon footprint will require 4 steps:
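In essence, the estimate multiplies the measured energy by a grid carbon-intensity factor. The sketch below is illustrative only; the power draw, runtime, and carbon-intensity values are placeholders, not measurements from these tests:
# Placeholder values only - not measurements from the experiments
avg_gpu_power_w = 120        # average GPU power draw in watts (e.g. from nvidia-smi)
training_hours = 3.0         # total training time in hours
carbon_intensity = 0.233     # kg CO2e per kWh (depends on the local grid)
energy_kwh = avg_gpu_power_w * training_hours / 1000
co2e_kg = energy_kwh * carbon_intensity
print('Energy: %.2f kWh, estimated footprint: %.3f kg CO2e' % (energy_kwh, co2e_kg))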
The dataset used during the tests contained images of 525 bird species, with 84,635 training images, 2,625 test images (5 images per species), and 2,625 validation images (5 images per species).
Four tests were completed using different hyper-parameters and mixed precision as the floating-point format. The ML model configuration for each test was the following:
Benchmarking
1st Experiment
2nd Experiment
3rd Experiment
The GPU power consumption, utilisation and overall power consumption across all the tests can be seen in the following image:
and a graphical presentation of the above tests can be seen in the following graph:
Overall, the test results confirmed that using mixed precision for a classification model will reduce power consumption but requires adjusting the hyper-parameters. The 3rd experiment used more neurons, which forced the GPU to work harder, yet power consumption stayed at a low level, close to that of the 2nd experiment, which used fewer neurons.
Mixed precision is a great option for training models, especially when using Nvidia GPUs.
Check Part 2 for more information about loss and accuracy