How to Select the Right GPU Instance for Your Team on AWS?
Image source: DALL·E


Imagination is the exaggeration of the data you have in your brain. On a lovely evening in Amsterdam, I want to train a diffusion model to compose a piece of music, but I need an AWS GPU instance to do it. We machine learning engineers are always baffled about which GPU instance on AWS is optimal. I did a short study on this, and this article is the outcome.

Hopefully, after reading it, you will be able to select the right GPU instance for your work.

The Decision Tree for Choosing a GPU Instance

[Image: GPU instance decision tree]

DrawIo Link

If you are doing HPC (high-performance computing) work, such as drug discovery or other high-precision jobs, we suggest the P (historically called "Performance-heavy") instance family. Otherwise, we recommend the G instance family. The cost charts for the noted GPU instances are below.
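First, a minimal Python sketch of the decision above. The function name and workload labels are purely illustrative, not anything provided by AWS:

# A sketch of the decision tree: HPC / high-precision work -> P family,
# everything else -> G family. Labels are made up for illustration.
def suggest_instance_family(workload: str) -> str:
    hpc_workloads = {"hpc", "drug-discovery", "high-precision"}
    if workload.lower() in hpc_workloads:
        return "P family (e.g. p3, p4d)"
    return "G family (e.g. g4dn, g5)"

print(suggest_instance_family("drug-discovery"))   # P family (e.g. p3, p4d)
print(suggest_instance_family("nlp-fine-tuning"))  # G family (e.g. g4dn, g5)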

P3 and P4 Instance Costs

[Image: P3 and P4 instance cost chart]

G4 and G5 Instance Costs

[Image: G4 and G5 instance cost chart]

(The DALL·E picture used in this article is not related to the GPU instances; it is not an actual view, just a concept generated with a diffusion model.)


Don't Always Select a GPU on Price Alone

Please don't always select a GPU purely on its hourly price. We ran a small experiment: we trained a SciBERT transformer model on 100K data points.

The result on a g4dn.2xlarge machine:

[Image: training run output on g4dn.2xlarge]
The cost is: ($0.752/hour × 311.25 s) / 3600 ≈ $0.065


We also ran the same configuration on a g5.xlarge machine. The result:

[Image: training run output on g5.xlarge]


The cost is: ($1.006/hour × 197.31 s) / 3600 ≈ $0.055


So if we use a g5.xlarge machine, we can save roughly 15% of the cost, and the run also finishes about 1.6x faster.
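A minimal sketch of the cost arithmetic above, so it is easy to rerun with other prices or training times. The hourly prices are assumed to be the on-demand us-east-1 rates; the training times in seconds come from the two runs shown:

# Cost = hourly price * training time in seconds / 3600 seconds per hour.
runs = {
    "g4dn.2xlarge": {"price_per_hour": 0.752, "train_seconds": 311.25},
    "g5.xlarge":    {"price_per_hour": 1.006, "train_seconds": 197.31},
}

costs = {
    name: r["price_per_hour"] * r["train_seconds"] / 3600
    for name, r in runs.items()
}

for name, cost in costs.items():
    print(f"{name}: ${cost:.3f}")                   # ~$0.065 vs ~$0.055

saving = 1 - costs["g5.xlarge"] / costs["g4dn.2xlarge"]
print(f"Saving with g5.xlarge: {saving:.0%}")       # roughly 15%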

Other benefits of the g5 family over the g4 family are:

  • The g5 instances use the NVIDIA Ampere architecture (A10G GPUs), a modern architecture that supports all the common precision formats, whereas g4dn uses the older Turing-based T4.
  • We should use mixed-precision training in PyTorch-based projects. There are several floating-point datatypes to choose from: FP32, FP16, TF32, and BF16.

Source: NVIDIA Blog

We did the apples-to-apples comparison with the fp16 datatype, because tf32 and bf16 need the Ampere architecture, which is only available in the g5 instance family.
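For reference, this is roughly what fp16 mixed-precision training looks like in PyTorch with torch.cuda.amp. The tiny linear model and random batches are only stand-ins for the actual SciBERT setup:

import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler

# Toy stand-ins for the real model and dataloader (illustration only).
model = nn.Linear(768, 2).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = nn.CrossEntropyLoss()
scaler = GradScaler()                     # scales the loss to avoid fp16 underflow

for step in range(10):
    x = torch.randn(32, 768, device="cuda")
    y = torch.randint(0, 2, (32,), device="cuda")

    optimizer.zero_grad()
    with autocast():                      # ops run in fp16 where it is safe
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()         # backward pass on the scaled loss
    scaler.step(optimizer)                # unscales gradients, then optimizer.step()
    scaler.update()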

PS: What are TF32 and BF16?

BF16

If you have access to Ampere or newer hardware, you can use bf16 for your training and evaluation. While bf16 has worse precision than fp16, it has a much bigger dynamic range. Therefore, if in the past you were experiencing overflow issues while training the model, bf16 will prevent this from happening most of the time. Remember that in fp16 the biggest number you can have is `65504`, and any number above that will overflow. A bf16 number can be as large as `3.39e+38` (!), which is about the same as fp32, because both use 8 bits for the numerical range.
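A minimal sketch of using bf16 in PyTorch on Ampere hardware; the toy linear layer is just for illustration:

import torch
from torch import nn

# bf16 requires Ampere or newer (e.g. the A10G in g5 instances).
print(torch.cuda.is_bf16_supported())

model = nn.Linear(768, 2).cuda()
x = torch.randn(32, 768, device="cuda")

# bf16 keeps fp32's dynamic range, so the usual fp16 loss scaling is not needed.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    out = model(x)

print(out.dtype)   # torch.bfloat16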

[Image: bf16 illustration]


TF32

The Ampere hardware uses a magical data type called tf32. It has the same numerical range as fp32 (8 exponent bits), but instead of fp32's 23 bits of precision it has only 10 (the same as fp16), so it uses only 19 bits in total.

It’s magical in the sense that you can use the normal fp32 training and/or inference code and by enabling tf32 support you can get up to 3x throughput improvement.
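In PyTorch, enabling tf32 comes down to two backend flags; a minimal sketch (the toy matmul is just for illustration):

import torch

# Enable tf32 for matmuls and cuDNN convolutions. These flags only take
# effect on Ampere or newer GPUs; elsewhere they are simply ignored.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# The training / inference code itself stays plain fp32.
a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")
c = a @ b   # executed with tf32 tensor cores where possible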

When this is done, CUDA will automatically switch to using tf32 instead of fp32 where possible. This, of course, assumes that the GPU in use is from the Ampere series.

Like all cases of reduced precision, this may or may not be satisfactory for your needs, so you have to experiment and see. According to NVIDIA research, the majority of machine learning training workloads shouldn't be impacted and show the same perplexity and convergence as fp32 training.

[Image: tf32 illustration]

For further reading, here are the detailed AWS GPU instance specifications:

[Images: AWS GPU instance specification tables]

