Revolutionise your PyTorch Workflow: How to Speed Up Your Deep Learning Training with This Simple Hack!

One of the consistent #pytorch / deep learning #designpatterns that you might come across in documentation and tutorials involves three steps (a minimal sketch follows the list):

  • Load the images, normalise and augment them, and convert them to tensors (all of which happens on the CPU).
  • Pass the data to a data loader, where it is pinned to memory and processed by as many workers as possible.
  • Finally, in the training loop, move each batch from the CPU to the GPU as needed.
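
For reference, here is a minimal sketch of this conventional pattern. torchvision's FakeData stands in for a real dataset, and the batch size and worker count are arbitrary assumptions:

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# all preprocessing runs on the CPU, inside the data loader workers
cpu_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

dataset = datasets.FakeData(transform=cpu_transform)  # stand-in for your dataset
dataloader = DataLoader(dataset, batch_size=32, num_workers=4, pin_memory=True)

for images, labels in dataloader:
    # the CPU -> GPU transfer happens here, once per batch
    images = images.to('cuda', non_blocking=True)
    # ... forward / backward pass ...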

While this pattern is effective for simple data augmentations and smaller image sizes, it can lead to CPU bottlenecks when working with more complex augmentations or larger images. In these cases, an alternative design pattern can be more efficient. Here's how it works:

  1. First, convert the image to a tensor immediately after loading it from disk, then move it straight to the GPU. Everything else, including normalisation and data augmentation, takes place on the GPU.
  2. Next, make two adjustments to the data loader, since the data is already on the GPU: set pin_memory to False (pinning only applies to CPU tensors) and reduce the number of workers, typically to zero, because CUDA tensors don't mix well with forked worker processes. Keep in mind that every sample now occupies GPU memory, so manage your worker count and batch size accordingly.
  3. Finally, in the training loop, no changes are necessary: the batches already arrive on the GPU.

Why is this better? Moving data between main memory and GPU memory is time-consuming, and every such transfer forces the GPU to wait. With this setup, each sample is loaded all the way into GPU memory up front, so the GPU won't sit idle between batches; the timing sketch below shows the difference.
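
If you want to measure the gap yourself, here is a minimal timing sketch; the tensor shape is an arbitrary assumption and x_cpu / x_gpu are hypothetical names:

import torch

x_cpu = torch.randn(64, 3, 224, 224)  # one batch in main memory
x_gpu = x_cpu.to('cuda')              # the same batch, already resident on the GPU

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

# the host-to-device copy that the conventional pattern pays for every batch
start.record()
_ = x_cpu.to('cuda')
end.record()
torch.cuda.synchronize()
print(f"CPU -> GPU copy: {start.elapsed_time(end):.2f} ms")

# working on a batch that already lives in GPU memory costs almost nothing by comparison
start.record()
_ = x_gpu.mul(1.0)
end.record()
torch.cuda.synchronize()
print(f"already on GPU:  {start.elapsed_time(end):.2f} ms")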

By following this pattern, you can significantly reduce CPU bottlenecks, speed up your deep learning training and utilise your expensive GPU resources more efficiently.

See the sample code below for an illustration, but if you want to see how this works in action, I recommend checking out this repository of mine on #github

import torch
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms

imagenet_mean = (0.485, 0.456, 0.406)
imagenet_std = (0.229, 0.224, 0.225)
device = 'cuda'

class CustomDataset(Dataset):
    def __init__(self, data):
        self.data = data
        # ToTensor stays on the CPU: it converts the loaded image to a tensor
        self.to_tensor = transforms.Compose([
            transforms.ToTensor()
        ])
        # everything below runs on the GPU, because it is applied after .to(device)
        self.transform = transforms.Compose([
            transforms.RandomVerticalFlip(),
            transforms.RandomHorizontalFlip(),
            transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.1),
            transforms.RandomGrayscale(),
            transforms.Normalize(imagenet_mean, imagenet_std)
        ])

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        img = self.data[idx]

        # this is the interesting part: convert to a tensor, move it to the GPU
        # immediately, and run all remaining transforms there
        img = self.to_tensor(img)
        img = img.to(device)
        img = self.transform(img)

        return img


data = ...  # load your dataset here

dataset = CustomDataset(data)

batch_size = 32
# the dataset returns CUDA tensors, so keep pin_memory off (pinning only applies
# to CPU tensors) and load in the main process: CUDA tensors cannot safely be
# produced in forked worker processes
num_workers = 0

dataloader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers, pin_memory=False)

for batch in dataloader:
    # do your training here
    ...
