When to Use TPUs for ML Workloads

This is the first article in a series on taking the Google Cloud Professional Machine Learning Certification exam. This text is intended to aid those who are preparing for the certification by presenting some key topics that may be included in the 50 questions on the exam.

Today, we will look into Tensor Processing Units (TPUs). I will give a quick overview of this technology created by Google, and then we will briefly analyse the ML use cases in which TPUs would yield a better solution in terms of performance and, potentially, cost. We will focus primarily on the training stage of large deep learning models built on neural network algorithms.

We also present some hints you will need to pick up on in the exam question to determine whether an option proposing a TPU would be the best choice, or whether a CPU or GPU may better address the requirements described (or, most likely, implied) in the question.

This text is mostly based on content available in Google's Cloud TPU documentation.

Introduction to TPUs:

TPUs (Tensor Processing Units) are specialised hardware accelerators created by Google to speed up machine learning workloads. They are designed to deliver high performance and energy efficiency for training and deploying ML models. TPUs excel at matrix operations, making them ideal for deep learning tasks such as image recognition, natural language processing, and speech recognition.

TPUs are composed of multiple processing cores, each optimised for performing specific ML operations. These cores are interconnected through a high-speed network, allowing for efficient data transfer and communication. TPUs also feature a large on-chip memory, enabling them to store and process vast amounts of data. This architecture enables TPUs to achieve exceptional performance and throughput for ML workloads.

Programming Models for TPUs:

TPUs are supported by various programming models, including the major deep learning frameworks TensorFlow, PyTorch, and JAX.
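As a quick illustration, here is a minimal sketch of how TensorFlow code typically targets a Cloud TPU. It assumes the code runs on a TPU VM (where the accelerator is addressed as "local") with a TPU-enabled TensorFlow build; outside such an environment, the resolver will simply not find a TPU.

```python
import tensorflow as tf

# Locate and initialise the TPU; tpu="local" assumes we run on a TPU VM.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="local")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# TPUStrategy replicates the model and splits each batch across the cores.
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    # Everything created in this scope is placed on the TPU replicas.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
# model.fit(...) then trains on the TPU just as it would on CPU/GPU.
```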

TPUs are ideally suited for applications that require:

Large-Scale Training: TPUs excel at training complex ML models with extensive datasets, enabling faster convergence and improved accuracy. Training processes that are expected to take days or weeks to complete are good candidates for TPUs.

High-Throughput Inference: TPUs are well-suited for deploying ML models in production environments, where they can handle large volumes of inference requests with low latency.

Google Cloud offers various TPU options to meet diverse user requirements:

  • TPU Pods are pre-configured systems that provide a complete hardware and software environment for running ML workloads on TPUs.
  • TPU VMs are virtual machines that include TPUs, allowing users to create custom configurations and leverage existing tooling and frameworks.
  • Cloud TPU is a fully managed service that provides access to TPUs without the need for hardware setup and maintenance.
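Once a Cloud TPU resource is provisioned, a quick sanity check is to ask the runtime which accelerator cores it can actually see. A minimal sketch, assuming JAX with TPU support is installed on the TPU VM:

```python
import jax

# On a healthy TPU VM this lists TpuDevice entries, e.g. 8 cores on a v3-8.
print(jax.devices())

# Total number of accelerator cores visible to the runtime.
print(jax.device_count())
```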

TPUs do not always provide benefits (in cost or performance) when training ML models. In some situations, you might want to use GPUs or CPUs on Compute Engine instances to run your machine learning workloads instead. In general, you can decide which hardware is best for your workload based on the following guidelines:

Less expensive, more available CPUs may be the right choice when working with:

  • Quick prototyping that requires maximum flexibility
  • Simple models that do not take long to train
  • Small models with small effective batch sizes
  • Models that contain many custom TensorFlow operations written in C++
  • Models that are limited by available I/O or the networking bandwidth of the host system


GPUs (Graphics Processing Units) are suitable for:

  • Models with a significant number of custom TensorFlow/PyTorch/JAX operations that must run at least partially on CPUs
  • Models with TensorFlow ops that are not available on Cloud TPU (see the list of available TensorFlow ops)
  • Medium-to-large models with larger effective batch sizes

These are the scenarios where TPUs clearly provide benefits over other processors:

  • As mentioned above, models that train for weeks or months, as well as large models with large effective batch sizes, are natural candidates for TPUs.
  • Models dominated by matrix computations, as the TPU architecture allows massively parallel execution of matrix operations (see the sketch after this list).
  • Models with no custom TensorFlow/PyTorch/JAX operations inside the main training loop.
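To make the point about matrix computations concrete, here is an illustrative sketch (in JAX, though any of the frameworks above would do) of the kind of dense, matmul-dominated step that TPUs parallelise well:

```python
import jax
import jax.numpy as jnp

@jax.jit  # compiled through XLA; on a TPU host this lowers to the matrix units
def dense_layer(x, w, b):
    # One large, dense matrix multiplication per call: the regular,
    # highly parallel workload the TPU architecture is built around.
    return jax.nn.relu(jnp.dot(x, w) + b)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (1024, 512))  # a batch of activations
w = jax.random.normal(key, (512, 256))   # a dense weight matrix
b = jnp.zeros((256,))

print(dense_layer(x, w, b).shape)  # (1024, 256)
```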

And finally, these are some workflows where Cloud TPUs are not expected to perform well:

  • Workloads that access memory in a sparse manner. TPUs are designed to swap data between memory and the processing cores very quickly and in parallel; the denser the data held in memory, the more efficient this process is, so sparse, irregular access patterns undercut the advantage.
  • Neural network workloads that contain custom operations in the main training loop. As mentioned above, if the neural network algorithm implements customised operations, a CPU or GPU may provide better performance and be more cost effective.
  • Workloads that require high-precision arithmetic. TPUs are designed to handle the specific requirements of neural network training and inference, where reduced precision (notably the bfloat16 format) can lead to significant performance improvements without sacrificing accuracy. By using lower precision, TPUs can perform more operations per second, leading to faster training and inference times for machine learning models (see the sketch below). Check this article for more information.
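As a minimal sketch of what opting into reduced precision looks like in practice, Keras exposes a TPU-oriented "mixed_bfloat16" policy: layer computations run in bfloat16 while the trainable variables stay in float32.

```python
import tensorflow as tf

# Run layer computations in bfloat16 while keeping variables in float32.
tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    # Keep the output layer in float32 for numerically stable results.
    tf.keras.layers.Dense(10, dtype="float32"),
])

print(model.layers[0].compute_dtype)   # bfloat16
print(model.layers[0].variable_dtype)  # float32
```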


Conclusion

As you study and prepare for the Google Cloud PMLE certification exam, you need to be able to identify the relevant aspects of ML-related workloads where provisioning TPU technology could provide clear benefits in performance and cost. In general, the following factors are key: the size of the training dataset, the batch size, and the model complexity (the larger, the better); the use of standard TensorFlow operations (no customisations) in the ML model; the ability to work with low-precision arithmetic; and, last but not least, the capacity to run these processes on Google infrastructure, the only platform where TPU resources are available.

For instance, here’s a question that you can find if you check the “sample exam questions” on the Google Cloud Professional Machine Learning Engineer site:

[Image: sample question from the Google Cloud PMLE website]

You can clearly see in this sample question how the decision factors introduced in this article come into play when trying to select the right answer.
