Distributed Deep Learning with Horovod Training Course

Horovod is an open source framework for fast, efficient distributed deep learning training with TensorFlow, Keras, PyTorch, and Apache MXNet. It can scale a single-GPU training script to run on multiple GPUs or hosts with minimal code changes.
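As a rough illustration of the idea behind this kind of scaling, the sketch below simulates data-parallel training in plain Python: each worker computes gradients on its own data shard, the gradients are averaged across workers (the role an allreduce operation plays), and every worker applies the same update. This is a conceptual sketch only and does not use Horovod's API; the function names and the gradient values are hypothetical.

```python
from typing import List


def allreduce_average(worker_grads: List[List[float]]) -> List[float]:
    """Average gradients element-wise across workers (simulated allreduce)."""
    num_workers = len(worker_grads)
    num_params = len(worker_grads[0])
    return [
        sum(grads[i] for grads in worker_grads) / num_workers
        for i in range(num_params)
    ]


def sgd_step(weights: List[float], grad: List[float], lr: float) -> List[float]:
    """Apply one plain SGD update: w <- w - lr * g."""
    return [w - lr * g for w, g in zip(weights, grad)]


# Hypothetical gradients from 4 workers, each computed on a different data shard.
worker_grads = [
    [0.4, -0.2],
    [0.0, 0.2],
    [0.8, -0.6],
    [0.4, 0.2],
]

# All workers start from identical weights.
initial_weights = [1.0, 1.0]

# Every worker receives the same averaged gradient...
avg_grad = allreduce_average(worker_grads)

# ...so every worker ends up with the same updated weights.
updated = [sgd_step(initial_weights, avg_grad, lr=0.5) for _ in worker_grads]

print("averaged gradient:", avg_grad)
print("weights on worker 0:", updated[0])
```

Because each replica applies an identical averaged gradient, the model copies stay in sync without a central parameter server, which is why the per-script changes can stay small.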

This course is aimed at developers and data scientists who wish to use Horovod to run distributed deep learning training jobs and scale them across multiple GPUs in parallel.


Course Outline

Introduction

  • Overview of Horovod features and concepts
  • Understanding the supported frameworks

Installing and Configuring Horovod

  • Preparing the hosting environment
  • Building Horovod for TensorFlow, Keras, PyTorch, and Apache MXNet
  • Running Horovod

Running Distributed Training

  • Modifying and running training examples with TensorFlow
  • Modifying and running training examples with Keras
  • Modifying and running training examples with PyTorch
  • Modifying and running training examples with Apache MXNet

Optimizing Distributed Training Processes

  • Running concurrent operations on multiple GPUs
  • Tuning hyperparameters
  • Enabling performance autotuning

Troubleshooting

Summary and Conclusion



Contact us

email - [email protected]



