Running ML inference with AMD GPU and ROCm (Part II)
Author: Arthur Shaikhatarov, Lead AI/ML Specialist, Luxoft Serbia
This is the second article in our series dedicated to running ML algorithms and neural networks on AMD GPUs. In the previous part, we introduced our ML project and covered its stack of neural networks and computer vision approaches. In this part, we describe the process of launching the project on AMD GPUs.
As we already mentioned, our collaboration with AMD gives us access to a wide range of AMD video cards, which offer significant advantages in floating-point calculations. That is one of the reasons why we decided to run our ML models on AMD hardware. To accelerate compute-intensive operations on GPUs, AMD offers its own open software platform, ROCm, which is supported by the major ML frameworks such as TensorFlow and PyTorch. Our models were built with PyTorch, so we managed to run their code with practically no additional porting work.
System configuration
Our test environment included one PC with the following configuration:
1. Installing required libraries
First, we installed the ROCm v5.2 library following the official instructions:
sudo apt-get update
wget https://repo.radeon.com/amdgpu-install/22.20/ubuntu/focal/amdgpu-install_22.20.50200-1_all.deb
sudo apt-get install ./amdgpu-install_22.20.50200-1_all.deb
sudo apt-get update
sudo amdgpu-install --usecase=rocm
sudo reboot
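After the reboot, the installation can be verified with the utilities that ship with ROCm. This quick sanity check is not part of the official instructions, but it confirms that the driver sees the card:
rocminfo | grep -i name
rocm-smi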
Next, we installed PyTorch as described in their documentation.
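To confirm that this PyTorch build can see the AMD GPU, a short check like the one below can be used. ROCm builds of PyTorch expose the GPU through the usual cuda device API, so no code changes are needed:
import torch

print(torch.version.hip)          # ROCm/HIP version string; None on non-ROCm builds
if torch.cuda.is_available():     # ROCm builds expose the AMD GPU via the cuda API
    print(torch.cuda.get_device_name(0))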
We planned to deploy everything in a Docker container, so we chose a Docker image with pre-installed ROCm libraries and the required version of PyTorch from the official ROCm Docker Hub.
To get the container running, we went through these steps:
1. Got the image with the required torch 1.11 version from the ROCm Docker Hub:
docker pull rocm/pytorch:rocm5.2_ubuntu20.04_py3.7_pytorch_1.11.0
Unlike the NVIDIA/CUDA stack, ROCm did not require us to install any additional Docker plug-ins.
2. After confirming the correctness of the Docker image, we created a custom Dockerfile inside the project:
FROM rocm/pytorch:rocm5.2_ubuntu20.04_py3.7_pytorch_1.11.0
ARG APP_DIR=/app
WORKDIR "$APP_DIR"
COPY . $APP_DIR/
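The Dockerfile then needs to be built into an image; the tag below is inferred from the run command that follows:
sudo docker build -t my_project/rocm .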
3. Next, we started a Docker container using the following command:
sudo docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G my_project/rocm
4. Inside the container, we installed the necessary libraries and downloaded the model weights using install.sh.
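We have not reproduced the script here, but a minimal sketch of what such an install.sh could look like (the package list and weight URL are placeholders, not the project's actual contents):
#!/bin/bash
set -e
# Install the project's Python dependencies
pip install -r requirements.txt
# Download the model weights (placeholder URL)
mkdir -p weights
wget -O weights/model.pth https://example.com/weights/model.pth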
2. Testing models with AMD graphics cards
With the environment in place, we were able to load the models and run inference. For the most part, the process was quite straightforward and smooth. Below is a brief report with our test results.
2.1 Vanishing points model
The model is successfully initialized with the weights loaded. Forward pass does not raise an error.
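For every model, the check amounted to a smoke test along these lines. The network below is a stand-in; the real architectures, weights, and input sizes differ:
import torch
import torch.nn as nn

# Stand-in for one of our networks; in the real tests we also loaded the
# pretrained weights via load_state_dict before running inference.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 1))
device = "cuda" if torch.cuda.is_available() else "cpu"  # ROCm reports the GPU as "cuda"
model = model.to(device).eval()

with torch.no_grad():
    dummy = torch.randn(1, 3, 512, 512, device=device)
    out = model(dummy)  # the forward pass should complete without errors
print(out.shape)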
2.2 Object removal model
Model source: https://github.com/saic-mdal/lama
Details: The model uses default torch layers.
Status: [Success]
The model is successfully initialized with the weights loaded. Forward pass does not raise an error.
2.3 Room layout model
Model source: https://github.com/leVirve/lsun-room
Details: The model uses default torch layers.
Status: [Success]
The model is successfully initialized with the weights loaded. Forward pass does not raise an error.
2.4 Segmentation model
Model source: https://github.com/open-mmlab/mmsegmentation
Details: We used the Swin Transformer architecture, which showed the best results on indoor segmentation. This model is based on the mmcv framework (3.7k stars) from open-mmlab.
Status: [Success]
The major mmcv feature relevant here is that it ships its own set of high-performance CUDA ops, which must be compiled for the target platform; this is what made the model the hardest task of all.
There are two known ways to install mmcv: from a pre-built package, which is distributed only for CUDA, or by compiling it from source. With no ROCm binaries available, building from source was the remaining route, and it failed with the following error:
File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 566, in unix_wrap_ninja_compile
with_cuda=with_cuda)
File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1418, in _write_ninja_file_and_compile_objects
error_prefix='Error compiling objects for extension')
File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1747, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
With both installation routes ruled out, the only option left was TorchScript, so we turned to a PC with CUDA libraries. TorchScript is a good solution for creating serializable and optimizable models from PyTorch code: a TorchScript model can be saved from a Python process and then loaded in a process that has no Python dependency.
So, we ended up with an approach that let us port the model from one PC to another. First, we initialized the model on a PC with an NVIDIA GPU. Next, we generated a serialized version of the model using torch.jit.trace. After that, we copied the traced model to the PC with ROCm, so there was no need to install mmcv there.
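A minimal sketch of this workflow, with a stand-in model (the real one was the mmcv-based segmentation network) and illustrative file names:
import torch
import torch.nn as nn

# Step 1 (on the CUDA PC): trace the model with an example input.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()).eval()
example = torch.randn(1, 3, 256, 256)
traced = torch.jit.trace(model, example)
traced.save("segmentation_traced.pt")  # file name is illustrative

# Step 2 (on the ROCm PC): load and run the traced model without mmcv installed.
loaded = torch.jit.load("segmentation_traced.pt", map_location="cuda")
out = loaded(torch.randn(1, 3, 256, 256, device="cuda"))
Note that torch.jit.trace records the operations executed for the given example input, so data-dependent control flow is not captured; for models that branch on input values, torch.jit.script is the safer choice.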
As a result of our tests, the model was successfully initialized, and the forward pass raised no errors and produced the same output as on the PC with CUDA.
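A simple way to make such a comparison, assuming the outputs from both machines were saved to disk (file names are placeholders):
import torch

ref = torch.load("output_cuda.pt")           # saved on the NVIDIA PC
out = torch.load("output_rocm.pt")           # produced on the AMD PC
print(torch.allclose(ref, out, atol=1e-4))   # small tolerance for FP differences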
Summary
Our research showed that the ROCm platform covers all our needs for running ML models on AMD GPUs. The standard and custom layers of the PyTorch framework work with no additional changes, which makes it possible both to experiment with models and to deploy them.
Of course, there is no single solution for all cases, as each project and each model has its own specifics. Even though the mmcv framework does not yet include support for ROCm and we faced some difficulties with it, there are still ways to circumvent these limitations and get the models running.
Feedback
As part of the AI community, we are always open to discussion. If you find our experience useful, please feel free to share your thoughts. Any and all feedback and contributions are welcome.
You can also visit our website.