Running ML inference with AMD GPU and ROCm (Part II)

Author: Arthur Shaikhatarov, Lead AI/ML Specialist, Luxoft Serbia

This is the second article in our series dedicated to running ML algorithms and neural networks on AMD GPUs. In the previous part, we introduced our ML project and covered its stack of neural networks and computer vision approaches. In this part, we describe the process of launching the project on AMD GPUs.

As we already mentioned, through our collaboration with AMD we have access to a variety of AMD video cards, which have significant advantages in floating-point calculations. That is one of the reasons why we decided to run our ML models on AMD hardware. To accelerate compute-intensive operations on GPUs, AMD offers its own open software platform, ROCm, which is supported by the major ML frameworks such as TensorFlow and PyTorch. Our models were built with PyTorch, so we were able to run their code with practically no additional porting work.

System configuration

Our test environment included one PC with the following configuration:

  • AMD Ryzen 7
  • Vega 20 [Radeon VII]
  • Ubuntu 20.04

1. Installing required libraries

First, we installed ROCm v5.2 following the official instructions:

sudo apt-get update

wget https://repo.radeon.com/amdgpu-install/22.20/ubuntu/focal/amdgpu-install_22.20.50200-1_all.deb

sudo apt-get install ./amdgpu-install_22.20.50200-1_all.deb

sudo apt-get update

sudo amdgpu-install --usecase=rocm

sudo reboot


Next, we installed PyTorch as described in their documentation.

We planned to deploy everything in a Docker container, so we chose an image with pre-installed ROCm libraries and the required version of PyTorch from the official ROCm Docker Hub.

To start the container, we went through these steps:

1. Pulled the image with the required torch==1.11 version from the ROCm Docker Hub:

docker pull rocm/pytorch:rocm5.2_ubuntu20.04_py3.7_pytorch_1.11.0


Unlike NVIDIA and CUDA, ROCm did not require us to install additional plug-ins for Docker.
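As a quick sanity check (our own addition, not part of the original setup steps), it is worth confirming inside the container that PyTorch actually sees the card. On ROCm builds of PyTorch, the GPU is exposed through the regular torch.cuda API, so the usual availability check works unchanged. A minimal sketch, with pick_device being our own helper name:

```python
import torch

# On ROCm builds of PyTorch, the GPU is exposed through the regular
# torch.cuda API (HIP is mapped onto the CUDA interface), so the
# standard availability check works unchanged.
def pick_device() -> torch.device:
    if torch.cuda.is_available():
        # torch.version.hip is a version string on ROCm builds, None on CUDA builds
        backend = "ROCm/HIP" if torch.version.hip else "CUDA"
        print(f"GPU backend: {backend}, device: {torch.cuda.get_device_name(0)}")
        return torch.device("cuda")
    print("No GPU visible, falling back to CPU")
    return torch.device("cpu")

device = pick_device()
```

If the script falls back to CPU inside the container, the usual culprits are the missing --device=/dev/kfd and --device=/dev/dri flags on docker run.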

2. After confirming the correctness of the Docker image, we created a custom Dockerfile inside the project:

FROM rocm/pytorch:rocm5.2_ubuntu20.04_py3.7_pytorch_1.11.0

ARG APP_DIR=/app

WORKDIR "$APP_DIR"

COPY . $APP_DIR/


3. Next, we built and started a Docker container using the following command:

sudo docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G my_project/rocm


4. Inside the container, we installed the necessary libraries and downloaded the weights of the models using install.sh.

2. Testing models with AMD graphics cards

With the environment in place, we were able to load the models and run inference. For the most part, the process was quite straightforward and smooth. Below you can find a brief report with our test results.
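Each model went through the same smoke test: initialize it, load the weights, and confirm that a forward pass runs on the GPU without error. A minimal sketch of that check, where TinyNet is an illustrative placeholder (the real models come from the repositories listed in each subsection):

```python
import torch
import torch.nn as nn

# TinyNet is an illustrative stand-in for the project's models; the same
# load-then-forward check applies to any nn.Module.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)

def smoke_test(model: nn.Module) -> torch.Size:
    # Use the GPU when visible (on ROCm it appears as "cuda"), else CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(1, 3, 64, 64, device=device)  # dummy RGB input
    with torch.no_grad():
        y = model(x)  # raises here if a layer is unsupported
    return y.shape

print(smoke_test(TinyNet()))  # torch.Size([1, 8, 64, 64])
```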


2.1 Vanishing points model

  • Model source: https://github.com/zhou13/neurvps
  • Details: The model uses a custom Deformable Convolution layer loaded via torch.utils.cpp_extension.
  • Status: [Success]

The model is successfully initialized with the weights loaded. Forward pass does not raise an error.


2.2 Object removal model

Model source: https://github.com/saic-mdal/lama

Details: The model uses default torch layers.

Status: [Success]

The model is successfully initialized with the weights loaded. Forward pass does not raise an error.


2.3 Room layout model

Model source: https://github.com/leVirve/lsun-room

Details: The model uses default torch layers.

Status: [Success]

The model is successfully initialized with the weights loaded. Forward pass does not raise an error.


2.4 Segmentation model

Model source: https://github.com/open-mmlab/mmsegmentation

Details: We used the SWIN transformer architecture, which showed the best results on indoor segmentation. This model is based on the mmcv framework (3.7k stars) from open-mmlab.


The major mmcv features:

  • Various backbones and pretrained models
  • Bag of training tricks
  • Large-scale training configs
  • High efficiency and extensibility
  • Powerful toolkits

Status: [Success]

This model was the hardest to get running because of mmcv.


There are two known ways to install mmcv:

  • Building mmcv from source. This approach caused an error:

File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 566, in unix_wrap_ninja_compile
    with_cuda=with_cuda)
File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1418, in _write_ninja_file_and_compile_objects
    error_prefix='Error compiling objects for extension')
File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1747, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension


  • Installing pre-built mmcv packages, which are built only for specific CUDA versions or for CPU.

Since neither option supports ROCm, we decided to use TorchScript and trace the model on a PC with CUDA libraries. TorchScript is a good solution for creating serializable and optimizable models from PyTorch code. A TorchScript model can be saved from a Python process and then loaded in a process that has no Python dependency.

So, we ended up with an approach that allowed us to port the model from one PC to another. First, we initialized the model on a PC with an NVIDIA GPU. Next, we generated the serialized model using torch.jit.trace. After that, we copied it to the PC with ROCm, so there was no need to install mmcv there.
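The two-step workflow above can be sketched as follows. DummySeg is an illustrative stand-in: the real project traced an mmcv-based SWIN segmentation model, which we cannot reproduce here.

```python
import torch
import torch.nn as nn

# DummySeg stands in for the mmcv-based segmentation model.
class DummySeg(nn.Module):
    def forward(self, x):
        # softmax over the channel dimension, as a segmentation head would do
        return torch.softmax(x, dim=1)

# --- Step 1: on the PC with an NVIDIA GPU and mmcv installed ---
model = DummySeg().eval()
example = torch.randn(1, 4, 32, 32)       # example input for tracing
traced = torch.jit.trace(model, example)  # record the forward pass as TorchScript
traced.save("model_traced.pt")

# --- Step 2: on the PC with ROCm, without mmcv ---
loaded = torch.jit.load("model_traced.pt")
out = loaded(torch.randn(1, 4, 32, 32))
print(out.shape)  # torch.Size([1, 4, 32, 32])
```

One caveat of torch.jit.trace worth keeping in mind: it records a single execution path, so data-dependent control flow in the model is baked in at trace time.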

As a result of our tests, the model was successfully initialized. The forward pass did not raise an error and produced the same output as on a PC with CUDA.

Summary

Our research showed that the ROCm library covers all our needs for running ML models on AMD GPUs. Both standard and custom PyTorch layers work with no additional changes, which allows experimenting with models and deploying them.

Of course, there is no single solution for all cases, as each project and each model has its own specifics. Even though the mmcv framework does not yet support ROCm and we faced some difficulties with it, there are still ways to circumvent these limitations and get the models running.

Feedback

As part of the AI community, we are always open to discussion. If you find our experience useful, please feel free to share your thoughts. Any and all feedback and contributions are welcome.

You can also visit our website.
