Running ML inference with AMD GPU and ROCm (Part II)
Author: Arthur Shaikhatarov, Lead AI/ML Specialist, Luxoft Serbia
This is the second article in our series dedicated to running ML algorithms and neural networks on AMD GPUs. In the previous part, we introduced our ML project and covered its stack of neural networks and computer vision approaches. In this part, we describe the process of launching the project on AMD GPUs.
As we already mentioned, our collaboration with AMD gives us access to a wide range of AMD video cards, which offer significant advantages in floating-point calculations. That is one of the reasons why we decided to run our ML models on AMD hardware. To accelerate compute-intensive operations on GPUs, AMD offers its own open software platform, ROCm, which is supported by the major ML frameworks such as TensorFlow and PyTorch. Our models were built with PyTorch, so we managed to run their code with practically no additional porting work.
System configuration
Our test environment included one PC with the following configuration:
1. Installing required libraries
First, we installed the ROCm v5.2 library following the official instructions:
sudo apt-get update
wget https://repo.radeon.com/amdgpu-install/22.20/ubuntu/focal/amdgpu-install_22.20.50200-1_all.deb
sudo apt-get install ./amdgpu-install_22.20.50200-1_all.deb
sudo apt-get update
sudo amdgpu-install --usecase=rocm
sudo reboot
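After the reboot, the installation can be verified with the utilities that ship with ROCm. This quick sanity check is not part of the official instructions, but it confirms that the driver sees the card:
rocminfo | grep -i name
rocm-smi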
Next, we installed PyTorch as described in their documentation.
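To confirm that this PyTorch build can see the AMD GPU, a short check like the one below can be used. ROCm builds of PyTorch expose the GPU through the usual cuda device API, so no code changes are needed:
import torch

print(torch.version.hip)          # ROCm/HIP version string; None on non-ROCm builds
if torch.cuda.is_available():     # ROCm builds expose the AMD GPU via the cuda API
    print(torch.cuda.get_device_name(0))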
We planned to deploy everything in a Docker container, so we chose a Docker image with pre-installed ROCm libraries and the required version of PyTorch from the official ROCm Docker Hub.
To get the container running, we went through these steps:
1. Got the image with the required torch 1.11 version from the ROCm Docker Hub:
docker pull rocm/pytorch:rocm5.2_ubuntu20.04_py3.7_pytorch_1.11.0
Unlike the NVIDIA/CUDA stack, ROCm did not require us to install any additional Docker plug-ins.
2. After confirming the correctness of the Docker image, we created a custom Dockerfile inside the project:
FROM rocm/pytorch:rocm5.2_ubuntu20.04_py3.7_pytorch_1.11.0
ARG APP_DIR=/app
WORKDIR "$APP_DIR"
COPY . $APP_DIR/
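The Dockerfile then needs to be built into an image; the tag below is inferred from the run command that follows:
sudo docker build -t my_project/rocm .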
3. Next, we started a Docker container using the following command:
sudo docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G my_project/rocm
4. Inside the container, we installed the necessary libraries and downloaded the model weights using install.sh.
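We have not reproduced the script here, but a minimal sketch of what such an install.sh could look like (the package list and weight URL are placeholders, not the project's actual contents):
#!/bin/bash
set -e
# Install the project's Python dependencies
pip install -r requirements.txt
# Download the model weights (placeholder URL)
mkdir -p weights
wget -O weights/model.pth https://example.com/weights/model.pth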
2. Testing models with AMD graphics cards
With the environment in place, we were able to load the models and run inference. For the most part, the process was quite straightforward and smooth. Below is a brief report with our test results.
2.1 Vanishing points model
The model is successfully initialized with the weights loaded. Forward pass does not raise an error.
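For every model, the check amounted to a smoke test along these lines. The network below is a stand-in; the real architectures, weights, and input sizes differ:
import torch
import torch.nn as nn

# Stand-in for one of our networks; in the real tests we also loaded the
# pretrained weights via load_state_dict before running inference.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 1))
device = "cuda" if torch.cuda.is_available() else "cpu"  # ROCm reports the GPU as "cuda"
model = model.to(device).eval()

with torch.no_grad():
    dummy = torch.randn(1, 3, 512, 512, device=device)
    out = model(dummy)  # the forward pass should complete without errors
print(out.shape)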
2.2 Object removal model
Model source: https://github.com/saic-mdal/lama
Details: The model uses default torch layers.
Status: [Success]
The model is successfully initialized with the weights loaded. Forward pass does not raise an error.
2.3 Room layout model
Model source: https://github.com/leVirve/lsun-room
Details: The model uses default torch layers.
Status: [Success]
The model is successfully initialized with the weights loaded. Forward pass does not raise an error.
2.4 Segmentation model
Model source: https://github.com/open-mmlab/mmsegmentation
Details: We used the Swin Transformer architecture, which showed the best results on indoor segmentation. This model is based on the mmcv framework (3.7k stars) from open-mmlab.
Status: [Success]
The major mmcv feature relevant here is that it ships its own set of high-performance CUDA ops, which must be compiled for the target platform; this is what made the model the hardest task of all.
There are two known ways to install mmcv: from a pre-built package, which is distributed only for CUDA, or by compiling it from source. With no ROCm binaries available, building from source was the remaining route, and it failed with the following error:
File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 566, in unix_wrap_ninja_compile
with_cuda=with_cuda)
File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1418, in _write_ninja_file_and_compile_objects
error_prefix='Error compiling objects for extension')
File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1747, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
With both installation routes ruled out, the only option left was TorchScript, so we turned to a PC with CUDA libraries. TorchScript is a good solution for creating serializable and optimizable models from PyTorch code: a TorchScript model can be saved from a Python process and then loaded in a process that has no Python dependency.
So, we ended up with an approach that let us port the model from one PC to another. First, we initialized the model on a PC with an NVIDIA GPU. Next, we generated a serialized version of the model using torch.jit.trace. After that, we copied the traced model to the PC with ROCm, so there was no need to install mmcv there.
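A minimal sketch of this workflow, with a stand-in model (the real one was the mmcv-based segmentation network) and illustrative file names:
import torch
import torch.nn as nn

# Step 1 (on the CUDA PC): trace the model with an example input.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()).eval()
example = torch.randn(1, 3, 256, 256)
traced = torch.jit.trace(model, example)
traced.save("segmentation_traced.pt")  # file name is illustrative

# Step 2 (on the ROCm PC): load and run the traced model without mmcv installed.
loaded = torch.jit.load("segmentation_traced.pt", map_location="cuda")
out = loaded(torch.randn(1, 3, 256, 256, device="cuda"))
Note that torch.jit.trace records the operations executed for the given example input, so data-dependent control flow is not captured; for models that branch on input values, torch.jit.script is the safer choice.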
As a result of our tests, the model was successfully initialized, and the forward pass raised no errors and produced the same output as on the PC with CUDA.
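A simple way to make such a comparison, assuming the outputs from both machines were saved to disk (file names are placeholders):
import torch

ref = torch.load("output_cuda.pt")           # saved on the NVIDIA PC
out = torch.load("output_rocm.pt")           # produced on the AMD PC
print(torch.allclose(ref, out, atol=1e-4))   # small tolerance for FP differences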
Summary
Our research showed that the ROCm platform covers all our needs for running ML models on AMD GPUs. The standard and custom layers of the PyTorch framework work with no additional changes, which makes it possible both to experiment with models and to deploy them.
Of course, there is no single solution for all cases, as each project and each model has its own specifics. Even though the mmcv framework does not yet include support for ROCm and we faced some difficulties with it, there are still ways to circumvent these limitations and get the models running.
Feedback
As part of the AI community, we are always open to discussion. If you find our experience useful, please feel free to share your thoughts. Any and all feedback and contributions are welcome.
You can also visit our website.