Nvidia GPU & TensorFlow for ML in Ubuntu 24.04 LTS

TensorFlow announced that it would stop supporting GPUs natively on Windows. The last version with native GPU support was 2.10; from 2.11 onwards, you need to use WSL2 (Windows Subsystem for Linux 2), which lets you run Ubuntu or other Linux distros inside Windows.

Linux fans, who obviously don't like WSL2, can use TensorFlow with an Nvidia GPU by following the steps below.

A very important step is to know which versions of TensorFlow, Python, CUDA and cuDNN work together. While testing my setup, I used the following versions:

  • TensorFlow 2.12.0
  • CUDA Toolkit 11.8
  • cuDNN SDK 8.6.0
  • Python 3.11.7
  • Nvidia GPU drivers (latest - 550.54)

More information about the above versions can be found on the "TensorFlow supported versions" page.
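As a quick sanity check of the version combination above, the following sketch compares the running Python interpreter against the range that TensorFlow 2.12 supports (Python 3.8 to 3.11). The dictionary only restates the versions listed in this article; the function and constant names are illustrative and not part of any TensorFlow tooling.

```python
import sys

# Versions tested in this article (restated from the list above).
TESTED_VERSIONS = {
    "tensorflow": "2.12.0",
    "cuda": "11.8",
    "cudnn": "8.6.0",
    "python": "3.11.7",
}

# TensorFlow 2.12 supports Python 3.8 through 3.11.
TF_212_PYTHON_RANGE = ((3, 8), (3, 11))


def python_supported(version_info=None):
    """Return True if the (major, minor, ...) tuple falls inside the supported range."""
    if version_info is None:
        version_info = sys.version_info
    low, high = TF_212_PYTHON_RANGE
    return low <= tuple(version_info[:2]) <= high


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if python_supported() else "NOT supported"
    print(f"Python {current} is {status} by TensorFlow {TESTED_VERSIONS['tensorflow']}")
```

If this reports "NOT supported" (for example on the Python 3.12 that Ubuntu 24.04 ships by default), pip will not find a matching TensorFlow 2.12 package for that interpreter.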

The GPU that I used was a GeForce RTX 4060 Ti 16GB. You can check which GPU is installed in your system with the command:

lspci | grep -i nvidia        

which, in my case, returned the following results:

01:00.0 VGA compatible controller: NVIDIA Corporation AD106 [GeForce RTX 4060 Ti 16GB] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 22bd (rev a1)        
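If you want to pick the GPU model out of that output in a script, a minimal sketch could look like the following. It works on the sample text shown above; the function name is hypothetical, and it assumes that the model name appears in square brackets, as it does for this card.

```python
import re

# Sample output copied from the lspci command above.
SAMPLE_LSPCI = """\
01:00.0 VGA compatible controller: NVIDIA Corporation AD106 [GeForce RTX 4060 Ti 16GB] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 22bd (rev a1)
"""


def find_nvidia_gpus(lspci_text):
    """Return the model names of NVIDIA display controllers found in lspci output."""
    gpus = []
    for line in lspci_text.splitlines():
        if "NVIDIA" not in line:
            continue
        # Skip the card's audio function; keep only display-class devices.
        if "VGA compatible controller" not in line and "3D controller" not in line:
            continue
        match = re.search(r"\[(.+?)\]", line)
        # Fall back to the full device description when no bracketed name exists.
        gpus.append(match.group(1) if match else line.split(": ", 1)[-1])
    return gpus


print(find_nvidia_gpus(SAMPLE_LSPCI))  # ['GeForce RTX 4060 Ti 16GB']
```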

To avoid conflicts, it is better to remove any previously installed drivers first:

# Quote the patterns so the shell does not expand them against local file names
sudo apt purge 'nvidia*' -y
sudo apt remove 'nvidia-*' -y
sudo rm -f /etc/apt/sources.list.d/cuda*
sudo apt autoremove -y && sudo apt autoclean -y
sudo rm -rf /usr/local/cuda*

and update/upgrade the system:

sudo apt update && sudo apt upgrade -y        

Additionally, you will need to install some required packages:

sudo apt install g++ freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev        

Now you will need to install the latest GPU drivers:

# First add the graphics drivers PPA repository
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update

# Find the recommended driver versions for your hardware
ubuntu-drivers devices

# List the available drivers
ubuntu-drivers list

# Install the recommended driver (requires root)
sudo ubuntu-drivers install

At this stage you will need to reboot the PC. After logging in, check that the GPU driver has been installed by using this command:

nvidia-smi        

which should report the Nvidia driver version (550.54 in my case), the GPU and the running processes. During model training, you should see the GPU usage increasing for the Python process.

Now you will need to install CUDA 11.8:

# Download the pin file
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin

# Move it into the apt preferences folder
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600

# This is one command
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub

# This is also one command
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /"

# Update and upgrade
sudo apt update && sudo apt upgrade -y

# Install CUDA 11.8
sudo apt install cuda-11-8 -y        

and set the paths:

# Add the bin folder to the bashrc file
echo 'export PATH=/usr/local/cuda-11.8/bin:$PATH' >> ~/.bashrc

# Add the lib64 folder to the bashrc file
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc

# Execute the commands placed in the file
source ~/.bashrc

# Update the shared library cache
sudo ldconfig        
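To confirm that the two exports took effect in a new shell, the following sketch checks the environment for the CUDA folders. The install prefix is the one used in this article; the function name is illustrative.

```python
import os

CUDA_HOME = "/usr/local/cuda-11.8"  # install prefix used in this article


def cuda_paths_configured(environ=None, cuda_home=CUDA_HOME):
    """Report whether PATH and LD_LIBRARY_PATH contain the CUDA bin and lib64 folders."""
    if environ is None:
        environ = os.environ
    # Both variables are colon-separated lists on Linux.
    path_entries = environ.get("PATH", "").split(":")
    lib_entries = environ.get("LD_LIBRARY_PATH", "").split(":")
    return {
        "PATH": f"{cuda_home}/bin" in path_entries,
        "LD_LIBRARY_PATH": f"{cuda_home}/lib64" in lib_entries,
    }


if __name__ == "__main__":
    for name, present in cuda_paths_configured().items():
        print(f"{name}: {'ok' if present else 'missing CUDA entry'}")
```

Passing a dictionary instead of relying on os.environ makes the check easy to try without changing your shell.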

Now it is time for cuDNN 8.6. You will first need to register for the Nvidia developer program by using this URL: https://developer.nvidia.com/developer-program/signup

# Add the file name as a variable
CUDNN_TAR_FILE="cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz"

# Download the file
sudo wget https://developer.download.nvidia.com/compute/redist/cudnn/v8.6.0.163/local_installers/11.8/cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz

# Extract the archive
sudo tar -xvf "${CUDNN_TAR_FILE}"

and copy the files into the appropriate folders:

# Copy the headers into the CUDA toolkit directory
# (cuDNN 8 splits its API across several cudnn*.h files, so copy all of them)
sudo cp -P cudnn-linux-x86_64-8.6.0.163_cuda11-archive/include/cudnn*.h /usr/local/cuda-11.8/include/

# Copy the libraries
sudo cp -P cudnn-linux-x86_64-8.6.0.163_cuda11-archive/lib/libcudnn* /usr/local/cuda-11.8/lib64/

# Make the headers and libraries readable by all users
sudo chmod a+r /usr/local/cuda-11.8/include/cudnn*.h /usr/local/cuda-11.8/lib64/libcudnn*
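One way to double-check which cuDNN release was unpacked is to read the version macros from the cudnn_version.h header in the extracted archive. This is a sketch under the assumption that the archive was extracted in the current folder, as in the commands above; the function name is illustrative.

```python
import re
from pathlib import Path

# Header inside the extracted cuDNN archive (path assumed from the commands above).
HEADER = Path("cudnn-linux-x86_64-8.6.0.163_cuda11-archive/include/cudnn_version.h")


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the #define lines of cudnn_version.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"^\s*#define\s+{name}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            return None  # header does not look like cudnn_version.h
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    if HEADER.exists():
        print("cuDNN version:", parse_cudnn_version(HEADER.read_text()))
    else:
        print(f"{HEADER} not found; run this from the folder where the archive was extracted")
```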

At this stage, you should be able to see the correct CUDA version, for example by running nvcc --version.
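The CUDA release can also be read programmatically from the compiler that ships with the toolkit. The sketch below assumes that nvcc is on PATH (which the exports above should ensure) and that its output contains a "release X.Y" string; the function names are illustrative.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return the CUDA release string (e.g. '11.8') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run nvcc if it is on PATH; return None when it is missing or fails."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("CUDA release:", installed_cuda_release())
```

For this setup the expected value is 11.8; a result of None suggests that the PATH export has not been picked up by the current shell.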

The remaining task is to install Anaconda with Python. You can find the instructions on this site: Anaconda for Ubuntu 24.04

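The required TensorFlow version is 2.12, which supports Python 3.8 to 3.11, so install it inside the Anaconda environment rather than with the Python 3.12 that Ubuntu 24.04 ships by default. The following commands install it (I also had to install numpy, typing-extensions and pip):

pip install "tensorflow==2.12.*"
pip install numpy==1.24.3
pip install typing-extensions==4.5.0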
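Right after the install, a short check from Python can show whether TensorFlow was built with CUDA and whether it detects the GPU. This is a minimal sketch; the function name and the shape of the returned dictionary are my own choices, and it returns None when TensorFlow cannot be imported.

```python
def tensorflow_gpu_report():
    """Return TensorFlow's view of the GPU setup, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Build information; the CUDA keys are absent on CPU-only builds, hence .get().
    build = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "cuda_version": build.get("cuda_version"),
        "cudnn_version": build.get("cudnn_version"),
        "gpus": [gpu.name for gpu in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    print(tensorflow_gpu_report())
```

An empty "gpus" list with "built_with_cuda" set to True usually points to a driver, CUDA or cuDNN library that TensorFlow could not load.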

To verify the whole procedure, you can use the following Python scripts (which can also be found on GitHub: Python for GPU Check):

# Get the files
git clone https://github.com/gokul-a-krishnan/python-gpu-check

# Access the folder
cd python-gpu-check/
cd tensorflow/

# Execute the script
python3 check.py         

The important output from this script is the information for the GPU and the confirmation for the cuDNN version:

2024-05-13 17:57:09.802997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14030 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4060 Ti, pci bus id: 0000:01:00.0, compute capability: 8.9

2024-05-13 17:57:10.938643: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:424] Loaded cuDNN version 8600        
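The number 8600 in the last log line is the cuDNN version packed into one integer. Assuming the cuDNN 8.x encoding (major * 1000 + minor * 100 + patch), it can be decoded as sketched below; the function names are illustrative.

```python
import re

# Log line copied from the output above.
LOG_LINE = (
    "2024-05-13 17:57:10.938643: I tensorflow/compiler/xla/stream_executor/cuda/"
    "cuda_dnn.cc:424] Loaded cuDNN version 8600"
)


def decode_cudnn_version(number):
    """Split a cuDNN 8.x version integer (major*1000 + minor*100 + patch) into parts."""
    major, rest = divmod(number, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def cudnn_version_from_log(line):
    """Find 'Loaded cuDNN version N' in a TensorFlow log line and decode N."""
    match = re.search(r"Loaded cuDNN version (\d+)", line)
    return decode_cudnn_version(int(match.group(1))) if match else None


print(cudnn_version_from_log(LOG_LINE))  # (8, 6, 0)
```

So 8600 corresponds to cuDNN 8.6.0, which matches the version installed earlier.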

There are two options to monitor the GPU while you are training models:

  • Use nvidia-smi and let it refresh every 5 seconds:

 nvidia-smi -l 5        

  • Use Mission Center, which you will need to install via Flathub.

Mission Center is an amazing tool for keeping an eye on the GPU.
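If you prefer to log the readings from a script, for example next to your training metrics, nvidia-smi can also be queried in CSV form. The sketch below uses the --query-gpu and --format options of nvidia-smi; the function names, dictionary keys and the number of samples are illustrative choices.

```python
import shutil
import subprocess
import time

QUERY = "utilization.gpu,memory.used,memory.total"


def _to_int(field):
    """Convert a CSV field to int; values such as '[N/A]' become None."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_stats(csv_line):
    """Parse one line of `--format=csv,noheader,nounits` output into a dict."""
    util, used, total = (_to_int(field) for field in csv_line.split(","))
    return {"util_percent": util, "mem_used_mib": used, "mem_total_mib": total}


def read_gpu_stats():
    """Query every GPU once; returns [] when nvidia-smi is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return [parse_gpu_stats(line) for line in result.stdout.splitlines() if line.strip()]


def monitor(samples, interval=5):
    """Print a reading every `interval` seconds, `samples` times in total."""
    for _ in range(samples):
        print(read_gpu_stats())
        time.sleep(interval)
```

Calling monitor(samples=12) would print one reading every 5 seconds for a minute, similar to nvidia-smi -l 5 but in a form that is easy to store.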

Hopefully, you found this article interesting and it helps you with your ML tasks.


#tensorflow #ubuntu #machinelearning #nvidia

Oybek Hodjaev

programmer at Mexmash

8 months ago

Hi, thank you for this post. On "pip install tensorflow==2.12*" I get an error: "Could not find a version that satisfies the requirement (from versions 2.16.0rc0, 2.16.1, 2.17.0rc" no matching distribution. What can I do?

Article author: Andrew Antonopoulos