Building TensorFlow v2.16 with GPUs from Source.
Patrick Hamilton
CTO Internet 2.0 | Director & Boardmember (US) | Cybersecurity & Technology Expert | Machine Learning & Neural Network Specialist | Financial Institutions & Critical Infrastructure | Solution Architect | CISSP
I previously wrote an article on how to build TensorFlow v2.16 to work with older CPUs (Central Processing Units), as a precursor to this article on using GPUs (Graphics Processing Units). Unlike PyTorch, the other popular Machine Learning (ML) Python framework, the latest TensorFlow binaries will not work with CPUs lacking support for the AVX or AVX2 extensions (don't worry if you do not know what AVX is about). However, the processing power for ML lies with the GPU, not the CPU.
If, for example, you have a system with an outdated CPU but the video card(s) with the GPU(s) are still current, there are two choices: either buy a new CPU, which likely requires buying a new motherboard, RAM, and perhaps a power supply (and then what to do with that old hardware...), or keep the existing hardware and compile TensorFlow to work with it while still utilizing the GPUs. This article takes the second route, forgoing new hardware bought just for the sake of making the TensorFlow binaries work.
Actually, there are a couple more choices: resort to a cloud server and pay for its use, or use PyTorch instead.
A side note here: TensorFlow v2.15 cannot be compiled from source due to a versioning problem between Nvidia CUDA v12.2 and its dependencies. Version 2.15 does not require Nvidia's TensorRT, but unfortunately, because of how CLang (a compiler) is set up, TensorRT must be installed. However, TensorRT only supports CUDA 12.1, and that .1 version difference breaks the ability to compile. While an update patch has been made available so that TensorRT is no longer required, it applies only to the binaries and has not been applied to the available open source code. If that is confusing, or you are wondering what TensorRT is about, then breathe and let that go; that is not the point. The point is...
BUILDING TENSORFLOW v2.16 WITH GPU SUPPORT FROM SOURCE:
The following instructions are step-by-step Linux commands to build TensorFlow v2.16. The initial operating system setup is a clean Ubuntu Desktop v22.04 with Minimal Installation selected and without third-party applications installed.
After installing Ubuntu Desktop and logging in the first time...
PERFORM UPDATES:
1.0. After installing Ubuntu, go to the Software Updater:
1.1. Set Ubuntu Software > Download from: Main Server
1.2. Click on Close
1.3. At the popup, click on Reload
2.0. Open Software Updater (again, as it closed):
2.1. Go to the Additional Drivers tab
2.2. Select: Using NVIDIA driver metapackage from nvidia-driver-535 (proprietary, tested)
2.3. Click on Apply Changes
2.4. Click on Close
3.0. Open Software Updater (as it did close again):
3.1. Wait for the window popup to perform updates, or click on the notification error due to Ubuntu Pro not loading, then click on Show Updates
3.2. At the Software Updater, click on Install Now
3.3. Click on Restart Now
4.0. After the reboot and logging back in, open a Terminal
5.0. To check for Nvidia cards, type:
lspci | grep -i nvidia
5.1. To check the Nvidia driver version, type:
nvidia-smi | grep "Driver Version" | awk '{print $6}'
5.2. Or just:
nvidia-smi
An example screenshot of the results:
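The grep/awk pipeline above pulls the driver version out of nvidia-smi's header line. The same extraction can be sketched in Python; note the sample line below is illustrative, and a real run would capture the command's output with subprocess:

```python
import re

# Illustrative nvidia-smi header line; a real run would capture the output
# of `nvidia-smi` with subprocess.run(..., capture_output=True, text=True).
sample = "| NVIDIA-SMI 535.129.03    Driver Version: 535.129.03    CUDA Version: 12.2 |"

# Match the numeric version that follows "Driver Version:".
match = re.search(r"Driver Version:\s*([\d.]+)", sample)
if match:
    print("Driver version:", match.group(1))  # Driver version: 535.129.03
```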
SETUP ENVIRONMENT AND PYTHON:
1.0. To check the current version of Python (should be v3.10), type:
python3 -V
2.0. Add the repositories:
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo add-apt-repository ppa:deadsnakes/ppa
3.0. Update the system:
sudo apt update -y && sudo apt upgrade -y
4.0. Install the required packages:
sudo apt install git python-is-python3 python3-pip python3-dev patchelf -y
5.0. To verify the installation (should be at a higher version, such as v3.10.12):
python -V
6.0. Set the Path now; type:
nano ~/.bashrc
6.1. At the end of the file, add the following (replace "sysop" with your username):
export PATH="$PATH:/home/sysop/.local/bin"
6.2. Save and exit (CTRL-O, CTRL-X)
Example of adding in the Path towards the end; disregard the line with Bazel, that comes later:
7.0. To apply the changes:
source ~/.bashrc
8.0 Install Python v3.11.7
8.1 To verify Python 3.11 is available:
apt list | grep python3.11
8.2 To install all Python 3.11 modules:
sudo apt install python3.11-full -y
8.3 Verify Install:
python3.11 -V
Result would be: Python 3.11.7
8.4 Set Alternative Versions for Python3:
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 2
8.4 Python v3.10 is automatically set in Auto Mode (it has the higher priority), but the selection can be edited manually:
sudo update-alternatives --config python3
8.6 Select v3.10. Python v3.10 will be used to compile TensorFlow v2.16; this makes life easier. But understand that Python v3.11 is required to run TensorFlow v2.16. If you try to compile TensorFlow v2.16 with Python v3.11 then, as in South Park, "You may have a hard time."
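Because compiling uses Python v3.10 while running the finished wheel uses v3.11, it is easy to invoke the wrong interpreter by accident. A small guard, a sketch of my own and not part of TensorFlow, can make a script fail fast on the wrong version:

```python
import sys

def meets_version(required=(3, 11)):
    """Return True if the running interpreter is at least the required version."""
    return sys.version_info[:2] >= required

# For example, a script meant only for the Python 3.11 runtime could warn:
if not meets_version((3, 11)):
    print("Warning: TensorFlow v2.16 built here expects Python 3.11 to run.")
```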
INSTALL NVIDIA CUDA TOOLKIT:
1.0 Download CUDA v12.2:
1.1 Open a web browser and go to: https://developer.nvidia.com/cuda-downloads
1.2 Scroll to bottom, click on "Archive of Previous CUDA Releases"
1.3 Click on: CUDA Toolkit 12.2.0
1.4 Select the following:
Operating System: Linux
Architecture: x86_64
Distribution: Ubuntu
Version: 22.04
Installer Type: deb (local)
1.5 A set of instructions will appear, which are listed in Steps 1.7 and 1.8 below.
1.6 At the Ubuntu Terminal, go to the Downloads folder such as, type:
cd Downloads
1.7 Perform the following commands to download:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda-repo-ubuntu2204-12-2-local_12.2.0-535.54.03-1_amd64.deb
1.8 Install the CUDA Toolkit:
sudo dpkg -i cuda-repo-ubuntu2204-12-2-local_12.2.0-535.54.03-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt update -y && sudo apt upgrade -y
sudo apt install cuda -y
sudo apt install nvidia-gds -y
sudo reboot
1.9 After the reboot, log back in.
1.10 Open Terminal
1.11 Add persistence (these exports can also be appended to ~/.bashrc so they apply to new sessions):
export PATH=/usr/local/cuda-12.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64\
${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
source ~/.bashrc
systemctl status nvidia-persistenced
1.12 Check Versions:
nvidia-smi
OR
nvcc --version
If the nvcc check fails, then:
sudo apt install nvidia-cuda-toolkit -y
sudo reboot
INSTALL NVIDIA cuDNN v8.9.0:
1.0 Downloading Nvidia cuDNN requires registration and login.
1.1 Open a web browser and go to: https://developer.nvidia.com/cudnn
1.2 Click on: Download cuDNN Library
1.3 Complete the registration process (if not done before).
1.4 Log into the website.
1.5 Checkbox "I Agree To the Terms of the cuDNN Software License Agreement"
1.6 Click on the Archived cuDNN Releases link
1.7 Click on "Download cuDNN v8.9.0 (April 11th, 2023), for CUDA 12.x"
1.8 Click on "Local Installer for Ubuntu22.04 x86_64 (Deb)"
1.9 Once the download is complete, at the terminal go to the Downloads folder, such as by typing:
cd Downloads
2.0 Then begin the installation of cuDNN and its dependencies:
sudo dpkg -i cudnn-local-repo-ubuntu2204-8.9.0.131_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-*/cudnn-local-*-keyring.gpg /usr/share/keyrings/
sudo apt update -y
sudo apt install libcudnn8=8.9.0.131-1+cuda12.1
sudo apt install libcudnn8-samples=8.9.0.131-1+cuda12.1
sudo apt install make libfreeimage3 libfreeimage-dev
NOTE: The install lines specifying cuda12.1 are intentional, as those packages have no cuda12.2 build. This is acceptable.
3.0 Optionally, compile the samples to perform a test. These 3.0 steps can be disregarded; if so, skip to Step 4.0.
cd /usr/src/cudnn_samples_v8/
cp -r /usr/src/cudnn_samples_v8/ $HOME
cd $HOME/cudnn_samples_v8/mnistCUDNN
sudo make clean && sudo make
sudo reboot
3.1 After rebooting and logging back in, open a terminal.
3.2 Execute the compiled samples for the test:
cd $HOME/cudnn_samples_v8/mnistCUDNN
./mnistCUDNN
3.3 The result should end with: Test passed!
4.0 Set Paths for Cuda:
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
5.0 Verify Nvidia Settings
nvidia-smi
INSTALL BAZELISK:
TensorFlow requires the use of Bazel, which is a powerful build tool, like a super-smart recipe book for software, that helps organize and compile large code bases efficiently. Bazelisk is a helpful companion that automatically manages Bazel versions, ensuring you always use the right 'recipe' for projects like TensorFlow, without getting into technical hassles.
1.0. At the terminal, go to Downloads, such as:
cd Downloads
2.0. Download and install Bazelisk:
wget https://github.com/bazelbuild/bazelisk/releases/download/v1.19.0/bazelisk-linux-amd64
chmod +x bazelisk-linux-amd64
sudo mv bazelisk-linux-amd64 /usr/local/bin/bazel
3.1. Set a Path for Bazelisk:
nano ~/.bashrc
3.2. At the end of the file, add the following, similar to before (note this is the directory containing the bazel binary, not the binary itself):
export PATH=/usr/local/bin:$PATH
3.3. Save and exit (CTRL-O, CTRL-X)
4.0. To apply the changes:
source ~/.bashrc
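One convenience worth knowing: Bazelisk chooses which Bazel release to run from a .bazelversion file in the workspace root (or from the USE_BAZEL_VERSION environment variable), so a build can be pinned to a known-good Bazel. A minimal sketch, using a temporary directory and "6.5.0" as a purely illustrative version number:

```python
import tempfile
from pathlib import Path

# Write a .bazelversion file into a (temporary, illustrative) workspace root.
# Bazelisk reads this file and downloads/runs exactly that Bazel release.
workspace = Path(tempfile.mkdtemp())
(workspace / ".bazelversion").write_text("6.5.0\n")

print((workspace / ".bazelversion").read_text().strip())  # 6.5.0
```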
OPTIONAL: INSTALL CLANG v16
According to TensorFlow, CLang is required to compile from source. However, as of this writing, CLang (versions v16 and v17) has not worked for TensorFlow versions 2.15 and 2.16. I have only provided the instructions here as a reference. A side note: CLang will compile TensorFlow v2.15 and v2.16, but for CPU use only.
1.0. Download the Clang install script:
wget https://apt.llvm.org/llvm.sh
2.0. Make the script executable:
chmod +x llvm.sh
3.0. Execute the script:
sudo ./llvm.sh 16
THE ENVIRONMENT IS SET TO:
Ubuntu v22.04.3 LTS (Minimal Installation)
- Python3 v3.10.12
- Python3.11 v3.11.7
- Nvidia v535.129.03
- Cuda v12.2
- GCC v11.4.0
- CLang v16.0.6
- Bazelisk (to manage Bazel)
- cuDNN: v8.9.0
PREPARE FOR TENSORFLOW:
1.0. Install the dependencies:
pip install -U --user pip numpy wheel packaging requests opt_einsum
pip install -U --user keras_preprocessing --no-deps
***** NOW FOR A TRICK *****
Even though those dependencies are installed, apparently they are not all of the dependencies required, or they may not be the right versions. If you skip this step, the compile is likely to fail. The trick here is to install TensorFlow v2.15.0...
But we are building our own TensorFlow v2.16, so why this installation, which, depending on your CPU, may not even work? The reason is to have the environment automatically install the correct dependencies; we will uninstall TensorFlow v2.15 later, and the dependencies will remain.
Run the following command:
pip install tensorflow==2.15.0 --upgrade
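To see what that pip install actually pulled in, the declared dependencies of an installed package can be listed with the standard library. The sketch below inspects pip itself only so it runs anywhere; on the build machine you would pass "tensorflow" instead:

```python
from importlib import metadata

def direct_dependencies(package: str) -> list:
    """Return the declared (non-extra) requirements of an installed package."""
    reqs = metadata.requires(package) or []
    return [r for r in reqs if "extra ==" not in r]

# "pip" is used here only because it is always installed; on the build
# machine, direct_dependencies("tensorflow") shows what the trick installed.
print(direct_dependencies("pip"))
```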
DOWNLOAD TENSORFLOW REPOSITORY:
1.0. Now we will need to pull the TensorFlow repository:
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow/
2.0. Normally the next step would be to "checkout" a specific version of TensorFlow, but we are using the current version (v2.16), as v2.15 cannot be compiled.
CHECK FOR CPU FLAGS:
1.0 This is the part to help optimize TensorFlow for your CPU, type:
grep flags -m1 /proc/cpuinfo | cut -d ":" -f 2 | tr '[:upper:]' '[:lower:]' | { read FLAGS; OPT="-march=native"; for flag in $FLAGS; do case "$flag" in "sse4_1" | "sse4_2" | "ssse3" | "fma" | "cx16" | "popcnt" | "avx" | "avx2") OPT+=" -m$flag";; esac; done; MODOPT=${OPT//_/\.}; echo "$MODOPT"; }
The result should be something like this, which is what will be used:
-march=native -mssse3 -mcx16 -msse4.1 -msse4.2 -mpopcnt
Generally, "-march=native" alone is sufficient, but why cut yourself short.
Another method is to use the CPU family name, obtained with this command:
cat /sys/devices/cpu/caps/pmu_name
The result would be a family name, such as nehalem, which can be used as:
-march=nehalem
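The long shell pipeline in step 1.0 can be restated as a short Python sketch: filter the CPU flag list against a whitelist of SIMD/feature flags and turn each match into a GCC -m option. The sample flags line below is illustrative; a real run would read the first "flags" line of /proc/cpuinfo:

```python
# SIMD/feature flags that the shell pipeline above whitelists.
WANTED = {"sse4_1", "sse4_2", "ssse3", "fma", "cx16", "popcnt", "avx", "avx2"}

def march_options(flags_line: str) -> str:
    """Translate a /proc/cpuinfo flags line into GCC optimization options."""
    opts = ["-march=native"]
    for flag in flags_line.lower().split():
        if flag in WANTED:
            # GCC spells sse4_1 as -msse4.1, hence the underscore-to-dot swap.
            opts.append("-m" + flag.replace("_", "."))
    return " ".join(opts)

# Illustrative flags line; a real run would read /proc/cpuinfo instead.
print(march_options("fpu vme ssse3 cx16 sse4_1 sse4_2 popcnt"))
# -march=native -mssse3 -mcx16 -msse4.1 -msse4.2 -mpopcnt
```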
CONFIGURE TENSORFLOW v2.16 for GPU:
1.0. Type the following command to configure for the build:
./configure
2.0. For the list of questions, use the following:
Python Location: Default
Python Library: Default
Tensorflow with ROCm: N
Tensorflow with CUDA: Y
Tensorflow with TensorRT: N
CUDA Capabilities: Default
CLang as Compiler: N
GCC Path: Default
Optimization Flags: -march=native -mssse3 -mcx16 -msse4.1 -msse4.2 -mpopcnt -Wno-gnu-offsetof-extensions
Android Builds: N
NOTES:
BUT WAIT, WE NEED TO FIX THE BAZEL CONFIGURATION FILE!
Unfortunately, there is a bug in the Bazel configuration file: it contains a duplicate of "-Wno-gnu-offsetof-extensions", which needs to be deleted:
sudo nano .tf_configure.bazelrc
Scroll down to the first line containing "-Wno-gnu-offsetof-extensions" and delete it. In the picture below, it is the highlighted line. Once deleted, save and exit the Nano editor.
COMPILE TENSORFLOW v2.16 for GPU:
1.0 Part 1 - Build the package-builder.
1.1 Set the Python environment to use, type:
export TF_PYTHON_VERSION=3.10
1.2 To help optimize compiling for your system, determine the number of CPU cores to use. You can use fewer, but it is recommended to use all, as this will take several hours to complete. Run the following command:
nproc
1.3 The result is the number of available processors, in this case was 8 for the system I was using.
1.4 Run the following command to build the package builder. The number 8 in --jobs=8 is the number of processors to use; 8 is just the example here. Your system may have more or fewer processors, so use the value displayed by the "nproc" command above instead.
sudo bazel build --config=opt --jobs=8 //tensorflow/tools/pip_package:build_pip_package
After those many hours, the result should be as follows:
2.0 Part 2 - Build the package:
2.1 Fortunately this process does not take as long and can be performed by typing the following:
sudo ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
2.2 The result should be as follows:
INSTALL TENSORFLOW v2.16 for GPU:
1.0 Remove the existing TensorFlow v2.15 first. Be sure to change out of the tensorflow directory, as an error may occur otherwise.
cd ..
pip uninstall tensorflow
1.1 Install TensorFlow v2.16 by using Python v3.11 and the -m parameter as shown below. I added --force-reinstall as a precaution:
python3.11 -m pip install /tmp/tensorflow_pkg/tensorflow*.whl --force-reinstall
1.2 You can also copy the TensorFlow .whl file somewhere safe for future use, to avoid another round of compiling.
TEST TENSORFLOW:
1.0 At the terminal, type:
python3.11
import tensorflow as tf
print("Number of GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
The results may be similar to below:
Now in this case, there appear to be messages of, "successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero." These are warnings, not errors. There is a means to correct this with the following script, which searches for Nvidia devices (the video cards) and sets their numa_node value to 0.
First, create the script; it will be named "numa_node.start" in this example. You can use Nano from the command line or a text editor.
If to use Nano:
sudo nano numa_node.start
Add in the following lines:
#!/bin/bash
for pcidev in $(lspci -D|grep 'VGA compatible controller: NVIDIA'|sed -e 's/[[:space:]].*//'); do echo 0 > /sys/bus/pci/devices/${pcidev}/numa_node; done
Save and exit.
Secondly, make the script executable:
chmod +x numa_node.start
Third, test the script by running it:
sudo ./numa_node.start
Next, run the test again; the results should no longer present the warnings.
Now to make the script work permanently, copy the script to /etc/local.d:
sudo cp numa_node.start /etc/local.d
Fourth, run the script as a Cron Job during bootup:
sudo crontab -e
Add at the end:
@reboot /etc/local.d/numa_node.start
Bonus: Now if interested in running another test, you can use this test Python script.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import tensorflow as tf

# Check if TensorFlow was built with CUDA (GPU support)
print("Built with GPU support:", tf.test.is_built_with_cuda())

# List of available GPUs
gpus = tf.config.list_physical_devices('GPU')
print("GPUs available:", gpus)

# Additional test to check GPU utilization
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

with tf.device('/GPU:0'):  # Specifies that the operations run on the first GPU.
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
print(c)
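As a sanity check on that last result, the matrix product can be verified by hand without TensorFlow; each entry of c is the dot product of a row of a with a column of b:

```python
# Plain-Python check of the 2x3 @ 3x2 matrix product used in the GPU test.
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
c = [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(2)]
     for i in range(2)]
print(c)  # [[22.0, 28.0], [49.0, 64.0]]
```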
Conclusion
And there you have it, the procedures to build from source code for TensorFlow v2.16 with GPU support. This is a rather complex process, and unfortunately can lead to many hours of frustration due to the official documentation not being up-to-date and missing some details (such as, how to install Bazel and CLang properly).
Compiling PyTorch is not easy either, but fortunately you can just install its binaries for use with older CPUs.
And lastly, these steps work as of this writing. TensorFlow tends to be finicky and sensitive, especially about the versions of its dependencies. If v2.16 does not compile, there is likely a typo, a missed dependency, or a skipped step. But if errors keep occurring, or v2.16 has a bug that prevents your code from running properly, then the recommendation is to fall back to the last known good version, v2.14, though you then have to use CUDA v11.8.