Building TensorFlow v2.16 with GPUs from Source.
Patrick Hamilton
CTO Internet 2.0 | Director & Boardmember (US) | Cybersecurity & Technology Expert | Machine Learning & Neural Network Specialist | Financial Institutions & Critical Infrastructure | Solution Architect | CISSP
I previously wrote an article on how to build TensorFlow v2.16 to work with older CPUs (Central Processing Units), as a precursor to this article on using GPUs (Graphics Processing Units). Unlike PyTorch, the other popular Machine Learning (ML) Python framework, the latest TensorFlow binaries will not work with CPUs lacking support for the AVX or AVX2 extensions (don't worry if you do not know what AVX is about). However, the processing power for ML lies with the GPU, not the CPU.
If, for example, you have a system with an outdated CPU but the video card(s) with the GPU(s) are still current, there are two choices: either buy a new CPU, which likely requires buying a new motherboard, RAM, and perhaps a power supply (and then what to do with that old hardware...), or keep the existing hardware and compile TensorFlow to work with it while still utilizing the GPUs. This article takes the second route, forgoing new hardware bought just for the sake of making the TensorFlow binaries work.
Actually, there are a couple more choices: resort to a cloud server and pay for its use, or use PyTorch instead.
A side note here: TensorFlow v2.15 cannot be compiled from source due to a versioning problem between Nvidia CUDA v12.2 and its dependencies. Version 2.15 does not require Nvidia's TensorRT, but unfortunately, because of how CLang (a compiler) is set up, TensorRT must be installed. However, TensorRT only supports CUDA 12.1, and that .1 version difference breaks the ability to compile. While an update patch has been made available so that TensorRT is no longer required, it applies only to the binaries and has not been applied to the available open source code. If that is confusing, or you are wondering what TensorRT is about, then breathe and let that go; that is not the point. The point is...
BUILDING TENSORFLOW v2.16 WITH GPU SUPPORT FROM SOURCE:
The following instructions are step-by-step Linux commands to build TensorFlow v2.16. The initial operating system setup is a clean Ubuntu Desktop v22.04 with Minimal Installation selected and without third-party applications installed.
After installing Ubuntu Desktop and logging in the first time...
PERFORM UPDATES:
1.0. After installing Ubuntu, go to the Software Updater:
1.1. Set Ubuntu Software > Download from: Main Server
1.2. Click on Close
1.3. At the popup, click on Reload
2.0. Open Software Updater (again, as it closed):
2.1. Go to the Additional Drivers tab
2.2. Select: Using NVIDIA driver metapackage from nvidia-driver-535 (proprietary, tested)
2.3. Click on Apply Changes
2.4. Click on Close
3.0. Open Software Updater (as it did close again):
3.1. Wait for the window popup to perform updates, or click on the notification error due to Ubuntu Pro not loading, then click on Show Updates
3.2. At the Software Updater, click on Install Now
3.3. Click on Restart Now
4.0. After the reboot and logging back in, open a Terminal
5.0. To check for Nvidia cards, type:
lspci | grep -i nvidia
5.1. To check the Nvidia driver version, type:
nvidia-smi | grep "Driver Version" | awk '{print $6}'
5.2. Or just:
nvidia-smi
An example screenshot of the results:
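The grep/awk pipeline above pulls the driver version out of nvidia-smi's header line. The same extraction can be sketched in Python; note the sample line below is illustrative, and a real run would capture the command's output with subprocess:

```python
import re

# Illustrative nvidia-smi header line; a real run would capture the output
# of `nvidia-smi` with subprocess.run(..., capture_output=True, text=True).
sample = "| NVIDIA-SMI 535.129.03    Driver Version: 535.129.03    CUDA Version: 12.2 |"

# Match the numeric version that follows "Driver Version:".
match = re.search(r"Driver Version:\s*([\d.]+)", sample)
if match:
    print("Driver version:", match.group(1))  # Driver version: 535.129.03
```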
SETUP ENVIRONMENT AND PYTHON:
1.0. To check the current version of Python (should be v3.10), type:
python3 -V
2.0. Add the repositories:
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo add-apt-repository ppa:deadsnakes/ppa
3.0. Update the system:
sudo apt update -y && sudo apt upgrade -y
4.0. Install the required packages:
sudo apt install git python-is-python3 python3-pip python3-dev patchelf -y
5.0. To verify the installation (should be at a higher version, such as v3.10.12):
python -V
6.0. Set the Path now; type:
nano ~/.bashrc
6.1. At the end of the file, add the following (replace "sysop" with your username):
export PATH="$PATH:/home/sysop/.local/bin"
6.2. Save and exit (CTRL-O, CTRL-X)
Example of adding in the Path towards the end; disregard the line with Bazel, that comes later:
7.0. To apply the changes:
source ~/.bashrc
8.0 Install Python v3.11.7
8.1 To verify Python 3.11 is available:
apt list | grep python3.11
8.2 To install all Python 3.11 modules:
sudo apt install python3.11-full -y
8.3 Verify Install:
python3.11 -V
Result would be: Python 3.11.7
8.4 Set Alternative Versions for Python3:
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 2
8.4 Python v3.10 is automatically set in Auto Mode (it has the higher priority), but the selection can be edited manually:
sudo update-alternatives --config python3
8.6 Select v3.10. Python v3.10 will be used to compile TensorFlow v2.16; this makes life easier. But understand that Python v3.11 is required to run TensorFlow v2.16. If you try to compile TensorFlow v2.16 with Python v3.11 then, as in South Park, "You may have a hard time."
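Because compiling uses Python v3.10 while running the finished wheel uses v3.11, it is easy to invoke the wrong interpreter by accident. A small guard, a sketch of my own and not part of TensorFlow, can make a script fail fast on the wrong version:

```python
import sys

def meets_version(required=(3, 11)):
    """Return True if the running interpreter is at least the required version."""
    return sys.version_info[:2] >= required

# For example, a script meant only for the Python 3.11 runtime could warn:
if not meets_version((3, 11)):
    print("Warning: TensorFlow v2.16 built here expects Python 3.11 to run.")
```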
INSTALL NVIDIA CUDA TOOLKIT:
1.0 Download CUDA v12.2:
1.1 Open a web browser and go to: https://developer.nvidia.com/cuda-downloads
1.2 Scroll to bottom, click on "Archive of Previous CUDA Releases"
1.3 Click on: CUDA Toolkit 12.2.0
1.4 Select the following:
Operating System: Linux
Architecture: x86_64
Distribution: Ubuntu
Version: 22.04
Installer Type: deb (local)
1.5 A set of instructions will appear, which are listed in Steps 1.7 and 1.8 below.
1.6 At the Ubuntu Terminal, go to the Downloads folder such as, type:
cd Downloads
1.7 Perform the following commands to download:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda-repo-ubuntu2204-12-2-local_12.2.0-535.54.03-1_amd64.deb
1.8 Install the CUDA Toolkit:
sudo dpkg -i cuda-repo-ubuntu2204-12-2-local_12.2.0-535.54.03-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt update -y && sudo apt upgrade -y
sudo apt install cuda -y
sudo apt install nvidia-gds -y
sudo reboot
1.9 After the reboot, log back in.
1.10 Open Terminal
1.11 Add persistence (these exports can also be appended to ~/.bashrc so they apply to new sessions):
export PATH=/usr/local/cuda-12.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64\
${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
source ~/.bashrc
systemctl status nvidia-persistenced
1.12 Check Versions:
nvidia-smi
OR
nvcc --version
If the nvcc check fails, then:
sudo apt install nvidia-cuda-toolkit -y
sudo reboot
INSTALL NVIDIA cuDNN v8.9.0:
1.0 Downloading Nvidia cuDNN requires registration and login.
1.1 Open a web browser and go to: https://developer.nvidia.com/cudnn
1.2 Click on: Download cuDNN Library
1.3 Complete the registration process (if not done before).
1.4 Log into the website.
1.5 Checkbox "I Agree To the Terms of the cuDNN Software License Agreement"
1.6 Click on the Archived cuDNN Releases link
1.7 Click on "Download cuDNN v8.9.0 (April 11th, 2023), for CUDA 12.x"
1.8 Click on "Local Installer for Ubuntu22.04 x86_64 (Deb)"
1.9 Once the download is complete, at the terminal go to the Downloads folder, such as by typing:
cd Downloads
2.0 Then begin the installation of cuDNN and its dependencies:
sudo dpkg -i cudnn-local-repo-ubuntu2204-8.9.0.131_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-*/cudnn-local-*-keyring.gpg /usr/share/keyrings/
sudo apt update -y
sudo apt install libcudnn8=8.9.0.131-1+cuda12.1
sudo apt install libcudnn8-samples=8.9.0.131-1+cuda12.1
sudo apt install make libfreeimage3 libfreeimage-dev
NOTE: The install lines specifying cuda12.1 are intentional, as those packages have no cuda12.2 build. This is acceptable.
3.0 Optionally, compile the samples to perform a test. These 3.0 steps can be disregarded; if so, skip to Step 4.0.
cd /usr/src/cudnn_samples_v8/
cp -r /usr/src/cudnn_samples_v8/ $HOME
cd $HOME/cudnn_samples_v8/mnistCUDNN
sudo make clean && sudo make
sudo reboot
3.1 After rebooting and logging back in, open a terminal.
3.2 Execute the compiled samples for the test:
cd $HOME/cudnn_samples_v8/mnistCUDNN
./mnistCUDNN
3.3 The result should end with: Test passed!
4.0 Set Paths for Cuda:
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
5.0 Verify Nvidia Settings
nvidia-smi
INSTALL BAZELISK:
TensorFlow requires the use of Bazel, which is a powerful build tool, like a super-smart recipe book for software, that helps organize and compile large code bases efficiently. Bazelisk is a helpful companion that automatically manages Bazel versions, ensuring you always use the right 'recipe' for projects like TensorFlow, without getting into technical hassles.
1.0. At the terminal, go to Downloads, such as:
cd Downloads
2.0. Download and install Bazelisk:
wget https://github.com/bazelbuild/bazelisk/releases/download/v1.19.0/bazelisk-linux-amd64
chmod +x bazelisk-linux-amd64
sudo mv bazelisk-linux-amd64 /usr/local/bin/bazel
3.1. Set a Path for Bazelisk:
nano ~/.bashrc
3.2. At the end of the file, add the following, similar to before (note this is the directory containing the bazel binary, not the binary itself):
export PATH=/usr/local/bin:$PATH
3.3. Save and exit (CTRL-O, CTRL-X)
4.0. To apply the changes:
source ~/.bashrc
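One convenience worth knowing: Bazelisk chooses which Bazel release to run from a .bazelversion file in the workspace root (or from the USE_BAZEL_VERSION environment variable), so a build can be pinned to a known-good Bazel. A minimal sketch, using a temporary directory and "6.5.0" as a purely illustrative version number:

```python
import tempfile
from pathlib import Path

# Write a .bazelversion file into a (temporary, illustrative) workspace root.
# Bazelisk reads this file and downloads/runs exactly that Bazel release.
workspace = Path(tempfile.mkdtemp())
(workspace / ".bazelversion").write_text("6.5.0\n")

print((workspace / ".bazelversion").read_text().strip())  # 6.5.0
```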
OPTIONAL: INSTALL CLANG v16
According to TensorFlow, CLang is required to compile from source. However, as of this writing, CLang (versions v16 and v17) has not worked for TensorFlow versions 2.15 and 2.16. I have only provided the instructions here as a reference. A side note: CLang will compile TensorFlow v2.15 and v2.16, but for CPU use only.
1.0. Download the Clang install script:
wget https://apt.llvm.org/llvm.sh
2.0. Make the script executable:
chmod +x llvm.sh
3.0. Execute the script:
sudo ./llvm.sh 16
THE ENVIRONMENT IS SET TO:
Ubuntu v22.04.3 LTS (Minimal Installation)
- Python3 v3.10.12
- Python3.11 v3.11.7
- Nvidia v535.129.03
- Cuda v12.2
- GCC v11.4.0
- CLang v16.0.6
- Bazelisk (to manage Bazel)
- cuDNN: v8.9.0
PREPARE FOR TENSORFLOW:
1.0. Install the dependencies:
pip install -U --user pip numpy wheel packaging requests opt_einsum
pip install -U --user keras_preprocessing --no-deps
***** NOW FOR A TRICK *****
Even though those dependencies are installed, apparently they are not all of the dependencies required, or they may not be the right versions. If you skip this step, the compile is likely to fail. The trick here is to install TensorFlow v2.15.0...
But we are building our own TensorFlow v2.16, so why this installation, which, depending on your CPU, may not even work? The reason is to have the environment automatically install the correct dependencies; we will uninstall TensorFlow v2.15 later, and the dependencies will remain.
Run the following command:
pip install tensorflow==2.15.0 --upgrade
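To see what that pip install actually pulled in, the declared dependencies of an installed package can be listed with the standard library. The sketch below inspects pip itself only so it runs anywhere; on the build machine you would pass "tensorflow" instead:

```python
from importlib import metadata

def direct_dependencies(package: str) -> list:
    """Return the declared (non-extra) requirements of an installed package."""
    reqs = metadata.requires(package) or []
    return [r for r in reqs if "extra ==" not in r]

# "pip" is used here only because it is always installed; on the build
# machine, direct_dependencies("tensorflow") shows what the trick installed.
print(direct_dependencies("pip"))
```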
DOWNLOAD TENSORFLOW REPOSITORY:
1.0. Now we will need to pull the TensorFlow repository:
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow/
2.0. Normally the next step would be to "checkout" a specific version of TensorFlow, but we are using the current version (v2.16), as v2.15 cannot be compiled.
CHECK FOR CPU FLAGS:
1.0 This is the part to help optimize TensorFlow for your CPU, type:
grep flags -m1 /proc/cpuinfo | cut -d ":" -f 2 | tr '[:upper:]' '[:lower:]' | { read FLAGS; OPT="-march=native"; for flag in $FLAGS; do case "$flag" in "sse4_1" | "sse4_2" | "ssse3" | "fma" | "cx16" | "popcnt" | "avx" | "avx2") OPT+=" -m$flag";; esac; done; MODOPT=${OPT//_/\.}; echo "$MODOPT"; }
The result should be something like this, which is what will be used:
-march=native -mssse3 -mcx16 -msse4.1 -msse4.2 -mpopcnt
Generally, "-march=native" alone is sufficient, but why cut yourself short.
Another method is to use the CPU family name, obtained with this command:
cat /sys/devices/cpu/caps/pmu_name
The result would be a family name, such as nehalem, which can be used as:
-march=nehalem
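The long shell pipeline in step 1.0 can be restated as a short Python sketch: filter the CPU flag list against a whitelist of SIMD/feature flags and turn each match into a GCC -m option. The sample flags line below is illustrative; a real run would read the first "flags" line of /proc/cpuinfo:

```python
# SIMD/feature flags that the shell pipeline above whitelists.
WANTED = {"sse4_1", "sse4_2", "ssse3", "fma", "cx16", "popcnt", "avx", "avx2"}

def march_options(flags_line: str) -> str:
    """Translate a /proc/cpuinfo flags line into GCC optimization options."""
    opts = ["-march=native"]
    for flag in flags_line.lower().split():
        if flag in WANTED:
            # GCC spells sse4_1 as -msse4.1, hence the underscore-to-dot swap.
            opts.append("-m" + flag.replace("_", "."))
    return " ".join(opts)

# Illustrative flags line; a real run would read /proc/cpuinfo instead.
print(march_options("fpu vme ssse3 cx16 sse4_1 sse4_2 popcnt"))
# -march=native -mssse3 -mcx16 -msse4.1 -msse4.2 -mpopcnt
```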
CONFIGURE TENSORFLOW v2.16 for GPU:
1.0. Type the following command to configure for the build:
./configure
2.0. For the list of questions, use the following:
Python Location: Default
Python Library: Default
Tensorflow with ROCm: N
Tensorflow with CUDA: Y
Tensorflow with TensorRT: N
CUDA Capabilities: Default
CLang as Compiler: N
GCC Path: Default
Optimization Flags: -march=native -mssse3 -mcx16 -msse4.1 -msse4.2 -mpopcnt -Wno-gnu-offsetof-extensions
Android Builds: N
NOTES:
BUT WAIT, WE NEED TO FIX THE BAZEL CONFIGURATION FILE!
Unfortunately, there is a bug in the Bazel configuration file: it contains a duplicate of "-Wno-gnu-offsetof-extensions", which needs to be deleted:
sudo nano .tf_configure.bazelrc
Scroll down to the first line containing "-Wno-gnu-offsetof-extensions" and delete it. In the picture below, it is the highlighted line. Once deleted, save and exit the Nano editor.
COMPILE TENSORFLOW v2.16 for GPU:
1.0 Part 1 - Build the package-builder.
1.1 Set the Python environment to use, type:
export TF_PYTHON_VERSION=3.10
1.2 To help optimize compiling for your system, determine the number of CPU cores to use. You can use fewer, but it is recommended to use all, as this will take several hours to complete. Run the following command:
nproc
1.3 The result is the number of available processors, in this case was 8 for the system I was using.
1.4 Run the following command to build the package builder. The number 8 in --jobs=8 is the number of processors to use; 8 is just the example here. Your system may have more or fewer processors, so use the value displayed by the "nproc" command above instead.
sudo bazel build --config=opt --jobs=8 //tensorflow/tools/pip_package:build_pip_package
After those many hours, the result should be as follows:
2.0 Part 2 - Build the package:
2.1 Fortunately this process does not take as long and can be performed by typing the following:
sudo ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
2.2 The result should be as follows:
INSTALL TENSORFLOW v2.16 for GPU:
1.0 Remove the existing TensorFlow v2.15 first. Be sure to change out of the tensorflow directory, as an error may occur otherwise.
cd ..
pip uninstall tensorflow
1.1 Install TensorFlow v2.16 by using Python v3.11 and the -m parameter as shown below. I added --force-reinstall as a precaution:
python3.11 -m pip install /tmp/tensorflow_pkg/tensorflow*.whl --force-reinstall
1.2 You can also copy the TensorFlow .whl file somewhere safe for future use, to avoid another round of compiling.
TEST TENSORFLOW:
1.0 At the terminal, type:
python3.11
import tensorflow as tf
print("Number of GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
The results may be similar to below:
Now in this case, there appear to be messages of, "successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero." These are warnings, not errors. There is a means to correct this with the following script, which searches for Nvidia devices (the video cards) and sets their numa_node value to 0.
First, create the script; it will be named "numa_node.start" in this example. You can use Nano from the command line or a text editor.
If to use Nano:
sudo nano numa_node.start
Add in the following lines:
#!/bin/bash
for pcidev in $(lspci -D|grep 'VGA compatible controller: NVIDIA'|sed -e 's/[[:space:]].*//'); do echo 0 > /sys/bus/pci/devices/${pcidev}/numa_node; done
Save and exit.
Secondly, make the script executable:
chmod +x numa_node.start
Third, test the script by running it:
sudo ./numa_node.start
Next, run the test again; the results should no longer present the warnings.
Now to make the script work permanently, copy the script to /etc/local.d:
sudo cp numa_node.start /etc/local.d
Fourth, run the script as a Cron Job during bootup:
sudo crontab -e
Add at the end:
@reboot /etc/local.d/numa_node.start
Bonus: Now if interested in running another test, you can use this test Python script.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import tensorflow as tf

# Check if TensorFlow was built with CUDA (GPU support)
print("Built with GPU support:", tf.test.is_built_with_cuda())

# List of available GPUs
gpus = tf.config.list_physical_devices('GPU')
print("GPUs available:", gpus)

# Additional test to check GPU utilization
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

with tf.device('/GPU:0'):  # Specifies that the operations run on the first GPU.
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
print(c)
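As a sanity check on that last result, the matrix product can be verified by hand without TensorFlow; each entry of c is the dot product of a row of a with a column of b:

```python
# Plain-Python check of the 2x3 @ 3x2 matrix product used in the GPU test.
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
c = [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(2)]
     for i in range(2)]
print(c)  # [[22.0, 28.0], [49.0, 64.0]]
```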
Conclusion
And there you have it, the procedures to build from source code for TensorFlow v2.16 with GPU support. This is a rather complex process, and unfortunately can lead to many hours of frustration due to the official documentation not being up-to-date and missing some details (such as, how to install Bazel and CLang properly).
Compiling PyTorch is not easy either, but fortunately you can just install its binaries for use with older CPUs.
And lastly, these steps work as of this writing. TensorFlow tends to be finicky and sensitive, especially about the versions of its dependencies. If v2.16 does not compile, there is likely a typo, a missed dependency, or a skipped step. But if errors keep occurring, or v2.16 has a bug that prevents your code from running properly, then the recommendation is to fall back to the last known good version, v2.14, though you then have to use CUDA v11.8.