TensorFlow-GPU + Ubuntu + WSL

This article walks you through the steps I discovered recently for setting up a working environment to create TensorFlow-GPU neural networks on Ubuntu Linux hosted on Windows Subsystem for Linux.

There are several tutorials, blog posts, and installation guides out there for doing this, and I tried most of them in dozens of attempts over the course of a week. Each either failed outright or seemed to work at first and then failed later, e.g., when setting up a TensorFlow convolutional neural network (because the cuDNN library was not set up correctly).

These steps have been followed from scratch multiple times to prove they work given the described versions of software and order of execution.

Dual booting into Linux to do some work when most of my tools, notes, browser favorites, etc. are in Windows got to be such a pain that I recently decided to spend whatever time and effort was necessary to get this system set up. I had read that WSL (Windows Subsystem for Linux) now had pretty good support for GPU access from Linux (via tools like TensorFlow and DirectML), so I hoped it would be worth the effort. It was not a straightforward process, but in the end it was worth the effort to figure it all out. Along the way I wrote down all the necessary steps in my notes, which I am sharing here.

The starting point is a Windows 11 PC with an Nvidia RTX 2070 GPU. We will first make sure WSL is set up. This part is quick and easy, but it gets a bit more involved because we will want to make frequent backups of the Ubuntu image. Restoring one of these images loses the default configuration Windows creates when it sets up the distribution, including the Windows shortcut for Ubuntu. Instead of launching the image from that shortcut we will use Windows Terminal. As it turns out, this is a better method.




Windows Subsystem for Linux

Open PowerShell and run the command "wsl -l -v". If WSL is already enabled (perhaps because you are using Docker), this lists the distributions that are currently installed. In that case, go to the Microsoft Store and install Ubuntu to add it to the WSL distribution list. If PowerShell does not recognize the command, run "wsl --install". This should enable WSL and install Ubuntu. Run "wsl -l -v" again. If Ubuntu is not listed, go to the Microsoft Store and install Ubuntu. Do not run Ubuntu yet.
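If you want to script the check above, the table printed by "wsl -l -v" is easy to parse. The following is a minimal sketch, not part of the original steps: the function name parse_wsl_list and the sample text are hypothetical, and it assumes distribution names contain no spaces.

```python
def parse_wsl_list(text):
    """Parse the table printed by `wsl -l -v` into a list of dicts."""
    distros = []
    for line in text.splitlines():
        # wsl.exe writes UTF-16, so stray NUL characters can survive a naive decode.
        line = line.replace("\x00", "").rstrip()
        if not line.strip():
            continue
        # The default distribution is marked with a leading asterisk.
        is_default = line.lstrip().startswith("*")
        fields = line.replace("*", " ", 1).split()
        if len(fields) < 3 or fields[0] == "NAME":
            continue  # header row or an unexpected line
        distros.append(
            {
                "name": fields[0],
                "state": fields[1],
                "version": int(fields[2]),
                "default": is_default,
            }
        )
    return distros


# Hypothetical sample output, used here instead of calling wsl.exe.
SAMPLE = """\
  NAME            STATE           VERSION
* Ubuntu          Stopped         2
  docker-desktop  Running         2
"""

for distro in parse_wsl_list(SAMPLE):
    print(distro)
```

On a real machine you would feed it the decoded output of wsl.exe rather than the sample string.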

Now, let's make sure the latest WSL kernel is installed. Go to Settings > Windows Update > Advanced options and turn on the option to receive updates for other Microsoft products. Then run Windows Update to get the latest version and reboot.


Let's assume you have added a new 500 GB D: drive to your PC to hold your WSL distributions and all their backups, machine learning data, etc. At this point Ubuntu will be installed on C: drive. Go to Apps > Installed apps and use the options to move the installation to the D: drive before running Ubuntu for the first time.


Now run Ubuntu for the first time. Linux will ask you to set up your user account. This is the account that will be active each time you start a shell into Ubuntu. Once this is complete, run the command "exit" in the Linux shell and the distribution will be shut down. If you run "wsl -l -v" in PowerShell again you will see the Ubuntu distribution. Before going any further, let's make the first backup of this image and set up Windows Terminal.


Create a folder to hold the WSL images, which we will use instead of the location Windows used for the initial install, and a folder to hold the backups, e.g., D:\wsl\ and D:\backups\. Now make the first backup with the command "wsl --export Ubuntu-22.04 d:\backups\ubuntu2204_initial.tar". The distribution name must match the name shown by "wsl -l -v" (for example, Ubuntu-22.04 or just Ubuntu, depending on which Store package was installed).

Now uninstall the distribution Windows added (right click on the Ubuntu icon in the Start menu or use Add / Remove apps). Then add Ubuntu back into WSL using the backup. This command imports the tar file from the d:\backups path and registers it under the name Ubuntu: "wsl --import Ubuntu d:\wsl\ubuntu d:\backups\ubuntu2204_initial.tar". Right now the backup tar files are small, but soon they will be several gigabytes each. This is a good reason to make sure you have plenty of disk space to hold multiple backups.

When it comes time to restore from a backup, e.g., because something has gone wrong while installing a Python package, first remove the distribution from WSL with the command "wsl --unregister Ubuntu". Then use the wsl --import command to restore from a backup file. If you want to run a distribution directly from the PowerShell command line, use the command "wsl -d Ubuntu -u ggibson --cd ~", being sure to use your own username.
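Because backups are made often, it can help to generate the export and restore commands with consistent, dated file names. The sketch below only builds the argument lists for the wsl commands described above. The helper names, the file naming scheme, and the label "pre_cuda" are my own assumptions, not something WSL requires.

```python
from datetime import date
from pathlib import PureWindowsPath


def backup_commands(distro, backup_dir, label, today=None):
    """Return (tar_path, argv) for exporting a distribution to a dated tar file."""
    today = today or date.today()
    # Hypothetical naming scheme: <distro>_<YYYYMMDD>_<label>.tar
    tar_name = f"{distro.lower()}_{today:%Y%m%d}_{label}.tar"
    tar_path = PureWindowsPath(backup_dir) / tar_name
    return tar_path, ["wsl", "--export", distro, str(tar_path)]


def restore_commands(distro, install_dir, tar_path):
    """Return the two argv lists needed to replace a distribution from a backup."""
    return [
        ["wsl", "--unregister", distro],
        ["wsl", "--import", distro, str(PureWindowsPath(install_dir)), str(tar_path)],
    ]


tar_path, export_cmd = backup_commands("Ubuntu", r"d:\backups", "pre_cuda")
print(" ".join(export_cmd))
for cmd in restore_commands("Ubuntu", r"d:\wsl\ubuntu", tar_path):
    print(" ".join(cmd))
```

On Windows, each list could be passed to subprocess.run(cmd, check=True). Keep in mind that the unregister step deletes the current distribution, so only run it once the backup file exists.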

Now we will set up a custom profile in Windows Terminal for opening a shell into Ubuntu.

Open Windows Terminal and go to the Settings.


Set up a new profile for the Ubuntu distribution you created above. Look around at the various options to customize your Linux shell experience. Once it is set up you can open multiple tabs in Windows Terminal to the same instance of Ubuntu, e.g., one running JupyterLab and one for running Linux commands or another server.
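For reference, a Windows Terminal profile is just a JSON object in the "list" array under "profiles" in settings.json. The sketch below generates one such entry. The profile name is made up, the GUID is randomly generated, and the command line reuses the wsl launch command from earlier, so substitute your own distribution name and username.

```python
import json
import uuid


def wsl_profile(name, distro, user):
    """Build a Windows Terminal profile entry that opens a WSL distribution."""
    return {
        # Windows Terminal writes profile GUIDs wrapped in braces.
        "guid": "{" + str(uuid.uuid4()) + "}",
        "name": name,
        "commandline": f"wsl.exe -d {distro} -u {user} --cd ~",
        "hidden": False,
    }


# Hypothetical profile name; distro and user are taken from the steps above.
profile = wsl_profile("Ubuntu (TensorFlow)", "Ubuntu", "ggibson")
print(json.dumps(profile, indent=4))
```

The printed object can be pasted into the profile list by hand. Creating the profile through the Settings screen produces an equivalent entry.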

At one point I was creating a new profile while a backup was running and Windows Terminal corrupted the profile. I was able to fix this by using the "Open JSON file" button on the Settings screen and editing the profile by hand.




Anaconda

Start up Ubuntu and, in the shell, download either the full Anaconda or Miniconda. The following two URLs list the available installers, which are .sh files. Copy the URL of the .sh file you want from one of these pages, use the Linux command wget to download it, and then run the .sh file to complete the installation.

https://www.anaconda.com/products/distribution

https://docs.conda.io/en/latest/miniconda.html


wget https://repo.anaconda.com/archive/Anaconda3-2022.10-Linux-x86_64.sh

bash Anaconda3-2022.10-Linux-x86_64.sh

When the installer asks if you would like to initialize Anaconda3, say yes.

Exit and restart Ubuntu.

Before installing any Python libraries or starting any new projects first create a virtual environment to contain all libraries and project files. This command will create a virtual environment. Give the environment a name where <NAME> appears in the command. A more specific version of this for a full TensorFlow and CUDA setup is given later in this article.

conda create -n <NAME> -y

Activating a virtual environment means that most changes you make to your system, e.g., installing a Python library, apply only to this virtual environment rather than to the core Linux environment.

Activate / deactivate the virtual environment:

conda activate <NAME>

conda deactivate

The size of the Ubuntu distribution backups can be decreased by telling conda to get rid of unneeded files.

Get the size of the Anaconda directory (run from your home directory):

sudo du -sh anaconda3

Clean up / shrink the Anaconda directory (removes unused packages and caches):

conda clean --all
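To see how much a cleanup actually saves, you can measure the directory before and after. Here is a small sketch that sums file sizes in Python. It assumes the default install location ~/anaconda3, and it reports apparent file sizes, so the number will differ slightly from what du prints.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def human(n):
    """Format a byte count using binary units."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if n < 1024:
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} TiB"


# Assumed install location; a missing directory simply reports 0.0 B.
anaconda_dir = os.path.expanduser("~/anaconda3")
print(anaconda_dir, human(dir_size_bytes(anaconda_dir)))
```

Run it once before "conda clean --all" and once after to compare.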




Setup Nvidia CUDA and cuDNN libraries for GPU support

One of the most important points made in all the documentation from Nvidia, Ubuntu, and Microsoft about the GPU support in WSL is to ONLY install the Nvidia video card drivers into Windows itself and not into the Ubuntu OS. Just making sure you have the latest drivers for Windows installed e.g., using the "GeForce Experience" installer is all you need for this step.

If you are satisfied with TensorFlow 2.4.1 then the install can be done by a call to conda. The following command will not only install TensorFlow 2.4.1, but also the Nvidia components. First activate the virtual environment and then run the following command.

conda install tensorflow-gpu


If you would like to install a more recent combination, e.g., TensorFlow 2.10 with CUDA 11.2 and cuDNN 8, then use the following instructions.

* Note: Before going down this route I should mention that the Keras Tuner (hyperparameter tuner) library destroyed the TensorFlow version 2.10 setup when it was installed (luckily, I was keeping backups of the Ubuntu image before making changes). So not only are there versioning issues between TensorFlow and CUDA, but also potentially with various other libraries.
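One way to catch this kind of breakage early is to check the installed package versions right after installing anything new. The sketch below compares installed versions against the versions you expect. The function name, the pin format (a simple version prefix), and the example pins are assumptions for illustration only.

```python
from importlib import metadata


def find_version_drift(pins, get_version=metadata.version):
    """Return {package: installed_version} for packages that do not match their pin.

    pins maps a package name to the version prefix you expect, e.g. "2.10.".
    A package that is not installed is reported with the value None.
    """
    drift = {}
    for package, wanted_prefix in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed is None or not installed.startswith(wanted_prefix):
            drift[package] = installed
    return drift


# Example pin for the setup described in this article (assumed package name).
problems = find_version_drift({"tensorflow": "2.10."})
print("Version drift:", problems if problems else "none")
```

If the result is not empty after installing a new library, that is a good moment to restore the last Ubuntu backup rather than try to repair the environment.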


Documentation from Nvidia

https://docs.nvidia.com/cuda/wsl-user-guide/index.html#getting-started-with-cuda-on-wsl

Documentation from Ubuntu


These are the commands to run in the Linux shell:

sudo apt-key del 7fa2af80

wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin

sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600

wget https://developer.download.nvidia.com/compute/cuda/12.0.0/local_installers/cuda-repo-wsl-ubuntu-12-0-local_12.0.0-1_amd64.deb

sudo dpkg -i cuda-repo-wsl-ubuntu-12-0-local_12.0.0-1_amd64.deb

sudo cp /var/cuda-repo-wsl-ubuntu-12-0-local/cuda-*-keyring.gpg /usr/share/keyrings/

sudo apt-get update

sudo apt-get -y install cuda


Then create a virtual environment configured for this setup (run as a single command):

CONDA_CUDA_OVERRIDE="11.2" conda create --name=<NAME> python=3.9 tensorflow-gpu=2.10.0 --channel=conda-forge --no-default-packages


Then set this environment variable so the GPU libraries that WSL provides can be found:

export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
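If GPU detection fails later, a missing path entry is one of the first things to rule out. The following sketch checks whether /usr/lib/wsl/lib is present in LD_LIBRARY_PATH and shows the value it should have. The helper name is hypothetical. Note that changing the variable from inside an already running Python process does not affect library loading for that process, so this is a check only; the export itself still has to happen in the shell.

```python
import os

WSL_LIB_DIR = "/usr/lib/wsl/lib"


def with_wsl_lib(ld_library_path):
    """Return ld_library_path with the WSL library directory included exactly once."""
    entries = [entry for entry in ld_library_path.split(":") if entry]
    if WSL_LIB_DIR in entries:
        return ":".join(entries)
    return ":".join([WSL_LIB_DIR] + entries)


current = os.environ.get("LD_LIBRARY_PATH", "")
print("WSL GPU libraries on path:", WSL_LIB_DIR in current.split(":"))
print("Expected value:", with_wsl_lib(current))
```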


Finally, to test that the GPU is detected:

python3 -c "import tensorflow as tf; from tensorflow.python.client import device_lib; print('Num GPUs Available:', len(tf.config.list_physical_devices('GPU'))); print(device_lib.list_local_devices())"
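The same check is easier to read and reuse as a small script. This is a sketch of one way to write it: the function name gpu_report is my own, and it deliberately returns an empty result instead of crashing when TensorFlow is not importable in the active environment.

```python
import importlib.util


def gpu_report():
    """Return the TensorFlow version and the names of the GPUs it can see."""
    if importlib.util.find_spec("tensorflow") is None:
        # TensorFlow is not installed in the active environment.
        return {"tensorflow_version": None, "gpus": []}
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    return {"tensorflow_version": tf.__version__, "gpus": [gpu.name for gpu in gpus]}


report = gpu_report()
print("TensorFlow version:", report["tensorflow_version"])
print("Num GPUs available:", len(report["gpus"]))
for name in report["gpus"]:
    print("  ", name)
```

A version of None usually means the wrong conda environment is active, while a version with zero GPUs points at the driver, CUDA, or LD_LIBRARY_PATH setup.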




JupyterLab

Activate the virtual environment you are using for your TensorFlow project and then run the following command to install JupyterLab:

conda install -c conda-forge jupyterlab

Installing the following extensions will enhance the experience and add features such as interactive code debugging and more advanced intellisense.




(NodeJS)

conda install -c conda-forge/label/cf202003 nodejs

URL: https://nodejs.org/en/about/

Description:

Used by JupyterLab for its management of Extensions.




(nb_conda_kernels)

conda install -c conda-forge nb_conda_kernels

URL: https://github.com/Anaconda-Platform/nb_conda_kernels

Description:

This extension enables JupyterLab in one conda environment to access kernels for Python, R, and other languages found in other environments (e.g., switching kernels to load a Python interactive debugging kernel).




(ipykernel)

conda install ipykernel

Description:

Default python kernel




(xeus-python)

conda install -c conda-forge xeus-python

URL: https://github.com/jupyter-xeus/xeus-python

Description:

A Python kernel that supports the JupyterLab debugger. Switch kernels to "XPython" in JupyterLab to enable the debugging button.




(jupyterlab-spellchecker)

pip install jupyterlab-spellchecker

URL: https://github.com/jupyterlab-contrib/spellchecker

Description:

A JupyterLab extension highlighting misspelled words in markdown cells within notebooks and in the text files.




(jupyterlab-code-formatter)

pip install jupyterlab-code-formatter

URL: https://github.com/ryantam626/jupyterlab_code_formatter

Description:

A JupyterLab plugin to facilitate invocation of code formatters.




(black)

pip install black

URL: https://black.readthedocs.io/en/stable/index.html

Description:

The uncompromising code formatter.




(isort)

pip install isort

URL: https://pycqa.github.io/isort/index.html

Description:

isort your imports, so you don't have to.




(jupyterlab-lsp)

conda install -c conda-forge 'jupyterlab>=3.0.0,<4.0.0a0' jupyterlab-lsp

conda install -c conda-forge python-lsp-server r-languageserver

URL: https://github.com/jupyter-lsp/jupyterlab-lsp

Description:

Better code completion / intellisense




Some additional data science libraries

Matplotlib, pandas, NumPy, and scikit-learn:

conda install -c conda-forge matplotlib numpy pandas scikit-learn


Run JupyterLab with this command using a port of your choice:

jupyter lab --no-browser --port <PORT>

JupyterLab will print out URLs you can browse to (from Windows) as it starts up.
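If you do not care which port is used, you can ask the operating system for a free one instead of guessing. Here is a minimal sketch with a hypothetical helper name. There is a small window in which another program could take the port between this check and JupyterLab starting, which is normally not a concern on a personal machine.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port on the given interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the OS choose any free port.
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = find_free_port()
print(f"jupyter lab --no-browser --port {port}")
```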


Update:

To test that a TensorFlow CNN (using cuDNN) is working, activate your virtual environment (e.g., conda activate my_env) and use Vim to paste the code below into a Python file.

At the command line in the virtual environment:

vim test_CNN_tensorflow.py

Then inside Vim press i to enter insert mode, paste in the code below, press <ESC> to leave insert mode, and then type :wq <ENTER> to save and close Vim.

Then back at the command line run the python file:

python3 test_CNN_tensorflow.py


CNN TensorFlow test code:



import datetime
from functools import partial

import tensorflow as tf
from tensorflow import keras

# Load Fashion MNIST: 28x28 grayscale images in 10 classes.
fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()
print("X_train_full.shape", X_train_full.shape)
print("X_train_full.dtype", X_train_full.dtype)

# Scale pixels to [0, 1] and hold out the first 5000 images for validation.
X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] / 255.0
X_test = X_test / 255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

# Add an explicit channel axis so the data matches input_shape=[28, 28, 1].
X_train, X_valid, X_test = X_train[..., None], X_valid[..., None], X_test[..., None]

class_names = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

# Conv2D with shared defaults; individual layers can still override them.
DefaultConv2D = partial(
    tf.keras.layers.Conv2D,
    kernel_size=3,
    padding="same",
    activation="relu",
    kernel_initializer="he_normal",
)

# A deep stack of convolution layers, which exercises cuDNN when run on the GPU.
model = tf.keras.Sequential(
    [
        DefaultConv2D(filters=64, kernel_size=7, input_shape=[28, 28, 1]),
        DefaultConv2D(filters=64),
        DefaultConv2D(filters=64),
        DefaultConv2D(filters=64),
        DefaultConv2D(filters=48),
        DefaultConv2D(filters=48),
        DefaultConv2D(filters=48),
        DefaultConv2D(filters=32),
        DefaultConv2D(filters=32),
        DefaultConv2D(filters=32),
        DefaultConv2D(filters=24),
        DefaultConv2D(filters=24),
        DefaultConv2D(filters=24),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(units=128, kernel_initializer="he_normal"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("swish"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(units=64, kernel_initializer="he_normal"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("swish"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(units=10, activation="softmax"),
    ]
)

# Print timestamps around training to see how long it takes.
print("Begin", datetime.datetime.now())
model.compile(
    loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"]
)
history = model.fit(
    X_train, y_train, epochs=15, validation_data=(X_valid, y_valid), batch_size=20
)
print("End", datetime.datetime.now())
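If you run this test repeatedly, for example to compare GPU and CPU runs, a small timer is more convenient than reading two timestamps. The context manager below is a sketch of that idea. The name timed is hypothetical, and the summation stands in for the call to model.fit.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print (and optionally record) how long the enclosed block takes."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.2f} s")


timings = {}
with timed("demo workload", timings):
    # Stand-in for model.fit(...); any code can go inside the block.
    total = sum(range(1_000_000))
```

In the test script you would wrap the model.fit call in a "with timed(...)" block in the same way.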
