TensorFlow-GPU + Ubuntu + WSL
Gerald Gibson
Principal Engineer @ Salesforce | Hyperscale, Machine Learning, Patented Inventor
This article walks you through the steps I discovered recently for setting up a working environment to create TensorFlow-GPU neural networks on Ubuntu Linux hosted on Windows Subsystem for Linux.
There are several tutorials, blog posts, and installation instructions out there for doing this, and I tried most of them in dozens of attempts over the course of a week. Each either failed outright or seemed to work at first and then failed later, e.g., when setting up a TensorFlow Convolutional Neural Network (because the cuDNN library was not set up correctly).
These steps have been followed from scratch multiple times to prove they work given the described versions of software and order of execution.
Dual booting into Linux to do some work when most of my tools, notes, browser favorites, etc. are in Windows became such a pain that I recently decided to spend whatever time and effort was necessary to get this system set up. I had read that WSL (Windows Subsystem for Linux) now had pretty good support for GPU access from Linux (via tools like TensorFlow and DirectML), so I hoped it would be worth the effort. It was not a straightforward process, but in the end it has been worth the effort to figure it all out. Along the way I wrote down all the necessary steps in my notes, which I am sharing here.
The starting point is a Windows 11 PC with an Nvidia RTX 2070 GPU. We will first make sure WSL is set up. This part is quick and easy; however, it gets a bit more involved because we will want to make frequent backups of the Ubuntu image. Restoring one of these images loses the default configuration Windows creates when it sets up the distribution, including the Windows shortcut for Ubuntu. Instead of that shortcut, we will use Windows Terminal to launch the image. As it turns out, this is a better method.
Windows Subsystem for Linux
Open PowerShell and run the command "wsl -l -v". If WSL is already enabled (perhaps because you are using Docker), this lists the distributions that are currently installed. In that case, go to the Microsoft Store and install Ubuntu to add it to the WSL distribution list. If PowerShell does not recognize the command, run "wsl --install" instead. This should enable WSL and install Ubuntu. Run "wsl -l -v" again. If Ubuntu is still not listed, go to the Microsoft Store and install Ubuntu. Do not run Ubuntu yet.
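If you would rather check the distribution list from a script than read it by eye, the following is a minimal sketch of parsing the table that "wsl -l -v" prints. The function names and the sample text are my own illustrations, not part of WSL, and the sketch assumes wsl.exe emits UTF-16 text (its default behavior) with NAME, STATE, and VERSION columns.

```python
import subprocess


def parse_wsl_list(text):
    """Parse the table printed by `wsl -l -v` into a list of dicts."""
    distros = []
    for line in text.splitlines():
        # Drop NUL characters left over if UTF-16 output was decoded as 8-bit text.
        line = line.replace("\x00", "").strip()
        if not line or line.upper().startswith("NAME"):
            continue  # skip blank lines and the header row
        is_default = line.startswith("*")  # the default distribution is starred
        fields = line.lstrip("*").split()
        if len(fields) < 3 or not fields[-1].isdigit():
            continue  # not a distribution row
        distros.append(
            {
                # The last two columns are STATE and VERSION; the rest is the name.
                "name": " ".join(fields[:-2]),
                "state": fields[-2],
                "version": int(fields[-1]),
                "default": is_default,
            }
        )
    return distros


def list_wsl_distros():
    """Run `wsl -l -v` on Windows and parse the result."""
    raw = subprocess.run(["wsl", "-l", "-v"], capture_output=True, check=True).stdout
    return parse_wsl_list(raw.decode("utf-16-le", errors="ignore"))


# Hypothetical sample of what the command prints once Ubuntu is installed.
SAMPLE = """\
  NAME                   STATE           VERSION
* Ubuntu                 Stopped         2
  docker-desktop         Running         2
"""

for distro in parse_wsl_list(SAMPLE):
    print(distro)
```

On a Windows machine, calling list_wsl_distros() from PowerShell-launched Python should return the same structure for the real list.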
Now, let's make sure the latest WSL kernel is installed. Go to Windows Update > Advanced options and turn on the option to receive updates for other Microsoft products, which is how WSL kernel updates are delivered. Then run Windows Update to get the latest version and reboot.
Let's assume you have added a new 500 GB D: drive to your PC to hold your WSL distributions and all their backups, machine learning data, etc. At this point Ubuntu will be installed on C: drive. Go to Apps > Installed apps and use the options to move the installation to the D: drive before running Ubuntu for the first time.
Now run Ubuntu for the first time. Linux will ask you to set up your user account. This is the account that will be active each time you start a shell into Ubuntu. Once this is complete, run the command "exit" in the Linux shell and the distribution will be shut down. If you run "wsl -l -v" in PowerShell again, you will see the Ubuntu distribution. Before going any further, let's make the first backup of this image and set up Windows Terminal.
Create a folder to hold the WSL images, which we will use instead of the location Windows used for the initial install, and a folder to hold the backups, e.g., D:\wsl\ and D:\backups\. Now make the first backup with the command "wsl --export Ubuntu-22.04 d:\backups\ubuntu2204_initial.tar". The distribution name must match the name shown by "wsl -l -v" (Ubuntu-22.04 in this example).
Now uninstall the distribution Windows added (right click on the Ubuntu icon in the Start menu or use Add / Remove apps). Then add Ubuntu back into WSL from the backup. The following command imports the tar file from the d:\backups path and registers it under the name Ubuntu, with its files stored in d:\wsl\ubuntu: "wsl --import Ubuntu d:\wsl\ubuntu d:\backups\ubuntu2204_initial.tar". Right now the backup tar files are small, but soon they will be several gigabytes in size. This is a good reason to make sure you have plenty of disk space to hold multiple backups.
When it comes time to restore from a backup, e.g., because something has gone wrong while installing a Python package, first remove the distribution from WSL with the command "wsl --unregister Ubuntu". Unregistering deletes the distribution's file system, so make sure the backup you plan to restore exists first. Then use the "wsl --import" command to restore from the backup file. If you want to run a distribution directly from the PowerShell command line, use the command "wsl -d Ubuntu -u ggibson --cd ~", being sure to substitute your own username.
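Since backups get made often, it can help to generate the export and restore commands with consistent, timestamped file names. The sketch below only builds the command lines described above; the function names, the file naming scheme, and the label argument are my own assumptions, and nothing is executed unless you pass the lists to subprocess.run yourself.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def export_command(distro, backup_dir, label=None, now=None):
    """Build the `wsl --export` command for a timestamped backup file."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    suffix = f"_{label}" if label else ""
    tar_path = PureWindowsPath(backup_dir) / f"{distro.lower()}_{stamp}{suffix}.tar"
    return ["wsl", "--export", distro, str(tar_path)]


def restore_commands(distro, install_dir, tar_path):
    """Build the unregister + import pair used to roll back to a backup."""
    return [
        # Unregistering deletes the current file system of the distribution.
        ["wsl", "--unregister", distro],
        ["wsl", "--import", distro, str(PureWindowsPath(install_dir)), str(PureWindowsPath(tar_path))],
    ]


# Example: a backup taken just before installing TensorFlow (label is arbitrary).
backup = export_command("Ubuntu", r"d:\backups", label="pre_tensorflow")
print(" ".join(backup))

for command in restore_commands("Ubuntu", r"d:\wsl\ubuntu", backup[-1]):
    print(" ".join(command))
```

Printing the commands first and pasting them into PowerShell keeps the destructive "--unregister" step a deliberate action.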
Now we will set up a custom profile in Windows Terminal for opening a shell into Ubuntu.
Open Windows Terminal and go to Settings.
Set up a new profile for the Ubuntu distribution you created above. Look around at the various options to customize your Linux shell experience. Once the profile is set up, you can open multiple tabs in Windows Terminal to the same instance of Ubuntu, e.g., one running Jupyter Lab and another for running Linux commands or another server.
At one point I was creating a new profile while a backup was running and Windows Terminal corrupted the profile. I was able to fix this by using the JSON settings file button you can see in the Profiles screen.
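If you end up editing the JSON settings file by hand, a profile entry for the imported distribution might look like the output of this sketch. The profile name and the make_profile helper are made up for illustration; "guid", "name", "commandline", and "hidden" are standard Windows Terminal profile keys, and the command line is the same "wsl -d ... -u ... --cd ~" invocation shown earlier.

```python
import json
import uuid


def make_profile(name, distro, user):
    """Return a Windows Terminal profile entry that opens a WSL shell."""
    return {
        # Every profile needs its own GUID, written with surrounding braces.
        "guid": "{" + str(uuid.uuid4()) + "}",
        "name": name,
        "commandline": f"wsl.exe -d {distro} -u {user} --cd ~",
        "hidden": False,
    }


profile = make_profile("Ubuntu (TensorFlow)", "Ubuntu", "ggibson")
print(json.dumps(profile, indent=4))
```

The printed object would go inside the "list" array under "profiles" in settings.json. Keep a copy of the file before editing so a corrupted profile is easy to roll back.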
Anaconda
Start Ubuntu and, in the shell, download either the full Anaconda or Miniconda. The following two URLs list different distributions of Anaconda. These URLs point to .sh files. Copy the URL of the .sh file you want from one of these pages, use the Linux command wget to download it, and then run the .sh file to complete the installation.
wget https://repo.anaconda.com/archive/Anaconda3-2022.10-Linux-x86_64.sh
bash Anaconda3-2022.10-Linux-x86_64.sh
When the installer asks if you would like to initialize Anaconda3, say yes.
Exit and restart Ubuntu.
Before installing any Python libraries or starting any new projects, first create a virtual environment to contain all libraries and project files. The following command creates one; replace <NAME> with the name you want to give the environment. A more specific version of this command for a full TensorFlow and CUDA setup is given later in this article.
conda create -n <NAME> -y
Activating a virtual environment means that most changes you make to your system, e.g., installing a Python library, apply only to that virtual environment rather than to the core Linux environment.
Activate / Deactivate virtual environment:
conda activate <NAME>
conda deactivate
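To confirm which environment a Python process is actually running in (useful later when JupyterLab kernels come from several environments), a small check like the following can be run. It is only a sketch: describe_environment is a name I made up, and it relies on conda setting the CONDA_DEFAULT_ENV variable when an environment is activated.

```python
import os
import sys


def describe_environment():
    """Report which Python interpreter and conda environment are in use."""
    return {
        "python": sys.executable,
        "prefix": sys.prefix,
        # conda sets this variable when an environment is activated.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If "conda_env" is None or "base", libraries you install will not land in the project environment.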
The size of the Ubuntu distribution backups can be decreased by telling conda to get rid of unneeded files.
Get size of Anaconda directory:
sudo du -sh anaconda3
Cleanup / Shrink Anaconda directory (remove unused packages and caches):
conda clean --all
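To see how much a cleanup actually saved, the directory size can also be measured from Python before and after running the clean command. This is a rough sketch, not a replacement for du: it sums apparent file sizes rather than disk blocks, so the number will differ slightly from what "du -sh" reports, and the ~/anaconda3 path is an assumed default install location.

```python
import os


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable
    return total


def human_readable(num_bytes):
    """Format a byte count using binary units."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if num_bytes < 1024 or unit == "TiB":
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024


anaconda_dir = os.path.expanduser("~/anaconda3")  # assumed default install location
if os.path.isdir(anaconda_dir):
    print(anaconda_dir, human_readable(directory_size_bytes(anaconda_dir)))
else:
    print("No directory found at", anaconda_dir)
```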
Setup Nvidia CUDA and cuDNN libraries for GPU support
One of the most important points made in all the documentation from Nvidia, Ubuntu, and Microsoft about the GPU support in WSL is to ONLY install the Nvidia video card drivers into Windows itself and not into the Ubuntu OS. Just making sure you have the latest drivers for Windows installed e.g., using the "GeForce Experience" installer is all you need for this step.
If you are satisfied with TensorFlow 2.4.1 then the install can be done by a call to conda. The following command will not only install TensorFlow 2.4.1, but also the Nvidia components. First activate the virtual environment and then run the following command.
conda install tensorflow-gpu
If you would like to install a more recent version of CUDA and TensorFlow e.g. 2.10, CUDA 11.2, and cuDNN 8 then the following instructions should be used.
* Note: Before going down this route I should mention that the Keras Tuner (hyperparameter tuner) library destroyed the TensorFlow version 2.10 setup when it was installed (luckily, I was keeping backups of the Ubuntu image before making changes). So not only are there versioning issues between TensorFlow and CUDA, but also potentially with various other libraries.
Documentation from Nvidia
Documentation from Ubuntu
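These are the commands to run in the Linux shell. They assume the cuda-wsl-ubuntu.pin file and the cuda-repo-wsl-ubuntu-12-0-local .deb package referenced in the Nvidia documentation have already been downloaded into the current directory: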
sudo apt-key del 7fa2af80
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo dpkg -i cuda-repo-wsl-ubuntu-12-0-local_12.0.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-0-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
Then create a virtual environment configured for this setup (run as a single command):
CONDA_OVERRIDE_CUDA="11.2" conda create --name=<NAME> python=3.9 tensorflow-gpu=2.10.0 --channel=conda-forge --no-default-packages
Then set up this environment variable (add the line to ~/.bashrc if you want it to persist across shell sessions):
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
Finally, to test that the GPU is detected:
python3 -c "import tensorflow as tf;from tensorflow.python.client import device_lib;print('Num GPUs Available: ', len(tf.config.list_physical_devices('GPU')));print(device_lib.list_local_devices())"
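The same check is easier to read and rerun as a small script. The sketch below is one possible version, assuming it is run inside the activated conda environment: gpu_report is a name I made up, and it uses tf.config.list_physical_devices and tf.sysconfig.get_build_info to show which CUDA and cuDNN versions the TensorFlow build expects, which helps when chasing the version mismatches mentioned above. It returns None instead of failing when TensorFlow cannot be imported.

```python
def gpu_report():
    """Return a dict describing TensorFlow's view of the GPU, or None without TensorFlow."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed in this environment

    build = tf.sysconfig.get_build_info()
    gpus = tf.config.list_physical_devices("GPU")
    return {
        "tensorflow": tf.__version__,
        # These keys are absent on CPU-only builds, hence .get().
        "cuda_build": build.get("cuda_version"),
        "cudnn_build": build.get("cudnn_version"),
        "gpu_count": len(gpus),
        "gpu_names": [gpu.name for gpu in gpus],
    }


report = gpu_report()
if report is None:
    print("TensorFlow is not importable; activate the conda environment first.")
else:
    for key, value in report.items():
        print(f"{key}: {value}")
```

A gpu_count of 0 with a CUDA build listed usually points at the library path or driver side rather than at the TensorFlow package itself.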
JupyterLab
Activate the virtual environment you are using for your TensorFlow project and then run the following command to install JupyterLab:
conda install -c conda-forge jupyterlab
Installing the following extensions will enhance the experience and add features such as interactive code debugging and more advanced intellisense.
(NodeJS)
conda install -c conda-forge/label/cf202003 nodejs
URL: https://nodejs.org/en/about/
Description:
Used by JupyterLab for its management of Extensions.
(nb_conda_kernels)
conda install -c conda-forge nb_conda_kernels
Description:
This extension enables JupyterLab in one conda environment to access kernels for Python, R, and other languages found in other environments (e.g., kernel switching to load a Python interactive debugging kernel).
(ipykernel)
conda install ipykernel
Description:
Default python kernel
(xeus-python)
conda install -c conda-forge xeus-python
URL: https://github.com/jupyter-xeus/xeus-python
Description:
A python kernel that supports the Jupyter Lab debugger. Switch kernels to "XPython" in JupyterLab to enable the debugging button.
(jupyterlab-spellchecker)
pip install jupyterlab-spellchecker
Description:
A JupyterLab extension highlighting misspelled words in markdown cells within notebooks and in the text files.
(jupyterlab-code-formatter)
pip install jupyterlab-code-formatter
Description:
A JupyterLab plugin to facilitate invocation of code formatters.
(black)
pip install black
Description:
The uncompromising code formatter.
(isort)
pip install isort
Description:
isort your imports, so you don't have to.
(jupyterlab-lsp)
conda install -c conda-forge 'jupyterlab>=3.0.0,<4.0.0a0' jupyterlab-lsp
conda install -c conda-forge python-lsp-server r-languageserver
Description:
Better code completion / intellisense
Some additional data science libraries
Matplotlib Pandas Numpy Scikit-Learn
conda install -c conda-forge matplotlib numpy pandas scikit-learn
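Because one library install can quietly change the versions of others (as happened to me with Keras Tuner), it is worth recording what is installed before and after each change. Here is a minimal sketch using the standard library's importlib.metadata; the package list is just an example, and the distribution names are assumptions (for instance, a conda tensorflow-gpu install normally registers its Python package as "tensorflow").

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Example list; adjust it to the libraries your project depends on.
PACKAGES = ["tensorflow", "numpy", "pandas", "matplotlib", "scikit-learn", "jupyterlab"]

for name, version in installed_versions(PACKAGES).items():
    print(f"{name:15} {version or 'not installed'}")
```

Saving this output next to each Ubuntu backup makes it easier to tell which backup matches which working set of versions.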
Run JupyterLab with this command using a port of your choice:
jupyter lab --no-browser --port <PORT>
JupyterLab will print out URLs you can browse to (from Windows) as it starts up.
Update:
To test that a TensorFlow CNN (using cuDNN) is working, activate your virtual environment (e.g., conda activate my_env) and use VIM to paste the code below into a Python file.
At the command line in the virtual environment:
vim test_CNN_tensorflow.py
Then inside VIM press <I> for insert mode, paste in the code below, press <ESC> to exit insert mode, and then type :wq <ENTER> to save and close VIM.
Then back at the command line run the python file:
python3 test_CNN_tensorflow.py
CNN TensorFlow test code:
from functools import partial
import tensorflow as tf
from tensorflow import keras
fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()
print("X_train_full.shape", X_train_full.shape)
print("X_train_full.dtype", X_train_full.dtype)
X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] / 255.0
X_test = X_test / 255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
class_names = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]
# Conv2D with shared defaults so each layer below only states what differs.
DefaultConv2D = partial(
    tf.keras.layers.Conv2D,
    kernel_size=3,
    padding="same",
    activation="relu",
    kernel_initializer="he_normal",
)
model = tf.keras.Sequential(
    [
        DefaultConv2D(filters=64, kernel_size=7, input_shape=[28, 28, 1]),
        DefaultConv2D(filters=64),
        DefaultConv2D(filters=64),
        DefaultConv2D(filters=64),
        DefaultConv2D(filters=48),
        DefaultConv2D(filters=48),
        DefaultConv2D(filters=48),
        DefaultConv2D(filters=32),
        DefaultConv2D(filters=32),
        DefaultConv2D(filters=32),
        DefaultConv2D(filters=24),
        DefaultConv2D(filters=24),
        DefaultConv2D(filters=24),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(units=128, kernel_initializer="he_normal"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("swish"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(units=64, kernel_initializer="he_normal"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("swish"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(units=10, activation="softmax"),
    ]
)
import datetime
# Print timestamps around training to see how long the run takes.
print("Begin", datetime.datetime.now())
model.compile(
    loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"]
)
history = model.fit(
    X_train, y_train, epochs=15, validation_data=(X_valid, y_valid), batch_size=20
)
print("End", datetime.datetime.now())
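The full training script above takes a while and downloads the Fashion MNIST dataset. As a quicker first check, a single small convolution can be run and the device it executed on inspected. This is only a sketch under my own assumptions: conv_smoke_test is a made-up helper, the tensor sizes are arbitrary, and one forward pass does not exercise training the way the full script does, so it complements that test rather than replacing it.

```python
import time


def conv_smoke_test(batch=8, size=28, channels=1, filters=16):
    """Run one small Conv2D forward pass; return (device, seconds) or None without TensorFlow."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed in this environment

    layer = tf.keras.layers.Conv2D(filters, kernel_size=3, padding="same")
    images = tf.random.uniform([batch, size, size, channels])
    start = time.perf_counter()
    output = layer(images)
    _ = output.numpy()  # force execution to finish before stopping the clock
    elapsed = time.perf_counter() - start
    # In eager mode the result tensor records the device that produced it.
    return output.device, elapsed


result = conv_smoke_test()
if result is None:
    print("TensorFlow is not importable; activate the conda environment first.")
else:
    device, seconds = result
    print(f"Conv2D ran on {device} in {seconds:.3f} s")
```

If the printed device name ends in GPU:0 and no cuDNN errors appear in the log output, the convolution path is at least loading correctly.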