What the GRUB!
Preamble
Recently I ran into a flurry of issues while setting up my gaming + development rig. The idea was to connect multiple GPUs for deep learning workloads, set up my favourite Linux development environment, grab a beer and get going.
But... you know, the gods of technology love to have their way with you. My patience and resolve were tested and tried like those of the biblical character Job.
Unfortunately, I couldn't get rid of Windows altogether, as some of the games I play do not work in Steam's Linux environment. I learnt that GPU passthrough is next to impossible in a virtualized environment. I tried VMware and VirtualBox, and learned that the only way to achieve passthrough with your sanity intact is via commercial tools like VMware vSphere.
The setup
So, I decided to set up a dual boot system, with Windows 11 on one SSD and Ubuntu 20.04 on another SSD. Everything went smoothly, just as expected. But TensorFlow wouldn't pick up the GPU. It turned out that the nouveau drivers Ubuntu recommends do not work for this. You need to install the original NVIDIA drivers, the CUDA libraries and the NVIDIA machine learning library. If you are interested in concrete steps, here they are:
1. Search for the available NVIDIA driver packages.
$ apt search nvidia-driver
2. Get the CUDA pin for Ubuntu 20.04.
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
3. Move the pin to apt preferences location.
$ sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
4. Get authentication key for the CUDA apt package.
$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
5. Add the apt repository & update.
$ sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
$ sudo apt update
6. Get the nvidia machine learning library and install it.
$ wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/nvidia-machine-learning-repo-ubuntu2004_1.0.0-1_amd64.deb
$ sudo apt install ./nvidia-machine-learning-repo-ubuntu2004_1.0.0-1_amd64.deb
7. Install NCCL, CUDA, cuDNN and TensorRT.
$ wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/libnccl2_2.8.3-1+cuda11.2_amd64.deb
$ sudo apt install ./libnccl2_2.8.3-1+cuda11.2_amd64.deb
$ sudo apt update
$ sudo apt install cuda-11-2 libcudnn8=8.1.1.33-1+cuda11.2 libcudnn8-dev=8.1.1.33-1+cuda11.2
$ sudo apt install libnvinfer8=8.0.0-1+cuda11.0 libnvinfer-dev=8.0.0-1+cuda11.0 libnvinfer-plugin8=8.0.0-1+cuda11.0
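8. Set up path variables for CUDA. The ${VAR:+...} form avoids a leading colon (an empty path entry) when LD_LIBRARY_PATH is not set yet.
$ export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
$ export CUDA_HOME=/usr/local/cuda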
9. If Secure Boot is enabled and the NVIDIA driver is not loading, remove the DKMS binding for the driver and reinstall it.
$ sudo dkms remove nvidia/495.44 --all
$ sudo dkms install --force nvidia/495.44 -k $(uname -r)
$ sudo update-initramfs -u
$ sync
$ reboot
# After reboot, enroll the MOK
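Once the machine is back up, a quick sanity check tells you whether the whole exercise worked. The snippet below is only a sketch under some assumptions: the helper name nvidia_module_loaded is made up for illustration, and mokutil, nvidia-smi and a python3 with TensorFlow may or may not be installed on your box, so each of them is checked before use.

```shell
# Succeeds if an nvidia kernel module appears in a /proc/modules-style listing.
# nvidia_module_loaded is a hypothetical helper, not part of any package.
nvidia_module_loaded() {
    # $1: file to inspect (defaults to the live module list)
    grep -q '^nvidia ' "${1:-/proc/modules}" 2>/dev/null
}

# Show the Secure Boot state, only if mokutil is installed.
if command -v mokutil >/dev/null 2>&1; then
    mokutil --sb-state || true
fi

if nvidia_module_loaded; then
    echo "nvidia kernel module: loaded"
else
    echo "nvidia kernel module: NOT loaded (check the MOK enrollment)"
fi

# Ask the driver and TensorFlow which GPUs they can see, if they are installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L || true
fi
if command -v python3 >/dev/null 2>&1 && python3 -c 'import tensorflow' 2>/dev/null; then
    python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))' || true
fi
```

If the module shows up as not loaded while Secure Boot is enabled, the MOK enrollment is the first thing I would look at.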
The pain
On relatively newer systems that come loaded with TPM 2.0 and Secure Boot, your newly installed NVIDIA drivers won't automagically load. The NVIDIA kernel modules have a kernel interface layer that must be compiled specifically for a certain kernel, and with Secure Boot enabled such a locally built module is only loaded if it is signed with a key the system trusts. I learnt that this is typical behaviour on Secure Boot systems. You need to explicitly remove the dynamic kernel module support (DKMS) binding for your NVIDIA driver and rebind it. After a reboot, you will be asked to enroll a machine owner key (MOK), the key that was used to sign the rebuilt module. MOKs enable signing and verification of third-party modules and custom-built kernels. These modules are then securely loaded by a shim layer between UEFI Secure Boot and GRUB or the kernel.
Yeah yeah, I know. A lot of big words in the previous paragraph. All you need to take away is that the sh** is getting real, mate. :)
Once that issue was out of the way, I could finally start hacking. I played around for a while and then decided to check something on the Windows side. Lo and behold, right after the GRUB screen where I chose the Windows boot manager, an ominous blue screen appeared. Actually, the first time it didn't look ominous. Hey, it wasn't the usual blue screen of death that Windows is famous for. I kind of expected it. After enrolling a new MOK for the graphics card driver, I would expect Windows to ask for the BitLocker recovery key. I diligently, or rather slavishly, entered the key and got logged in. I thought that was the end of it. But no... no... no. That was the beginning of a vicious cycle of me scolding Microsoft, trying various workarounds and then going back to scolding. Every single time I turned on my system and dared to log into Windows, this flippin' screen would come up, asking for the BitLocker recovery key. It was annoying as hell!
In Microsoft's defence though, once I realized what the actual cause was, I felt sorry for the poor folks at Microsoft. I decided to write this article so that they do not receive abuse in absentia that they do not deserve.
Okay, enough of ranting and resentment. Let's pin down what was going on and how I approached the problem to find a solution.
Windows boot manager does not trust GRUB. There it is, I have said it. Fair enough, or is it? For the uninitiated, GRUB stands for GRand Unified Bootloader. A bootloader is the first software that runs after the firmware when you turn on your computer. It loads your OS and then the OS takes over. There are many bootloaders available for various Linux distros, and GRUB is the most popular. On EFI capable systems, the firmware reads the EFI system partition (ESP) to look for boot information.
Whenever the UEFI Secure Boot system loaded GRUB and I chose Windows boot manager from there, it would rightfully complain and ask for the BitLocker recovery key, as it suspected a malicious break-in.
The simplest solution would have been to decrypt the drives and turn BitLocker off. But I didn't want to do that. To be brutally honest, I actually tried it a few times just for the sake of the experiment, but it is Windows, remember? I tried both the GUI and the command prompt, but every single time the OS would hang. I came across some Reddit chatter advising to forget about manage-bde commands (the commands that control BitLocker) working on Windows 11 Home. I gave up after a few attempts; I didn't want to do that in the first place anyway. Or maybe the grapes are sour.
Another (silly) option would have been to turn off Secure Boot. STOP! Don't do that. Windows 11 lists TPM 2.0 and Secure Boot support among its requirements, and switching Secure Boot off changes the boot measurements that BitLocker checks, so you risk screwing up your Windows installation if you try. Windows 10 users could perhaps still do that.
After fiddling with the EFI system partition, I realized that GRUB was installed on my other SSD, the one on which I had installed Linux. I thought that if I could install GRUB on the SSD where Windows boot manager and the Windows OS lived, I could persuade Windows boot manager to trust GRUB. So I did exactly that. Again, "Do not try this at home". This is adult stuff and you may require adult supervision ;). Nah, just kidding. Go ahead and play around with GRUB, it's good for your health.
Disclaimer: It didn't resolve the issue. However, it did make sense originally. Loading GRUB from a trusted location should have made my life easy, but it didn't.
Anyway, here is what I did. In my case, both drives already had ESP partitions, so I didn't need to make one. But if you don't have one, you can make it. Here is a step-by-step guide:
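# list block devices and their filesystems to find (or confirm) the ESP
$ lsblk -f
# only if the disk has no ESP yet: create one (type EF00) and format it as FAT32
# WARNING: never format an existing ESP, it holds the Windows boot manager
$ sudo gdisk /dev/sda
$ sudo mkfs.vfat -F 32 /dev/sda1
# mount the ESP
$ sudo mkdir -p /boot/efi
$ sudo mount /dev/sda1 /boot/efi
# install missing dependency
$ sudo apt install shim
# create GRUB configuration file. GRUB will look for this config file.
$ sudo grub-mkconfig -o /boot/efi/EFI/ubuntu/grub.cfg
# install GRUB
$ sudo grub-install /dev/sda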
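To see which bootloaders actually ended up on a mounted ESP, something like the following sketch can help. The function name list_efi_loaders is hypothetical, /boot/efi is just the mount point used above, and efibootmgr is assumed to be installed (it is skipped otherwise). Reading the ESP may require root on your system.

```shell
# List the EFI binaries found under <mount point>/EFI.
# list_efi_loaders is a hypothetical helper written for this article.
list_efi_loaders() {
    esp=${1:-/boot/efi}
    if [ ! -d "$esp/EFI" ]; then
        echo "no EFI directory under $esp" >&2
        return 1
    fi
    find "$esp/EFI" -type f -iname '*.efi' | sort
}

# Show the loaders on the ESP mounted above, if it is there and readable.
list_efi_loaders /boot/efi || true

# Show the firmware boot entries and their order, if efibootmgr is available.
if command -v efibootmgr >/dev/null 2>&1; then
    efibootmgr -v || true
fi
```

On a dual boot disk you would typically expect to see both a GRUB/shim binary and the Windows boot manager in that listing.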
The solace
Once I had miserably failed to persuade Windows boot manager to trust GRUB, I started to observe what TPM 2.0 was doing when I booted directly from Windows boot manager versus through GRUB.
I noticed a pattern. The platform validation profile kept on changing. This profile consists of a set of indices ranging from 0 to 23, and it is read from the Platform Configuration Registers (PCRs) of the TPM 2.0. Each PCR index corresponds to a component or setting that is measured when the machine starts up. Each time the computer starts, the TPM checks that the items specified in the platform validation profile have not changed. If any of them change while BitLocker drive encryption (BDE) protection is on, the TPM will not release the encryption key to unlock the disk volume, and the computer will enter recovery mode. For further reading on the topic, consult the Microsoft documentation regarding PCR profiles.
Even when GRUB is executed from a trusted location, it still ends up changing the PCR profile, because the expectation is that Windows boot manager is going to run instead.
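To get a feel for why an extra link in the boot chain changes the result, here is a small simulation of how a PCR is "extended": the new value is the SHA-256 hash of the old value concatenated with the digest of the thing being measured. This is only a sketch. The helper names hex_to_bytes and pcr_extend are made up, the file name strings are stand-ins and not what the firmware really measures, and sha256sum from coreutils is assumed to be available.

```shell
# Turn a hex string into raw bytes on stdout (POSIX printf, octal escapes).
hex_to_bytes() {
    hex=$1
    while [ -n "$hex" ]; do
        pair=${hex%"${hex#??}"}    # first two hex digits
        hex=${hex#??}              # the rest
        printf "\\$(printf '%03o' "$((0x$pair))")"
    done
}

# new PCR value = SHA-256(old PCR value || digest of the measured component)
pcr_extend() {
    # $1: current PCR value (hex), $2: measurement digest (hex)
    hex_to_bytes "$1$2" | sha256sum | cut -d' ' -f1
}

# A PCR starts out as 32 zero bytes.
zero=$(printf '%064d' 0)

# Stand-in measurements; real firmware hashes certificates and binaries.
m_winboot=$(printf 'bootmgfw.efi' | sha256sum | cut -d' ' -f1)
m_grub=$(printf 'grubx64.efi' | sha256sum | cut -d' ' -f1)

# Booting Windows boot manager directly versus chaining through GRUB first.
direct=$(pcr_extend "$zero" "$m_winboot")
chained=$(pcr_extend "$(pcr_extend "$zero" "$m_grub")" "$m_winboot")

echo "direct boot : $direct"
echo "via GRUB    : $chained"
```

The two values differ, and that mismatch is exactly the kind of thing that makes the TPM hold back the BitLocker key. On the Linux side, assuming the tpm2-tools package is installed, `sudo tpm2_pcrread sha256:7` should show the real register.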
I had finally understood the real cause behind the issue. PCR 7 is the index that holds the signature of the boot sequence entry. I could see that if I could somehow get BitLocker to accept the PCR 7 value produced by booting through GRUB, I should theoretically be able to coerce the system into trusting GRUB. However, here I hit a brick wall. On Windows 11 Home, the 'manage-bde' command could only show me the profile information. It wouldn't allow me to modify the protectors. The method of deleting the protectors and re-adding them with the required boot setup in place would supposedly work on Windows Professional, but I haven't tried it. Here are the steps (only step 1 works for me):
1. Get the current protectors.
$ manage-bde -protectors -get C:
2. Disable TPM protectors.
$ manage-bde -protectors -delete C: -type TPM
3. Set up your boot sequence as required.
4. Enable TPM protectors.
$ manage-bde -protectors -add C: -TPM
As I cannot use the above-mentioned workaround with my Windows version, the solution that I am actually using now is as follows:
I know it is still painful, but that is the best I can achieve given the circumstances. At least I am happy that I know what sort of sh** is hitting the fan.
Hope you enjoyed reading. If you made it this far, then may the force be with you, mate. You got cojones, peace out.