What the GRUB!

Preamble

Recently I ran into a flurry of issues while setting up my gaming + development rig. The idea was to connect multiple GPUs for deep learning workloads, set up my favourite Linux development environment, grab a beer and get going.

But... you know, the gods of technology love to have their way with you. My patience and resolve were tested and tried like those of the biblical character Job.

Unfortunately, I couldn't get rid of Windows altogether, as some of the games I play do not work with Steam on Linux. I learnt that GPU passthrough is next to impossible with desktop virtualization tools. I tried VMware and VirtualBox and concluded that the only way to achieve passthrough with your sanity intact is via commercial tools like VMware vSphere.

The setup

So, I decided to set up a dual boot system, with Windows 11 on one SSD and Ubuntu 20.04 on another SSD. Everything went smoothly, just as expected. But TensorFlow wouldn't pick up the GPU. It turned out that the open source nouveau drivers that Ubuntu uses by default do not work for this. You need to install the original nvidia drivers, the CUDA libraries and the nvidia machine learning libraries. If you are interested in concrete steps, here they are:

  1. You could search for nvidia drivers using apt and select the latest driver. But I would recommend downloading the driver installation script directly from the official nvidia website.

$ apt search nvidia-driver

2. Get the CUDA pin for Ubuntu 20.04.

$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin        

3. Move the pin to apt preferences location.

$ sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600

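4. Get the authentication key for the CUDA apt repository. (NVIDIA rotated its repository signing keys in April 2022, so if apt rejects this key, check NVIDIA's current installation guide.)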

$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub        

5. Add the apt repository & update.

$ sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"

$ sudo apt update        

6. Get the nvidia machine learning library and install it.

$ wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/nvidia-machine-learning-repo-ubuntu2004_1.0.0-1_amd64.deb

$ sudo apt install ./nvidia-machine-learning-repo-ubuntu2004_1.0.0-1_amd64.deb        

7. Install NCCL, CUDA, cuDNN and TensorRT (libnvinfer).

$ wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/libnccl2_2.8.3-1+cuda11.2_amd64.deb


$ sudo apt install ./libnccl2_2.8.3-1+cuda11.2_amd64.deb


$ sudo apt update


$ sudo apt install cuda-11-2 libcudnn8=8.1.1.33-1+cuda11.2 libcudnn8-dev=8.1.1.33-1+cuda11.2


$ sudo apt install libnvinfer8=8.0.0-1+cuda11.0 libnvinfer-dev=8.0.0-1+cuda11.0 libnvinfer-plugin8=8.0.0-1+cuda11.0

8. Setup path variables for CUDA.

$ export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"


$ export CUDA_HOME=/usr/local/cuda

9. In case you are having issues on a Secure Boot system and the nvidia driver is not loading, you will need to remove the DKMS binding for the driver and reinstall it.

$ sudo dkms remove nvidia/495.44 --all


$ sudo dkms install --force nvidia/495.44 -k $(uname -r)


$ sudo update-initramfs -u


$ sync


$ sudo reboot


# After reboot, enroll the MOK
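
Once the steps above are done, a quick sanity check saves a lot of guessing. Below is a minimal sketch, assuming a bash shell on the Ubuntu side; the helper name check_cmd is made up for illustration, and the TensorFlow check only runs if python3 can already import tensorflow.

```shell
#!/usr/bin/env bash
# Sanity checks after installing the driver and CUDA (illustrative sketch).

# Print "ok" if a command is on PATH, "missing" otherwise.
# check_cmd is a hypothetical helper, not part of any nvidia tooling.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok"
  else
    echo "missing"
  fi
}

echo "nvidia-smi: $(check_cmd nvidia-smi)"
# nvcc usually lives in /usr/local/cuda/bin, which may not be on PATH yet.
echo "nvcc:       $(check_cmd nvcc)"

# Is the nvidia kernel module actually loaded?
if lsmod 2>/dev/null | grep -q '^nvidia '; then
  echo "nvidia kernel module: loaded"
else
  echo "nvidia kernel module: NOT loaded (see step 9 if Secure Boot is on)"
fi

# Ask TensorFlow which GPUs it can see, but only if it is importable.
if python3 -c 'import tensorflow' >/dev/null 2>&1; then
  python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
fi
```

If the module shows as not loaded while nvidia-smi is installed, that points towards the Secure Boot issue from step 9 rather than a broken CUDA install.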

The pain

On relatively new systems that ship with TPM 2.0 and UEFI Secure Boot enabled, your newly installed nvidia drivers won't automagically load. The nvidia kernel modules have a kernel interface layer that must be compiled specifically for each kernel, and with Secure Boot on, the kernel refuses to load modules that are not signed by a key it trusts. I learnt that this is typical behaviour on a Secure Boot system. You need to explicitly remove the dynamic kernel module support (DKMS) binding for your nvidia driver and rebind it. After a reboot, you will be asked to enroll a machine owner key (MOK). The MOK is generated on the Linux side and stored in the firmware's non-volatile memory, not in the TPM. MOKs enable signing and verification of third-party modules and custom built kernels. These modules are then verified against the enrolled keys, with a shim layer sitting between UEFI Secure Boot and GRUB or the kernel.
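
To see where you stand, you can ask the system whether Secure Boot is on and whether the nvidia module carries a signature. This is a rough sketch, assuming mokutil and modinfo are installed (they are on a stock Ubuntu desktop as far as I know); the helper sb_state is a hypothetical name of my own.

```shell
#!/usr/bin/env bash
# Check Secure Boot state and nvidia module signing (illustrative sketch).

# Classify the text printed by `mokutil --sb-state`.
# sb_state is a hypothetical helper written for this example.
sb_state() {
  case "$1" in
    *"SecureBoot enabled"*)  echo "enabled" ;;
    *"SecureBoot disabled"*) echo "disabled" ;;
    *)                       echo "unknown" ;;
  esac
}

if command -v mokutil >/dev/null 2>&1; then
  echo "Secure Boot: $(sb_state "$(mokutil --sb-state 2>/dev/null || true)")"
fi

# modinfo prints the signer of a module; empty output means the module
# is unsigned or could not be found for the running kernel.
signer="$(modinfo -F signer nvidia 2>/dev/null || true)"
echo "nvidia module signer: ${signer:-<none>}"
```

With Secure Boot enabled and no signer reported, the module will most likely be rejected at load time, which matches the behaviour described above.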

Yeah yeah, I know. A lot of big words in the previous paragraph. All you need to take away is that the sh** is getting real, mate. :)

Once that issue was out of the way, I could finally start hacking. I played around for a while and then decided to check something on the Windows side. Lo and behold, right after the GRUB screen where I chose the Windows boot manager, an ominous blue screen appeared. Actually, the first time it didn't look ominous. Hey, it wasn't the usual blue screen of death that Windows is famous for. I kind of expected that. After enrolling a new MOK for the graphics card driver, I expected Windows to ask for the BitLocker recovery key. I diligently, or rather slavishly, entered the key and got logged in. I thought that was the end of it. But no... no... no. That was the beginning of a vicious cycle of me scolding Microsoft, trying various workarounds and then going back to scolding. Every single time I turned on my system and dared to log into Windows, this flippin' screen would come up, asking for the BitLocker recovery key. It was annoying as hell!

In Microsoft's defence though, once I realized what the actual cause was, I felt sorry for the poor folks at Microsoft. I decided to write this article up so that the folks at Microsoft do not receive, in absentia, abuse that they do not deserve.

Okay, enough of ranting and resentment. Let's pin down what was going on and how I approached the problem to find a solution.

Windows boot manager does not trust GRUB. There it is, I have said it. Fair enough, or is it? For the uninitiated, GRUB stands for GRand Unified Bootloader. A bootloader is the first piece of software that the firmware runs when you turn on your computer. It loads your OS and then the OS takes over. There are many bootloaders available for the various Linux distros and GRUB is the most popular. On EFI capable systems, the firmware reads the EFI system partition (ESP) to look for boot information.

Whenever the UEFI Secure Boot system loaded GRUB and I chose Windows boot manager from there, it would rightfully complain and ask for the BitLocker recovery key, as it suspected a malicious break-in.

The simplest solution would have been to decrypt the drives and turn off BitLocker. But I didn't want to do that. To be brutally honest, I actually tried it a few times just for the sake of the experiment, but it is Windows, remember? I tried both the GUI and the command prompt, but every single time the OS would hang. I came across some reddit chatter advising people to forget about manage-bde commands (the commands that control BitLocker) working on Windows 11 Home. I gave up after a few attempts; I didn't want to do that in the first place anyway. Or, grapes are sour.

Another (silly) option would have been to turn off Secure Boot. STOP! Don't do that. Windows 11 expects TPM 2.0 and Secure Boot support, and you risk seriously messing up your Windows installation if you try. Windows 10 users could perhaps still do that.

After fiddling with the EFI system partition, I realized that GRUB was installed on my other SSD, the one on which I had installed Linux. I thought that if I could install GRUB on the SSD where Windows boot manager and Windows itself lived, I could persuade Windows boot manager to trust GRUB. So I did exactly that. Again, "Do not try this at home". This is adult stuff and you may require adult supervision ;). Nah, just kidding. Go ahead and play around with GRUB, it's good for your health.

Disclaimer: It didn't resolve the issue. However, it did make sense originally. Loading GRUB from a trusted location should have made my life easy, but it didn't.

Anyway, I will lay out below what I did. In my case, both drives already had ESP partitions, so I didn't need to make one. But if you don't have one, you can create it. Here is a step-by-step guide:

  • Check your drives. Volumes marked as vfat file systems are the ESPs.

$ lsblk -f        

  • Create a disk partition of at least 256 MiB on the GPT-labelled drive where you want to install GRUB, and give it the EFI System partition type (code EF00 in gdisk).


$ sudo gdisk /dev/sda

  • Let's assume that the device name of the created partition is /dev/sda1. Format the partition as FAT32.

$ sudo mkfs.vfat -F 32 /dev/sda1

  • Create the /boot/efi directory as a mount point for the new partition.

$ sudo mkdir -p /boot/efi

  • Mount the partition to the /boot/efi mount point. If you already had ESP, /boot/efi would already be mounted.

$ sudo mount /dev/sda1 /boot/efi

  • Install GRUB to the mounted partition. Ubuntu 20.04 already comes with all the utilities you need except the shim loader. You can install that using apt.

# install missing dependency
$ sudo apt install shim

# create GRUB configuration file. GRUB will look for this config file.
$ sudo grub-mkconfig -o /boot/efi/EFI/ubuntu/grub.cfg

# install GRUB
$ sudo grub-install /dev/sda
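
After the steps above, it is worth confirming that the ESP is really mounted and that the loader files landed where the firmware will look. A minimal sketch follows; it assumes the /boot/efi mount point and the EFI/ubuntu directory used above, the x86_64 file names shimx64.efi and grubx64.efi, and a helper called check_loader_files that I made up for this example. The ESP is often readable by root only, so run it with sudo.

```shell
#!/usr/bin/env bash
# Verify that the ESP is mounted and contains GRUB's files (illustrative sketch).

ESP="${ESP:-/boot/efi}"   # mount point from the steps above; override via the ESP variable

# Report which of the expected loader files exist in a directory.
# check_loader_files is a hypothetical helper; the file names assume x86_64 Ubuntu.
check_loader_files() {
  local dir="$1" f
  for f in shimx64.efi grubx64.efi grub.cfg; do
    if [ -e "$dir/$f" ]; then
      echo "found   $f"
    else
      echo "missing $f"
    fi
  done
}

# Show the device and file system type behind the mount point, if any.
if command -v findmnt >/dev/null 2>&1; then
  findmnt -n -o SOURCE,FSTYPE "$ESP" || echo "$ESP is not a mount point"
fi

check_loader_files "$ESP/EFI/ubuntu"
```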

The solace

Once I had miserably failed to persuade Windows boot manager to trust GRUB, I started to observe what TPM 2.0 was doing when I booted directly from Windows boot manager versus through GRUB.

I noticed a pattern. The platform validation profile kept on changing. This profile is a set of indices ranging from 0 to 23, each referring to a Platform Configuration Register (PCR) of the TPM 2.0. Each PCR holds a running hash (a measurement) of one category of boot component, such as the firmware, the Secure Boot state or the boot manager. Each time the computer starts, the TPM checks that the measurements in the PCRs named by the platform validation profile have not changed. If any of them change while BitLocker drive encryption (BDE) protection is on, the TPM will not release the encryption key to unlock the disk volume and the computer will enter recovery mode. For further reading on the topic, consult the Microsoft documentation regarding PCR profiles.
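
To get a feel for why an extra link in the boot chain changes everything, here is a toy model of the PCR "extend" operation, where the new value is SHA-256(old value || measurement). It is only a sketch: the helper names hex_to_bin, measure and pcr_extend are mine, the "measurements" are hashes of plain strings rather than of real boot binaries, and a real TPM does all of this in hardware. On the Linux side, the tpm2-tools package offers tpm2_pcrread sha256:7 to read the real register, assuming it is installed.

```shell
#!/usr/bin/env bash
# Toy model of a TPM PCR extend: new = SHA-256(old || measurement).
# All function names here are hypothetical helpers for illustration.

# Turn a hex string into raw bytes using printf octal escapes.
hex_to_bin() {
  hex="$1"
  while [ -n "$hex" ]; do
    byte="${hex%"${hex#??}"}"   # first two hex digits
    hex="${hex#??}"             # the rest
    printf "\\$(printf '%03o' "0x$byte")"
  done
}

# Stand-in for measuring a boot component: hash a label instead of a binary.
measure() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# $1: current PCR value (hex), $2: measurement digest (hex).
pcr_extend() {
  hex_to_bin "$1$2" | sha256sum | cut -d' ' -f1
}

zero="$(printf '%064d' 0)"   # PCRs start as all zeros after a reset

# Chain 1: firmware hands over straight to Windows boot manager.
direct="$(pcr_extend "$zero" "$(measure 'Windows Boot Manager')")"

# Chain 2: shim and GRUB are measured first, then Windows boot manager.
step="$(pcr_extend "$zero" "$(measure 'shim + GRUB')")"
via_grub="$(pcr_extend "$step" "$(measure 'Windows Boot Manager')")"

echo "direct:   $direct"
echo "via GRUB: $via_grub"
```

The two final values differ even though the last component is the same in both chains, because each extend folds in everything measured before it. That is the property BitLocker relies on.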

Even when GRUB is executed from a trusted location, it still ends up changing the PCR profile, because the expectation is that Windows boot manager will be loaded instead.

I had finally understood the real cause behind the issue. PCR 7 is the register that holds the Secure Boot state, including the signing authorities used to verify each entry in the boot sequence. I could see that if I could somehow get BitLocker to reseal its key against the PCR 7 value produced when booting through GRUB, I should theoretically be able to coerce the system to trust GRUB. However, here I hit a brick wall. The 'manage-bde' command could only provide me with profile information on Windows 11 Home. It wouldn't allow me to modify the protectors. This method of removing the protectors and re-adding them with the required boot setup would supposedly work on Windows Professional, but I haven't tried it. Here are the steps (only step 1 works for me):

  1. From an elevated Command Prompt on Windows, get the PCR profile and recovery key. Back them up for safe keeping.

$ manage-bde -protectors -get C:        

2. Remove the TPM protector.

$ manage-bde -protectors -delete C: -type TPM        

3. Setup your boot sequence as required.

4. Add the TPM protector back.

$ manage-bde -protectors -add C: -TPM        

As I cannot use the workaround above with my Windows version, the solution that I am actually using now is as follows:

  • Set Windows boot manager as the default boot priority. This way, the PCR profile loaded will be correct. The downside is that you will always boot into Windows by default.
  • Hit F9 to open the boot menu (the key may differ depending on your UEFI firmware) and select GRUB, then select Ubuntu.
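
The boot priority can also be inspected from the Ubuntu side with efibootmgr instead of the firmware setup screen. The sketch below only prints the command it would run rather than changing anything. It assumes the entries are labelled "Windows Boot Manager" and "ubuntu" (check your own efibootmgr output), and boot_entry_id is a helper name I invented for the example.

```shell
#!/usr/bin/env bash
# Work out an efibootmgr command that puts Windows boot manager first
# (illustrative sketch; it prints the command instead of running it).

# Read `efibootmgr` output on stdin and print the 4-digit entry number
# whose label starts with $1. boot_entry_id is a hypothetical helper.
boot_entry_id() {
  sed -n "s/^Boot\([0-9A-Fa-f]\{4\}\)\*\{0,1\} $1.*/\1/p" | head -n 1
}

if command -v efibootmgr >/dev/null 2>&1; then
  entries="$(efibootmgr 2>/dev/null || true)"
  win="$(printf '%s\n' "$entries" | boot_entry_id 'Windows Boot Manager')"
  ubu="$(printf '%s\n' "$entries" | boot_entry_id 'ubuntu')"
  if [ -n "$win" ] && [ -n "$ubu" ]; then
    # -o sets the boot order; Windows first keeps BitLocker happy by default.
    echo "Would run: sudo efibootmgr -o $win,$ubu"
  else
    echo "Could not find both entries; inspect the efibootmgr output by hand."
  fi
fi
```

efibootmgr also has a -n option that sets the entry for the next boot only, which might spare you the F9 dance when you know you want to come straight back to Ubuntu after a reboot. I have not verified how that interacts with BitLocker, so treat it as something to experiment with.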

I know it is still painful, but that is the best I can achieve given the circumstances. At least I am happy that I now know what sort of sh** is hitting the fan.

Hope you enjoyed reading. If you made it this far, then may the force be with you, mate. You got kahunas, peace out.


