Exploring nanoGPT!
So, spurred on by Richard Jones's fabulous posts, I've been experimenting with nanoGPT. Getting it running on my Windows 11 machine wasn't too traumatic, and before I knew it - thanks to the fabulous instructions that karpathy (https://github.com/karpathy/nanoGPT) provides - I was training a model and watching my GPU go for it!
The initial text you get back was... interesting - but to be fair, it only ran for a few minutes!
So, spurred on, I decided to try the second step, which was to leverage a GPT-2 model - this is where it started to get interesting. The download succeeded just fine - all 12GB of it! - and then it kicked off the tokenization process, something I hadn't appreciated is CPU heavy. I confess I thought the GPU would be used for all of this, so useful learning for me :)
This is where things started to go a bit awry! Firstly, it wasn't really pushing my machine at all - this is when I realised it was only using one of my CPU cores! So, a quick code tweak later and it's using all my cores and maxing the CPU :)
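For anyone curious what that tweak looks like: nanoGPT's data preparation uses Hugging Face datasets with a worker-count parameter, but the general idea - fanning a CPU-bound tokenization step out across every core - can be sketched with just the standard library. The `tokenize` function here is a whitespace-split stand-in (the real thing uses GPT-2 BPE via tiktoken), so treat this as an illustration, not nanoGPT's actual code:

```python
from multiprocessing import Pool
import os

def tokenize(text):
    # Stand-in for the real GPT-2 BPE step (nanoGPT uses tiktoken);
    # whitespace splitting keeps this sketch self-contained.
    return text.split()

def tokenize_all(docs, num_proc=None):
    # Spread the CPU-bound work across all available cores by default.
    num_proc = num_proc or os.cpu_count()
    with Pool(num_proc) as pool:
        return pool.map(tokenize, docs)

if __name__ == "__main__":
    docs = ["hello world", "spurred on by nanoGPT"]
    print(tokenize_all(docs))
```

The single-core symptom usually means the worker count defaulted to 1; bumping it to `os.cpu_count()` is what takes the CPU from idling to maxed.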
The script was actually throwing an error whilst it was running, but it did look like it was working just fine, and the error seemed to imply it was not important, so I figured it would be fine to just let it run. Not knowing how long these things should take, I left it for several days, pausing the process before putting my machine to sleep at night - but nothing!
So, I then started poking around on the internet to try and find others who had hit this error - which I did, yay! A few more code changes and I could see what it was actually meant to do!
I then discovered the second interesting thing: cache locations! My primary OS SSD ran out of space during this stage! So, cue some more investigations to find out how to set the Hugging Face cache location globally. I then ran through the steps again, and you can see the HDD took an absolute beating at this point - now, I confess I was doing this on my large, slow archive spinning disks, but still!
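For anyone hitting the same full-SSD surprise: the Hugging Face libraries honour a couple of environment variables that relocate the cache. The paths below are illustrative - point them at whichever drive has room, and set them before the first download:

```shell
# Move the Hugging Face cache off the OS SSD (paths are illustrative).
export HF_HOME="/mnt/bigdisk/hf_cache"        # umbrella cache location
export HF_DATASETS_CACHE="$HF_HOME/datasets"  # datasets library override
# On Windows, the persistent equivalent is:
#   setx HF_HOME "D:\hf_cache"
```

Setting `HF_HOME` alone is usually enough, as the other cache paths default to living underneath it.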
And finally, it completed - after 5 1/2 (ish) hours, but for sure better than the 32 hours (ish) it had been running for previously!
Yay! Now I can start finetuning. Again, this is a GPU heavy task, and interestingly I saw the GPU memory being hit first, before the GPU itself - obviously having 24GB of GPU RAM here is a great benefit. This completes relatively quickly (minutes, not hours):
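For context, a nanoGPT finetuning run is driven by a small Python config file that overrides the defaults in train.py - you start from the downloaded GPT-2 checkpoint rather than random weights, drop the learning rate, and run far fewer iterations, which is why it finishes in minutes. The values below are illustrative, not nanoGPT's exact shipped config:

```python
# Hypothetical nanoGPT-style finetuning config (values illustrative).
out_dir = 'out-finetune'
init_from = 'gpt2'                 # start from the pretrained checkpoint
dataset = 'shakespeare'
batch_size = 1
gradient_accumulation_steps = 32   # keeps effective batch size up in 24GB VRAM
max_iters = 200                    # short run: minutes, not hours
learning_rate = 3e-5               # much lower than pretraining
decay_lr = False                   # constant LR is fine for a short finetune
```

The small batch size plus gradient accumulation is also why GPU memory fills up before compute saturates - the checkpoint's weights and optimiser state claim their VRAM immediately.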
What I get from this is fifteen pages of text! The quality is far better than the short excerpt I shared at the beginning of this article - I won't share it all here, just a few snippets.
Very cool! I really enjoyed this first foray into learning about this stuff. It's been a really useful exercise to get me thinking outside of my usual role, it has demonstrated that I can troubleshoot code and make stuff work (I'm ostensibly an infrastructure chap at heart), and it has given me the taste to try other things - so more to come!
IBMer, Microsoft Leader. Pushing to make the world a better place, powered by the Microsoft cloud, IBM and our clients.