Exploring nanoGPT!
[Image: a small robot looking at the word "GPT" on a rock, created by Bing Image Creator]


So, spurred on by Richard Jones's fabulous posts, I've been experimenting with nanoGPT. Getting it running on my Windows 11 machine wasn't too traumatic, and before I knew it - thanks to the fabulous instructions that karpathy (https://github.com/karpathy/nanoGPT) provides - I was training on data and watching my GPU go for it!
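That first "prepare" step for the character-level Shakespeare dataset is essentially just building a vocabulary of the unique characters in the text and encoding everything to integer ids. Here's a minimal toy sketch of the idea (the real data/shakespeare_char/prepare.py in the repo also writes train.bin, val.bin and a meta.pkl file for the trainer):

```python
# Toy version of nanoGPT's character-level data preparation:
# build a character vocabulary from the text, then encode it to integer ids.
text = "First Citizen: Before we proceed any further, hear me speak."

chars = sorted(set(text))                      # unique characters = the vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> integer id
itos = {i: ch for ch, i in stoi.items()}       # integer id -> char

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

ids = encode(text)
assert decode(ids) == text  # encoding round-trips losslessly
```

The model then trains on those integer ids; the vocabulary mapping is saved so generated ids can be turned back into text at sampling time.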


The initial text you get back was... interesting - but to be fair, it had only run for a few minutes!

[Image: initial results from training]

So, spurred on, I decided to try the second step, which was to leverage a GPT-2 model. This is where it started to get interesting: the download succeeded just fine - all 12GB of it! - and then it kicked off the tokenization process, something I hadn't appreciated is CPU heavy. I confess I thought the GPU would be used for all of this, so useful learning for me :)

This is where things started to go a bit awry! Firstly, it wasn't really pushing my machine at all - this is when I realised it was only using one of my CPU cores! So, a quick code tweak later, and it's using all my cores and maxing the CPU :)
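The tweak amounted to raising the worker count used for the tokenization pass. This isn't the exact nanoGPT code - just a toy sketch of the same pattern, with a stand-in tokenizer in place of the real GPT-2 BPE, showing how a pool sized to the machine's core count spreads the CPU-bound work:

```python
import os
from multiprocessing import Pool

# Instead of a hard-coded small worker count, size the pool to the machine.
num_proc = os.cpu_count()

def tokenize(doc):
    # Stand-in for the real GPT-2 BPE tokenization, which is the
    # CPU-heavy part when preparing the dataset.
    return doc.lower().split()

docs = ["Hello World", "So long and thanks for all the fish"] * 4
with Pool(processes=num_proc) as pool:
    tokenized = pool.map(tokenize, docs)
```

With one worker per core, every Python process shows up in Task Manager doing its share - hence the wall of processes in the screenshot below.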

[Image: lots and lots of Python processes running after increasing the CPU count]

The script was actually throwing an error whilst it was running, but it did look like it was working just fine, and the error seemed to imply that it wasn't important, so I figured it would be fine just to let it run. Not knowing how long these things should take, I left it for several days, pausing the process before putting my machine to sleep at night - but nothing!

So, I then started poking around on the internet to try and find others who had hit this error - which I did, yay! A few more code changes and I could see what it was actually meant to do!


I then discovered the second interesting thing: cache locations! My primary OS SSD ran out of space during this stage! So, cue some more investigation to find out how to set the Hugging Face cache location globally. I then ran through the steps again, and you can see the HDD took an absolute beating at this point - now, I confess I was doing this on my large, slow archive spinning disks, but still!
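For anyone hitting the same wall: the Hugging Face cache location can be moved via environment variables. This is my assumption of the kind of fix involved, using the documented HF_HOME / HF_DATASETS_CACHE variables - the paths below are illustrative, and the variables must be set before datasets/transformers is imported:

```python
import os

# Repoint the Hugging Face caches to a roomier drive. The paths here
# are examples, not the ones actually used in the article.
os.environ["HF_HOME"] = r"D:\hf_cache"
os.environ["HF_DATASETS_CACHE"] = r"D:\hf_cache\datasets"

# Only import datasets/transformers AFTER these are set; otherwise the
# libraries latch onto the default cache under the user profile.
```

To make it stick globally across sessions on Windows, the same variables can be set per-user with `setx HF_HOME D:\hf_cache` in a terminal (or via the System environment variables dialog).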


And finally, it completes - after 5 1/2 (ish) hours, but for sure better than the 32 (ish) hours it had been running for previously!
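Both figures are "(ish)", so this is only indicative, but as a back-of-envelope the combined fixes (multi-core tokenization plus resolving the error) bought roughly a 5.8x improvement:

```python
# Rough speedup: previous (stuck, single-core) run vs. the successful run.
before_hours = 32.0
after_hours = 5.5
speedup = before_hours / after_hours   # ~5.8x
assert round(speedup, 1) == 5.8
```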


Yay! Now I can start fine-tuning:


This is a GPU-heavy task, and interestingly, I saw the GPU memory being hit first, before the GPU itself - obviously, having 24GB of GPU RAM here is a great benefit. This completes relatively quickly (minutes, not hours):
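A rough bit of arithmetic hints at why memory fills up first. Assuming fp32 weights and AdamW for the 124M-parameter GPT-2 "small" checkpoint (my assumption - nanoGPT can also train in bf16/fp16), the static footprint alone is around 2GB before any activations:

```python
# Back-of-envelope GPU memory for fine-tuning GPT-2 "small".
# Assumes fp32 parameters and AdamW; ignores activations entirely.
n_params = 124_000_000

weights_gb = n_params * 4 / 1e9   # fp32 parameters
grads_gb   = n_params * 4 / 1e9   # one gradient per parameter
adam_gb    = n_params * 8 / 1e9   # AdamW keeps two fp32 moment buffers

total_gb = weights_gb + grads_gb + adam_gb   # ~2 GB before activations
```

Activation memory then scales with batch size and sequence length, which is what actually eats into the 24GB - so the allocation spike up front, before the compute ramps, makes sense.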

[Image: completion of fine-tuning the Shakespeare data using GPT-2]

What I get from this is fifteen pages of text! The text quality is far better than the short excerpt I shared at the beginning of this article. I won't share it all here - just a few snippets.
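The sampling step exposes temperature and top_k knobs (as nanoGPT's sample.py does); here's a minimal stand-alone sketch of what they do to the model's output logits at each step - a toy version, not the repo's code:

```python
import math
import random

def sample_next(logits, temperature=0.8, top_k=5, rng=random.Random(0)):
    """Pick one token id: truncate to the top_k logits, rescale by
    temperature, softmax, then draw from the resulting distribution."""
    # Keep only the indices of the top_k highest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in top]
    # Numerically stable softmax over the surviving logits.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token id from the truncated, rescaled distribution.
    r = rng.random()
    acc = 0.0
    for idx, p in zip(top, probs):
        acc += p
        if r <= acc:
            return idx
    return top[-1]

token = sample_next([2.0, 1.0, 0.5, -1.0, -3.0], temperature=0.8, top_k=2)
assert token in (0, 1)  # only the two highest-logit tokens survive top-k
```

Lower temperature sharpens the distribution (more conservative text), while a small top_k cuts off the unlikely tail - both help keep the Shakespeare output coherent.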

[Images: snippets of the generated text]

Very cool! I really enjoyed this first foray into learning about this stuff. It's been a really useful exercise to get me thinking outside of my usual role, has demonstrated that I can troubleshoot code and make stuff work (I'm ostensibly an infrastructure chap at heart), and has given me the taste to try other things - so more to come!

Tim Callaghan

IBMer, Microsoft Leader. Pushing to make the world a better place, powered by the Microsoft cloud, IBM and our clients.

1y

Very cool. Although it looks like the GPU wasn't used so much!
