Exploring nanoGPT!
So, spurred on by Richard Jones's fabulous posts, I've been experimenting with nanoGPT. Getting it running on my Windows 11 machine wasn't too traumatic, and before I knew it - thanks to the fabulous instructions that karpathy (https://github.com/karpathy/nanoGPT) provides - I was training a model and watching my GPU go for it!
The initial text you get back was... interesting - but to be fair, it only ran for a few minutes!
So, spurred on, I decided to try the second step, which was to leverage a GPT-2 model - this is where it started to get interesting. The download succeeded just fine - all 12GB of it! - and then it kicked off the tokenization process, something I hadn't appreciated is CPU heavy. I confess I thought the GPU would be used for all of this, so useful learning for me :)
This is where things started to go a bit awry! Firstly, it wasn't really pushing my machine at all - this is when I realised it was only using one of my CPU cores! So, a quick code tweak later and it's using all my cores and maxing the CPU :)
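For anyone curious what that tweak looks like: nanoGPT's data preparation uses Hugging Face datasets with a worker-count parameter, but the general idea - fanning a CPU-bound tokenization step out across every core - can be sketched with just the standard library. The `tokenize` function here is a whitespace-split stand-in (the real thing uses GPT-2 BPE via tiktoken), so treat this as an illustration, not nanoGPT's actual code:

```python
from multiprocessing import Pool
import os

def tokenize(text):
    # Stand-in for the real GPT-2 BPE step (nanoGPT uses tiktoken);
    # whitespace splitting keeps this sketch self-contained.
    return text.split()

def tokenize_all(docs, num_proc=None):
    # Spread the CPU-bound work across all available cores by default.
    num_proc = num_proc or os.cpu_count()
    with Pool(num_proc) as pool:
        return pool.map(tokenize, docs)

if __name__ == "__main__":
    docs = ["hello world", "spurred on by nanoGPT"]
    print(tokenize_all(docs))
```

The single-core symptom usually means the worker count defaulted to 1; bumping it to `os.cpu_count()` is what takes the CPU from idling to maxed.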
The script was actually throwing an error whilst it was running, but it did look like it was working just fine, and the error seemed to imply it was not important, so I figured it would be fine to just let it run. Not knowing how long these things should take, I left it for several days, pausing the process before putting my machine to sleep at night - but nothing!
So, I then started poking around on the internet to try and find others who had hit this error - which I did, yay! A few more code changes and I could see what it was actually meant to do!
I then discovered the second interesting thing: cache locations! My primary OS SSD ran out of space during this stage! So, cue some more investigations to find out how to set the Hugging Face cache location globally. I then ran through the steps again, and you can see the HDD took an absolute beating at this point - now, I confess I was doing this on my large, slow archive spinning disks, but still!
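For anyone hitting the same full-SSD surprise: the Hugging Face libraries honour a couple of environment variables that relocate the cache. The paths below are illustrative - point them at whichever drive has room, and set them before the first download:

```shell
# Move the Hugging Face cache off the OS SSD (paths are illustrative).
export HF_HOME="/mnt/bigdisk/hf_cache"        # umbrella cache location
export HF_DATASETS_CACHE="$HF_HOME/datasets"  # datasets library override
# On Windows, the persistent equivalent is:
#   setx HF_HOME "D:\hf_cache"
```

Setting `HF_HOME` alone is usually enough, as the other cache paths default to living underneath it.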
And finally, it completed - after 5 1/2 (ish) hours, but for sure better than the 32 hours (ish) it had been running for previously!
Yay! Now I can start finetuning. Again, this is a GPU heavy task, and interestingly I saw the GPU memory being hit first, before the GPU itself - obviously having 24GB of GPU RAM here is a great benefit. This completes relatively quickly (minutes, not hours):
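For context, a nanoGPT finetuning run is driven by a small Python config file that overrides the defaults in train.py - you start from the downloaded GPT-2 checkpoint rather than random weights, drop the learning rate, and run far fewer iterations, which is why it finishes in minutes. The values below are illustrative, not nanoGPT's exact shipped config:

```python
# Hypothetical nanoGPT-style finetuning config (values illustrative).
out_dir = 'out-finetune'
init_from = 'gpt2'                 # start from the pretrained checkpoint
dataset = 'shakespeare'
batch_size = 1
gradient_accumulation_steps = 32   # keeps effective batch size up in 24GB VRAM
max_iters = 200                    # short run: minutes, not hours
learning_rate = 3e-5               # much lower than pretraining
decay_lr = False                   # constant LR is fine for a short finetune
```

The small batch size plus gradient accumulation is also why GPU memory fills up before compute saturates - the checkpoint's weights and optimiser state claim their VRAM immediately.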
What I get from this is fifteen pages of text! The quality is far better than the short excerpt I shared at the beginning of this article - I won't share it all here, just a few snippets.
Very cool! I really enjoyed this first foray into learning about this stuff. It's been a really useful exercise to get me thinking outside of my usual role, it has demonstrated that I can troubleshoot code and make stuff work (I'm ostensibly an infrastructure chap at heart), and it has given me the taste to try other things - so more to come!
IBMer, Microsoft Leader. Pushing to make the world a better place, powered by the Microsoft cloud, IBM and our clients.