Deep Dive: Building GPT from scratch - part 5
Miko Pawlikowski
learning from Andrej Karpathy
Hello and welcome back to the series on Starter AI. I’m Miko, this time writing from Tokyo.
Today we’re picking up where we left off last week: stabilising the neural network we implemented last time, using batch normalization, and learning some helpful visualizations in the process.
The roadmap
The goal of this series is to implement a GPT from scratch, and to actually understand everything needed to do that. We’re following Andrej’s Zero To Hero videos. If you missed a previous part, catch up here:
To follow along, subscribe to the newsletter at starterai.dev. You can also follow me on LinkedIn.
Generative language model - activations & gradients
Today’s lecture is called “Building makemore Part 3: Activations & Gradients, BatchNorm”, and it builds on where we left off last week.
Last time we covered building a multilayer perceptron (MLP), following the Bengio et al. 2003 MLP language model paper. Before we move on to more sophisticated networks, we’re spending today’s lecture building a deeper understanding of activations and gradients: how to develop an intuition for which numbers make sense and which don’t, and how to visualise them.
The lecture is in two parts. The first part covers initialisation and the Batch normalization paper. The second part restructures the code to look like PyTorch’s built-in modules, and teaches us how to visualise the different statistics and ratios using basic histograms, to better understand how well the training is going.
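To give a flavour of those visualizations, here’s a minimal sketch of the first one (my own code, not the lecture’s exact version): it stacks a few tanh layers with made-up sizes, plots the distribution of activations at each layer, and reports how saturated each layer is.

```python
import torch
import matplotlib.pyplot as plt

torch.manual_seed(42)

fan_in, n_hidden, n_layers = 30, 100, 5   # made-up sizes for illustration
x = torch.randn(32, fan_in)               # a fake batch of inputs

plt.figure(figsize=(12, 4))
acts = x
for i in range(n_layers):
    # unit Gaussian weights scaled by gain / sqrt(fan_in); gain is 5/3 for tanh
    W = torch.randn(acts.shape[1], n_hidden) * (5/3) / acts.shape[1]**0.5
    acts = torch.tanh(acts @ W)
    saturated = (acts.abs() > 0.97).float().mean()
    print(f"layer {i}: mean {acts.mean():+.2f}, std {acts.std():.2f}, saturated: {saturated:.2%}")
    hy, hx = torch.histogram(acts, density=True)
    plt.plot(hx[:-1], hy, label=f"layer {i}")
plt.legend()
plt.title("forward pass activation distribution")
plt.show()
```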
Only a few new concepts in this lecture.
Context
Fan-in (and fan-out) are the number of inputs to (and outputs from, respectively) a layer.
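For example, with a hypothetical linear layer taking 30 inputs and producing 200 outputs (note that nn.Linear stores its weight transposed, as (out_features, in_features)):

```python
import torch.nn as nn

# Hypothetical layer: each neuron sees 30 inputs (fan-in), there are 200 neurons (fan-out).
linear = nn.Linear(in_features=30, out_features=200)
print(linear.weight.shape)  # torch.Size([200, 30]); fan_in = 30, fan_out = 200
```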
Kaiming init paper - a paper discussing the behaviour of various squishing functions, in both the forward and backward passes. It’s implemented in torch.nn.init.kaiming_normal_ and it’s considered one of the most popular ways of initialising neural networks.
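Here’s a minimal sketch of the idea for a tanh layer, with made-up sizes: scale unit-Gaussian weights by gain / sqrt(fan_in) by hand, or let the built-in initialiser do the same job.

```python
import torch
import torch.nn as nn

fan_in, fan_out = 30, 200   # made-up sizes

# Manual version: unit Gaussian scaled by gain / sqrt(fan_in); gain is 5/3 for tanh.
W = torch.randn(fan_in, fan_out) * (5/3) / fan_in**0.5

# Built-in version doing the same job on an nn.Linear weight.
layer = nn.Linear(fan_in, fan_out)
nn.init.kaiming_normal_(layer.weight, nonlinearity='tanh')

# Both end up with a standard deviation of about (5/3) / sqrt(30) ≈ 0.30.
print(W.std().item(), layer.weight.std().item())
```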
Batch normalization paper. A technique for normalising the ranges of values flowing through a neural network, avoiding saturated activations and vanishing gradients, and stabilising the learning of the whole network. It takes out some of the heuristics and replaces them with formulas. The lecture covers this in detail.
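A rough sketch of the core operation, in the spirit of the lecture’s manual version (sizes and values are made up; torch.nn.BatchNorm1d does the same job and additionally tracks running statistics for inference):

```python
import torch

hpreact = torch.randn(32, 200) * 4 + 2   # made-up, badly scaled pre-activations

bngain = torch.ones(1, 200)    # learnable scale (gamma)
bnbias = torch.zeros(1, 200)   # learnable shift (beta)
eps = 1e-5

bnmean = hpreact.mean(0, keepdim=True)   # per-feature mean over the batch
bnvar = hpreact.var(0, keepdim=True)     # per-feature variance over the batch
hpreact = bngain * (hpreact - bnmean) / torch.sqrt(bnvar + eps) + bnbias

print(hpreact.mean().item(), hpreact.std().item())  # roughly 0 and 1
```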
Also, according to @leopetrini, the magical 5/3 gain comes from the average value of tanh²(x) where x is Gaussian: the gain is roughly 1 / sqrt(E[tanh²(x)]) ≈ 1.59, which 5/3 approximates.
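A quick numeric sanity check of that claim (my own sketch):

```python
import torch
from torch.nn import init

x = torch.randn(1_000_000)
mean_tanh_sq = torch.tanh(x).pow(2).mean()

print(mean_tanh_sq.item())               # ≈ 0.39
print((1 / mean_tanh_sq.sqrt()).item())  # ≈ 1.59
print(init.calculate_gain('tanh'))       # 5/3 ≈ 1.667
```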
Video + timestamps
Part 1
00:04:19 Fixing the initial loss, removing the hockey stick appearance of the graph
00:12:59 Tanh quirks & how to work around them
00:27:53 Initialising the network - “Kaiming init” paper
01:04:50 Real example: resnet50 walkthrough
Part 2
01:18:35 PyTorch-ifying the code
01:26:51 Viz #1: forward pass activations statistics
01:30:54 Viz #2: backward pass gradient statistics
01:36:15 Viz #3: parameter activation and gradient statistics
01:39:55 Viz #4: update:data ratio over time
01:46:04 Bringing back batchnorm, looking at the visualizations
01:51:34 Summary
Summary
I really liked this lecture - Andrej took a quick detour on our quest of making makemore to lift a little of the fog around initialisation, turning the whole process from very artisanal to more engineering-based.
We covered the Kaiming init paper as well as the Batch normalization paper, both of which make for far more predictable outcomes.
And plotting the different ratios and distributions to confirm things look reasonable makes me feel much better about the whole thing :)
What’s next
Next week, we’re following Andrej into another rabbit hole - that of backpropagation.
As always, subscribe to this newsletter at starterai.dev to get the next parts in your mailbox!
Share with a friend
If you like this series, please forward to a friend!
Feedback
How did you like it? Was it easy to follow? What should I change for next time?
Please reach out on LinkedIn and let me know!