What does ChatGPT look like? I made an MRI image of a GPT Brain
Sorry for the clickbait, and in return I will get right to it. Here is the minimized image I created of a GPT-2 large language model (here is a link to the original large-size image). If you are interested in reading more about how it was created, read on:
First things first, I want to put it out in the open: I know very little about the inner workings of Large Language Models (LLMs) such as ChatGPT. I feel like a kid poking around, asking basic questions about this new magical toy I got to play with. My basic understanding may or may not be accurate, and in any case I’m just scratching the surface.
Let me share some basic terms I’m going to use here, so we will all be aligned:
What’s the difference between GPT and LLM? GPT is a specific type of transformer-based language model that is pre-trained using unsupervised learning, while LLM is a more general term that encompasses a range of language models that can generate or predict text.
So, like with anything magical, I wanted to better understand how the magic behind ChatGPT works. What better way than to illustrate it, or run an X-ray/MRI on the brain behind GPT? So I wrote a short script that visualizes the numbers behind a Large Language Model. A Large Language Model (LLM) like ChatGPT is basically built out of a huge matrix of numbers, something that looks like this:
[ [ 0.8, 0.2, 0.1 ],
  [ 0.3, 0.9, 0.5 ],
  [ 0.6, 0.4, 1.0 ] ]
In this example, the matrix has 3 rows and 3 columns. In reality the size of the matrix is much bigger and may contain billions of numbers. Each number is called a weight, and you can think of it like the strength of a connection between neurons in our own brain. So the LLM matrix is similar to our brain, where each neuron is connected to the others and can interact with the nearby neurons.
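If you want a sense of the scale, here is a minimal sketch (assuming the Hugging Face transformers library is installed) that loads GPT-2 and counts its weights:

from transformers import GPT2LMHeadModel

# Load the freely available GPT-2 model from Hugging Face
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Count every weight in the model -- the small GPT-2 has roughly 124 million of them
print(f"Total number of weights: {model.num_parameters():,}")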
I wanted to draw a picture of a GPT matrix. Unfortunately, the actual weights of GPT-3 are the secret sauce of what makes OpenAI’s ChatGPT so great, and they haven’t been released to the public. Fortunately, I can use the Hugging Face Transformers library, which offers pre-trained models and makes the GPT-2 model freely available.
The script is made out of 5 main steps: (1) load the pre-trained GPT-2 model, (2) extract the self-attention weight matrix from each of its 12 layers, (3) stack those matrices into one big matrix, (4) convert the absolute values of the weights into pixel intensities, and (5) save the result as a bitmap image.
Here is the actual Python script in case you are curious:
import numpy as np
from transformers import GPT2LMHeadModel
import imageio

# Step 1: Load the pre-trained GPT-2 model
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Step 2: Collect the self-attention weight matrix of every transformer layer
all_weights = []
for i in range(len(model.transformer.h)):
    # Each layer's c_attn weight matrix has shape (768, 2304)
    weight_matrix = model.transformer.h[i].attn.c_attn.weight.detach().numpy()
    all_weights.append(weight_matrix)

# Step 3: Stack the 12 matrices vertically into one (9216, 2304) matrix
all_weights = np.concatenate(all_weights)

# Step 4: Take the absolute value of each weight and scale it into a pixel intensity
# (values are multiplied by 1024 and cast to uint8, so large weights wrap around)
normalized_matrix = np.abs(all_weights)
image_matrix = (normalized_matrix * 1024).astype(np.uint8)

# Step 5: Save the matrix as a bitmap image, one pixel per weight
imageio.imwrite("weight_matrix_visualization.bmp", image_matrix, format="bmp")
Now, let's dive deeper into what we see in the image. As mentioned, the matrix is made out of 9,216 rows (12 blocks of layers with 768 rows in each block). That’s why the height of the image is 9,216 pixels. Each row is made out of 2,304 weights (numbers), and that’s why the width of the image is 2,304 pixels.
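If you want to double-check those dimensions yourself, here is a short sketch (reusing the model object loaded in the script above):

# A quick sanity check of the image dimensions
num_layers = len(model.transformer.h)                          # 12 transformer blocks
rows, cols = model.transformer.h[0].attn.c_attn.weight.shape   # 768 rows x 2,304 columns per block
print(num_layers * rows, cols)                                 # 9216 2304 -- the image height and width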
A Large Language Model such as GPT is made up of four basic parts, where each part has a different responsibility:
The first part is called the Input Embedding Layer, and it is responsible for converting the raw text input into a dense vector representation (i.e., numbers like the weights we discussed before) that the model can work with. The goal of this step is to capture the semantic and syntactic relationships between words in the language.
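To make this a bit more concrete, here is a minimal sketch (using the same GPT-2 model and tokenizer from Hugging Face) of how text becomes token IDs and then dense vectors:

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Raw text -> token IDs (one integer per token)
inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
print(inputs["input_ids"])

# Token IDs -> dense vectors: one 768-dimensional vector per token
with torch.no_grad():
    embeddings = model.transformer.wte(inputs["input_ids"])
print(embeddings.shape)   # torch.Size([1, number_of_tokens, 768])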
The second part is called the Self-Attention Layer, which is responsible for learning contextual relationships between different tokens (you can think of a token as a word or part of a word) in the input vector (again, the numbers we mentioned before). The result is a calculation of the importance of each token in the sequence.
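The core calculation is small enough to sketch in a few lines. This is a toy illustration of the idea, not the exact GPT-2 code (which also uses learned projection matrices and multiple attention heads):

import numpy as np

def toy_self_attention(Q, K, V):
    # Scaled dot-product attention: how much each token should "look at" every other token
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax per row
    return weights @ V   # mix the value vectors according to those importance weights

# Three tokens, each represented by a made-up 4-dimensional vector
x = np.random.rand(3, 4)
print(toy_self_attention(x, x, x).shape)   # (3, 4) -- one context-aware vector per token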
The third part is called the Feedforward Layer, which is a bit more abstract. The purpose of the feedforward layer is to provide the model with a more abstract and compressed representation of the input layer. You can think of it like a box that takes in some numbers, does some math with them, and gives out some new numbers: a special machine that transforms the input numbers into output numbers using a set of rules it has learned.
Think of it like a game where you have to match shapes. The feedforward layer is like the game board, and the shapes are the input numbers. The rules of the game tell you how to match the shapes to get a new shape as output. The feedforward layer works the same way, but with numbers instead of shapes.
So when you give the feedforward layer some input numbers, it applies its rules to them and gives you some new output numbers. This process happens very quickly, and it can be repeated many times to get more and more complex outputs.
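Here is a toy sketch of that "box of rules": in GPT-2 it boils down to two matrix multiplications with a non-linearity in between (the sizes below are made up to keep the example small; the real layer goes 768 -> 3072 -> 768 and uses a GELU activation):

import numpy as np

def toy_feedforward(x, W1, b1, W2, b2):
    # Expand the input, apply a simple non-linearity, then compress back down
    hidden = np.maximum(0, x @ W1 + b1)   # ReLU here for simplicity
    return hidden @ W2 + b2

# One token vector of size 4, expanded to 16 and squeezed back to 4
x = np.random.rand(1, 4)
W1, b1 = np.random.rand(4, 16), np.zeros(16)
W2, b2 = np.random.rand(16, 4), np.zeros(4)
print(toy_feedforward(x, W1, b1, W2, b2).shape)   # (1, 4)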
The fourth and final part is called Layer Normalization. This layer normalizes the output of the Feedforward Layer to improve the stability and speed of the training process.
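The idea behind layer normalization can also be sketched in a few lines (the real layer additionally learns a scale and bias for each dimension):

import numpy as np

def toy_layer_norm(x, eps=1e-5):
    # Shift and scale each vector so its values have zero mean and unit variance
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

x = np.array([[1.0, 2.0, 3.0, 4.0]])
print(toy_layer_norm(x))   # values centered around 0 with a spread of about 1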
These layers are stacked on top of each other to form a deep neural network, with the output of one layer serving as the input to the next. The overall purpose of the LLM is to learn a rich, high-dimensional representation of natural language text that can be used for a wide range of downstream NLP tasks, such as text classification, language translation, and question-answering.
I still have many unanswered questions about the image itself: what does the color change in each row mean? Why does the gradient in color decrease as we go down the layers? Will we see the same pattern with other LLMs? If so, can an image of an LLM be an indication of a “correct” learning pattern, and, like with an MRI, if there is a problem with the LLM, can we detect areas in its brain that malfunction?
I hope you enjoyed reading this short article and that it made some sense. I know it was made out of new and somewhat complex concepts, but I tried to make it as simple as possible without making it 20 pages long. I hope to learn more and have more answers to the many questions I have; unfortunately, I find myself with more questions the more I read about this fascinating topic.