What does ChatGPT look like? I made an MRI image of a GPT Brain

Sorry for the clickbait, and in return I will get right to it. Here is the minimized image I created of a GPT-2 large language model (here is a link to the original full-size image). If you are interested in how it was created, read on:

[Image: GPT-2 Visualization]


First things first, I want to put it out in the open: I know very little about the inner workings of Large Language Models (LLMs) such as ChatGPT. I feel like a kid poking around, asking basic questions about this new magical toy I got to play with. My basic understanding may or may not be accurate, and in any case I'm just scratching the surface.

Let me share some basic terms I’m going to use here, so we will all be aligned:

  • ChatGPT: Well, this is the best-known application, the one we all use. It is an artificial intelligence chatbot developed by OpenAI. It is built on top of OpenAI's GPT-3.5 and GPT-4 families of large language models and has been fine-tuned using both supervised and reinforcement learning techniques.
  • GPT: stands for Generative Pre-trained Transformer, which is a language prediction model that uses deep learning to produce human-like text.
  • Large Language Models (LLMs): deep learning models that can process and understand natural language. These models consist of neural networks with numerous parameters, trained on vast quantities of unlabelled text using machine learning techniques. LLMs can recognize, summarize, translate, predict, and generate text, and are increasingly being used in a variety of applications, including language translation, content generation, and chatbots.
  • Matrix: In mathematics, a matrix is a rectangular array or table of numbers, symbols, or expressions arranged in rows and columns.

What’s the difference between GPT and LLM? GPT is a specific type of transformer-based language model that is pre-trained using unsupervised learning, while LLM is a more general term that encompasses a range of language models that can generate or predict text.

So, like with anything magical, I wanted to better understand how the magic behind ChatGPT works. What better way than to illustrate it, or to run an X-ray/MRI on the brain behind GPT? So I wrote a short script that visualizes the numbers behind a Large Language Model. A Large Language Model (LLM) like ChatGPT is basically built out of a huge matrix of numbers, something that looks like this:


[ [ 0.8, 0.2, 0.1 ],
  [ 0.3, 0.9, 0.5 ],
  [ 0.6, 0.4, 1.0 ] ]


In this example, the matrix has 3 rows and 3 columns. In reality the matrix is much bigger and may contain billions of numbers. Each number is called a weight, and you can think of it like a neuron in our own brain. So the LLM matrix is similar to our brain, where each neuron is connected to the others and can interact with nearby neurons.


I wanted to draw a picture of a GPT matrix. Unfortunately, the actual weights of GPT-3 are the secret sauce of what makes OpenAI's ChatGPT so great, and they have not been released to the public. Fortunately, I can use the Hugging Face Transformers library, which offers pre-trained models and makes the GPT-2 model freely available.

The script is made out of 5 main steps:

  1. The first step is to load a pre-trained GPT-2 model (which is basically a matrix of numbers).
  2. The second step is to loop over the matrix and load the weights into a list. Each GPT-2 block's matrix is made out of 768 rows of weights, and each row (i.e. array of numbers) is made out of 2,304 weights.
  3. Since I want to visualize each weight's value, I take its absolute value, i.e. if a weight's value is negative, I view it as positive.
  4. To give the image more color depth, I multiply the weight values (which are small float numbers that look something like 0.1354443) by 1024 to turn them into integers.
  5. The final step is to save the numbers I created into a BMP image file (I use BMP so as not to lose visual information).


Here is the actual Python script in case you are curious:

import numpy as np
from transformers import GPT2LMHeadModel
import imageio

# Load the pre-trained GPT-2 model
model = GPT2LMHeadModel.from_pretrained("gpt2")

all_weights = []
for i in range(len(model.transformer.h)):
    # Access the self-attention weight matrix of each transformer block (768 x 2304)
    weight_matrix = model.transformer.h[i].attn.c_attn.weight.detach().numpy()
    all_weights.append(weight_matrix)

# Stack the 12 blocks into one tall matrix (9,216 x 2,304)
all_weights = np.concatenate(all_weights)

# Take absolute values and scale them up to 8-bit pixel values
normalized_matrix = np.abs(all_weights)
image_matrix = (normalized_matrix * 1024).astype(np.uint8)

# Save as BMP to avoid losing visual information to compression
imageio.imwrite("weight_matrix_visualization.bmp", image_matrix, format="bmp")
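If you run the script yourself, a quick sanity check (a minimal sketch, assuming the output file name used in the script above) is to read the BMP back and confirm its size:

import imageio

# Height and width should match the weight matrix: 9,216 x 2,304
img = imageio.imread("weight_matrix_visualization.bmp")
print(img.shape)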

Now, let's dive deeper into what we see in the image. As mentioned, the matrix is made out of 9,216 rows (12 blocks of layers with 768 rows in each block). That's why the height of the image is 9,216 pixels. Each row is made out of 2,304 weights (numbers), and that's why the width of the image is 2,304 pixels.
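To see where those numbers come from, here is a small sketch that prints the relevant shapes from the same Hugging Face model object the script above loads:

from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
print(len(model.transformer.h))                         # 12 transformer blocks
print(model.transformer.h[0].attn.c_attn.weight.shape)  # torch.Size([768, 2304])
# 12 blocks x 768 rows = 9,216 rows, each 2,304 weights wide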

A Large Language Model such as GPT is made up of four basic parts, where each part has a different responsibility:

The first part is called the Input Embedding Layer, and it is responsible for converting the raw text input into a dense vector representation (i.e. the weight numbers we discussed before) that the model can work with. The goal of this step is to capture the semantic and syntactic relationships between words in the language.
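As a rough illustration (a minimal sketch using the Hugging Face tokenizer and GPT-2's token embedding table; the example sentence is my own), converting raw text into dense vectors looks something like this:

from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Turn raw text into token ids, then look up their dense vector representations
input_ids = tokenizer("Hello world", return_tensors="pt").input_ids
embeddings = model.transformer.wte(input_ids)  # the token embedding table
print(embeddings.shape)  # torch.Size([1, 2, 768]) -- 768 numbers per token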

The second part is called the Self-Attention Layer, which is responsible for learning contextual relationships between different tokens (you can view a token as a word or part of a word) in the input vector (again, the weight numbers we mentioned before). The result is a calculation of the importance of each token in the sequence.
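Here is a toy sketch of the idea, with random numbers standing in for real queries, keys, and values (not GPT-2's actual weights):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy example: 3 tokens, each represented by a vector of 4 numbers
Q = np.random.rand(3, 4)  # queries
K = np.random.rand(3, 4)  # keys
V = np.random.rand(3, 4)  # values

scores = softmax(Q @ K.T / np.sqrt(4))  # how much each token "attends" to the others
output = scores @ V                     # mix the value vectors by those scores
print(scores.round(2))                  # a 3x3 matrix of attention scores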

The third part is called the Feedforward Layer, which is more abstract. The purpose of the feedforward layer is to provide the model with a more abstract and compressed representation of the input layer. It is like a box that takes in some numbers, does some math with them, and gives out some new numbers. It's like a special machine that can transform the input numbers into output numbers using a set of rules that it has learned.

Think of it like a game where you have to match shapes. The feedforward layer is like the game board, and the shapes are the input numbers. The rules of the game tell you how to match the shapes to get a new shape as output. The feedforward layer works the same way, but with numbers instead of shapes.

So when you give the feedforward layer some input numbers, it applies its rules to them and gives you some new output numbers. This process happens very quickly, and it can be repeated many times to get more and more complex outputs.
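In code, the "game board and rules" can be sketched roughly like this (toy sizes and random weights standing in for the learned ones; the real GPT-2 feedforward layer expands 768 numbers to 3,072 and back):

import numpy as np

def gelu(x):
    # The smooth activation GPT-2 uses inside its feedforward blocks
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

x = np.random.rand(4)       # the "input numbers"
W1 = np.random.rand(4, 8)   # learned rules, step 1: expand
W2 = np.random.rand(8, 4)   # learned rules, step 2: compress back
output = gelu(x @ W1) @ W2  # the "output numbers"
print(output)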

The fourth and final part is called Layer Normalization. This layer normalizes the output of the Feedforward Layer to improve the stability and speed of the training process.
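The math behind it is short enough to show in full (a simplified sketch that leaves out the learned scale and shift parameters the real layer also has):

import numpy as np

def layer_norm(x, eps=1e-5):
    # Shift to zero mean and unit variance so the next layer sees stable numbers
    return (x - x.mean()) / np.sqrt(x.var() + eps)

x = np.array([2.0, 4.0, 6.0, 8.0])
print(layer_norm(x))  # roughly [-1.34, -0.45, 0.45, 1.34]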

These layers are stacked on top of each other to form a deep neural network, with the output of one layer serving as the input to the next. The overall purpose of the LLM is to learn a rich, high-dimensional representation of natural language text that can be used for a wide range of downstream NLP tasks, such as text classification, language translation, and question-answering.

[Image: GPT-2 Visualization with block explanation]

I still have many unanswered questions about the image itself: what does the color change in each row mean? Why does the color gradient decrease as we go down the layers? Will we see the same pattern with other LLMs? If so, can an image of an LLM be an indication of a "correct" learning pattern, and, like with an MRI, if there is a problem with the LLM, can we detect the areas of its brain that malfunction?

I hope you enjoyed reading this short article and that it made some sense. I know it was made out of new and somewhat complex concepts, but I tried to make it as simple as possible without making it 20 pages long. I hope to learn more and have more answers to the many questions I have; unfortunately, I find myself with more questions the more I read about this fascinating topic.
