The Most Mindblowing Realization About ChatGPT
Sam Glassenberg
Level Ex CEO | Advancing medicine through videogame technology and design
My ongoing exploration into the inner workings of ChatGPT has brought me to a crucial understanding that many don’t realize:
ChatGPT’s implementation is shockingly… mind-blowingly… simple.
It joins other such mysterious phenomena of “unimaginable complexity emergent from incredible simplicity” that the human mind struggles to comprehend… like how a DNA sequence short enough to fit on a CD encodes a complete human.
When folks hear stats like “terabytes of training data” and “175 billion parameters,” they think “wow - this thing is super complicated!” That impression is completely wrong. The “engine” that is ChatGPT is so incredibly simple that even I struggle to believe it produces the output that it does.
Seeing ChatGPT in action helps us overcome our innate biases that limit our understanding of how memory works (and how the human brain works in general).
I’ll explain.
Understanding ChatGPT’s Inner Layout
ChatGPT is surprisingly willing to share the details of her architecture. In my prior “interviews” with her, I’ve collected all the data necessary to calculate her layout and complexity below. Another interview (here) verified a few assumptions that I had made.
To understand ChatGPT’s inner workings follow these 3 steps:
1. Imagine 2 million dots.
Actually, you don’t have to imagine them. It’s the number of pixels in a 2-megapixel photo (your iPhone defaults to 12 megapixels).
They fit onto a small image. Here are 1 million dots, so imagine two of these images:
2. Rearrange those dots so they’re in 24 long rows, about 85,000 dots wide
3. Now, for each dot, connect it to every dot in the next row with virtual wires of varying thickness, like this:
You will have ~175 billion wires. The thicknesses of those wires represent 175 billion numbers. Those are the “175 billion parameters”.
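The arithmetic behind that figure can be checked in a few lines. This is a sketch of the simplified layout described above (24 rows of roughly 85,000 dots, every dot wired to every dot in the next row) — the real GPT-3 layer shapes differ, but the order of magnitude is the point:

```python
# Count the dots and wires in the simplified "rows of dots" layout.
rows = 24       # rows of dots
width = 85_000  # dots per row

dots = rows * width                  # total dots (about 2 megapixels' worth)
wires = (rows - 1) * width * width   # every dot wired to every dot in the next row

print(f"{dots:,} dots")    # 2,040,000
print(f"{wires:,} wires")  # 166,175,000,000 -- on the order of 175 billion
```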
Those dots are all exactly the same - they “perform” incredibly simple math in an artificial neural network, taking the input from the previous layer and multiplying it by the parameters (the wire thicknesses). The idea is to roughly approximate the way a simple biological neuron works: the wires represent the neural connections.
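In code, the math each dot performs is just a weighted sum followed by a simple nonlinearity. A minimal NumPy sketch, with the row sizes shrunk for illustration (real transformer layers also add bias terms and attention steps, which this picture glosses over):

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, weights):
    """One row of dots: multiply each input by its wire thickness,
    sum the results, then apply a simple nonlinearity (ReLU)."""
    return np.maximum(0.0, weights @ inputs)

# Tiny stand-in for two adjacent rows: 5 dots feeding 4 dots.
prev_row = rng.normal(size=5)    # outputs of the previous row
wires = rng.normal(size=(4, 5))  # "thickness" of the 4 x 5 wires

next_row = layer(prev_row, wires)
print(next_row.shape)  # (4,)
```

Stacking 24 of these rows and letting training adjust the wire thicknesses is, in essence, the whole machine.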
THAT’S IT.
I’m not kidding. That’s it.
There is no database. Not a single text file. No documents. No backup copy of Wikipedia. No internet connection. No folder full of manuals. Just 175 billion numbers, laid out like a woven scarf. Just those numbers. Stored at one byte per number, they would fit on a 256GB USB stick with room to spare. That’s it.
All of the logic, the “intelligence”, the quotes, the humor, the names - all of the product manuals, the Bible, the Shakespeare… everything that informs ChatGPT’s output is encoded and added (by its learning process) into the layout above. That’s it.
“But Sam, doesn’t it also have a…?”
No. No it doesn’t. That’s it.
What about all of the things that I tell ChatGPT in the chat?
If you aren’t blown away yet that all of ChatGPT is just 175 billion numbers, wait until you hear this: your entire conversation with ChatGPT, no matter what, is represented by at most 2,048 numbers.
When you “talk” to ChatGPT, it breaks the text you give it into tokens - fragments of words - encodes them into up to 2,048 values, and feeds those values into the front end of the network above. The output of the network is its response. If your conversation grows past 2,048 tokens (like when you paste in a long article or a manual), the oldest tokens fall out of the window, and the network works only from the most recent 2,048 (this is a slightly simplified explanation of the algorithm, but not by much).
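A sketch of that fixed context window, using a toy whitespace tokenizer as a hypothetical stand-in (GPT-3 actually uses a byte-pair-encoding subword tokenizer, and the 2,048 limit counts those subword tokens, not letters):

```python
# Sketch of a fixed context window, assuming a toy whitespace "tokenizer".
CONTEXT_WINDOW = 2048

def tokenize(text):
    return text.split()  # toy stand-in for a real subword tokenizer

def fit_to_window(text):
    """Keep only the most recent CONTEXT_WINDOW tokens; older ones fall out."""
    return tokenize(text)[-CONTEXT_WINDOW:]

long_input = "word " * 3000           # 3000 toy tokens
print(len(fit_to_window(long_input)))  # 2048
```

Everything outside that window is simply gone, which is why a long conversation can “forget” its own beginning.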
That’s why ChatGPT doesn’t learn from your conversation. Every conversation starts from scratch. Those 2,048 numbers change during a conversation. The 175 billion numbers don’t change no matter how much you talk to her. Those numbers only change during a ChatGPT upgrade.
What this means
This ranks ChatGPT among a category of “unimaginable complexity emergent from incredible simplicity” that we see in certain natural phenomena, and once in a while in computer science. Here are some other examples:
DNA - The human genome’s roughly 3 billion nucleotides, at two bits each, come to about 700 megabytes of data. That fits on a CD. It represents a complete human. Change enough of it, and you get a tyrannosaurus instead. Or a jellyfish. Or a rutabaga.
Fractals - Fractals in math and in nature take a simple mathematical equation and repeat it at progressively smaller scales, creating self-similar shapes that are infinitely complex. The simplicity of the equation belies the complexity of the shapes it generates.
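The fractal case is easy to see directly. A minimal sketch: the Mandelbrot set comes from repeating the single equation z = z² + c and asking whether the result stays bounded:

```python
# The Mandelbrot set: infinite complexity from iterating z = z*z + c.
def escapes(c, max_iter=30):
    """True if iterating z = z*z + c from z=0 blows up (|z| > 2)."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return True
    return False

# Render a tiny ASCII view of the set ('#' marks points that stay bounded).
for im in range(-10, 11):
    print("".join(
        " " if escapes(complex(re / 10, im / 10)) else "#"
        for re in range(-20, 11)
    ))
```

One line of math; endless, self-similar structure at every zoom level.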
Cellular automata - Cellular automata (examples here) illustrate how a simple set of rules can create a wide range of complex patterns and behaviors. By updating the state of each cell in a grid based on the states of its neighbors, cellular automata can produce intricate, dynamic patterns that can appear to be alive and seemingly intelligent.
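A one-dimensional example: the elementary automaton Rule 110 fits its entire rulebook into a single byte, yet its behavior is rich enough to have been proven Turing-complete. A minimal sketch:

```python
# Elementary cellular automaton, Rule 110: the whole rulebook is one byte.
RULE = 110

def step(cells):
    """Update every cell from its own state and its two neighbors (wrapping)."""
    n = len(cells)
    return [
        (RULE >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

cells = [0] * 31 + [1] + [0] * 31  # one live cell in the middle
for _ in range(20):
    print("".join("#" if c else " " for c in cells))
    cells = step(cells)
```

Eight bits of rules, and the printout already shows the start of an intricate, irregular triangle of structure.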
Demoscene - An emergent subculture in computer graphics, the demoscene is a category of tiny programs written by skilled experts. A program of 4 kilobytes (or smaller) generates gorgeous, complex, animated worlds.
The human brain - and beyond
For years, many of us struggled to believe that human memory could be encoded in neurons and neural connections (“there has to be something more!”)
In my view, seeing the emergent behavior of ChatGPT puts those objections to rest. Seeing ChatGPT in action helps us overcome this and other innate biases and preconceptions that limit our understanding of how memory works (and how the human brain works in general).
We'll explore implications for brain research in future articles.