Demystifying How ChatGPT Works
Lukasz Bialozor
Global Digital Analytics & AI Leader | Driving Digital Data Transformation for Fortune Global 50 | Oxford Business School | MIT Sloan | MBA Essentials at LSE
Introduction
Have you ever wondered how the ChatGPT system, developed by OpenAI, creates text that reads like human composition? If so, strap in as we unravel the magic behind the curtain, breaking the complexities of this giant machine-learning model into bite-sized pieces.
The Art of Making Sense of Text
In the simplest terms, ChatGPT aims to continue the provided text in the most logical way possible. For instance, if we feed the model a snippet like "Dinner was delightful because", ChatGPT draws on patterns learned from billions of internet pages and books to predict the next word. Candidates might include "of", "the", or "it". Each word is assigned a probability, a measure of how likely it is to occur next.
For instance:
Dinner was delightful because …
- of: 51.2%
- the: 33.1%
- we: 8.6%
- it: 7.1%
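To make this concrete, here is a minimal Python sketch of that weighted pick. The percentages are the illustrative ones above, not real model outputs:

```python
import random

# Toy next-word distribution for "Dinner was delightful because",
# using the illustrative percentages above (not real model outputs).
next_word_probs = {"of": 0.512, "the": 0.331, "we": 0.086, "it": 0.071}

# Pick one continuation, weighted by its probability.
words = list(next_word_probs)
weights = list(next_word_probs.values())
choice = random.choices(words, weights=weights)[0]
print("Dinner was delightful because", choice)
```

Run it a few times and the chosen continuation changes, which previews the role randomness plays next.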
Spinning the Wheel of Fortune
After the model generates a list of probable words, the question arises: which one to choose? Should we simply opt for the highest-ranked one? It's more complex than that.
For instance:
Technology today is evolving at …
Technology today is evolving at an …
Technology today is evolving at an unprecedented …
Technology today is evolving at an unprecedented pace.
These lines show the sentence growing one word at a time, each new word chosen based on the context of the words before it. Every word is a decision ChatGPT makes, and each decision shapes the output. Opting for the highest-probability word every time would render the generated text predictable and mundane. Instead, ChatGPT occasionally picks a lower-ranked word at random, adding an element of surprise and mimicking human-like unpredictability; a stripped-down version of this loop appears below.
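Here is a minimal sketch of that word-by-word loop. The next_word_distribution function is a hypothetical stand-in for the actual neural network, and its candidate words are made up for illustration:

```python
import random

def next_word_distribution(text):
    # Hypothetical stand-in for the real model: a forward pass through
    # the neural network would return thousands of context-dependent
    # candidates here; this toy version ignores the context entirely.
    return {"an": 0.6, "a": 0.2, "breakneck": 0.1, "unprecedented": 0.1}

def generate(prompt, n_words=4):
    text = prompt
    for _ in range(n_words):
        dist = next_word_distribution(text)
        # Sample instead of always taking the top word, so repeated
        # runs can produce different continuations.
        word = random.choices(list(dist), weights=list(dist.values()))[0]
        text += " " + word
    return text

print(generate("Technology today is evolving at"))
```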
The role of randomness brings an exciting twist: given the same input text, ChatGPT can generate a different continuation each time. A parameter that controls the level of randomness, known as the "temperature," guides this process.
The temperature reshapes the probability distribution from which the next token (a word, subword, or punctuation mark) is sampled. A lower temperature (e.g., 0.7) makes high-probability tokens even more likely, resulting in more deterministic output.
On the other hand, a higher temperature (e.g., 1.3) flattens the probability distribution, increasing the chances of sampling lower-probability tokens and generating more diverse text. A temperature of around 0.8 often creates appealing outputs, striking a balance between predictability and creativity.
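As a sketch of the arithmetic involved, temperature can be applied by raising each probability to the power 1/T and re-normalizing (equivalently, dividing the log-probabilities by T before a softmax). The numbers below reuse the illustrative distribution from earlier:

```python
import math

def apply_temperature(probs, temperature):
    # Raise each probability to the power 1/T and re-normalize --
    # the standard softmax-with-temperature rescaling.
    logits = [math.log(p) / temperature for p in probs.values()]
    total = sum(math.exp(l) for l in logits)
    return {word: math.exp(l) / total for word, l in zip(probs, logits)}

probs = {"of": 0.512, "the": 0.331, "we": 0.086, "it": 0.071}
print(apply_temperature(probs, 0.7))  # sharper: "of" dominates even more
print(apply_temperature(probs, 1.3))  # flatter: rarer words gain ground
```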
Scaling from Letters to Words
The underlying principle of ChatGPT can be explained through a simpler system that deals only with letters. Imagine generating English text one letter at a time, calculating the probability of each letter's occurrence based on a sample text. We could then make "words" by adding spaces at certain probabilities.
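Here is a toy illustration of that letter-level idea, assuming single-letter frequencies estimated from a small sample text and an arbitrary 20% chance of ending a "word" after each letter:

```python
import random
from collections import Counter

sample_text = (
    "imagine generating english text one letter at a time based only "
    "on how often each letter appears in a sample of english writing"
)

# Estimate single-letter probabilities from the sample text.
counts = Counter(c for c in sample_text if c.isalpha())
letters = list(counts)
weights = [counts[c] for c in letters]

def random_word(max_len=8):
    # Draw letters by frequency; end the "word" with a 20% chance
    # after each letter (an arbitrary stand-in for space probability).
    word = ""
    while len(word) < max_len:
        word += random.choices(letters, weights=weights)[0]
        if random.random() < 0.2:
            break
    return word

print(" ".join(random_word() for _ in range(8)))
```

The output looks vaguely word-like but meaningless, which is exactly the point: single-letter probabilities alone cannot produce language.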
However, creating meaningful text requires more than randomly handpicking each letter or word. It involves predicting a sequence of words based on the probabilities of their occurrence together.
Overcoming the Limitations
The key challenge is that deducing these probabilities from the vast amount of text on the internet or in digitized books is almost impossible. The solution? Building a model that can estimate the probabilities even for sequences we have never explicitly seen. That's the crux of ChatGPT's functionality.
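A quick back-of-the-envelope calculation shows why simple lookup is hopeless. Assuming a vocabulary of roughly 40,000 common English words (an illustrative figure, not a claim about any specific model), even modest 20-word sequences outnumber anything a training corpus could ever contain:

```python
# Assuming a vocabulary of roughly 40,000 common English words
# (an illustrative figure), count the possible 20-word sequences.
vocab_size = 40_000
sequence_length = 20
print(f"{vocab_size ** sequence_length:.2e}")  # ~1.10e+92 possibilities
```

Since almost none of those sequences ever appear in real text, the model must generalize rather than memorize.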
In Closing: Demystifying the Scale of ChatGPT Operations
To gain a better grasp of the scale of computations involved in ChatGPT, consider one example: each time the model produces a single token, it performs a calculation that touches every one of its roughly 175 billion parameters (in GPT-3, the model behind the original ChatGPT). Such is the magnitude of data processing performed by OpenAI's model within a fraction of a second.
Further complicating the picture are the sophisticated neural networks that power the system, composed of many stacked layers. The primary driving mechanism is the "Transformer," an artificial neural network architecture capable of processing many parts of its input simultaneously. Appreciating this scale and technological prowess is crucial to understanding how ChatGPT functions.
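For a flavor of what the Transformer actually computes, here is a minimal NumPy sketch of scaled dot-product attention, its core operation. Real models add learned projection matrices, multiple attention heads, and dozens of stacked layers:

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: every position attends to every
    # other position in one matrix multiplication, which is what lets
    # a Transformer process a whole sequence in parallel.
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over positions
    return weights @ V                                # weighted blend of values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))      # 4 tokens, each an 8-dimensional vector
print(attention(x, x, x).shape)  # (4, 8): one updated vector per token
```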
Conclusion
By leveraging the probabilities of word sequences, injecting a dash of randomness, and carefully making one-word decisions at a time, ChatGPT continues to astound us by creating human-like compositions. I hope this simplified explainer provides a clearer picture of its enigmatic inner workings.