What Is ChatGPT Doing … and Why Does It Work? – the summary
Chris Young
Chief Technology Officer and Chief Operating Officer, FinTech and Financial Regulations Expert, Agile Coach, Scrum Master, Program and Project manager and Game Designer
Unless you have been living under a rock, or perhaps in a mountain cave, you have no doubt been inundated recently with posts and articles on “ChatGPT”.
Many of these enthuse on the capabilities of the tool (and others like it) without really explaining what it is and how it works.
The remarkable Stephen Wolfram recently published what is, at least in my opinion, the best explanation I have read to date of what ChatGPT is doing and why it works: https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
Several people in my network have commented that this article is quite detailed and lengthy, and they find it difficult to understand (or at least, challenging to find the time to read it thoroughly and understand ...).
So, at the risk of adding yet “one more” article on ChatGPT, and for those of you who are interested in the topic but don’t (yet) have the time to delve further, I present to you “What Is ChatGPT Doing … and Why Does It Work? – the summary”.
And apologies in advance to Stephen Wolfram for any misquoting or misinterpretations I might unintentionally introduce. I do recommend that, if at all possible, you read Stephen’s original article, which is far more eloquent and thorough in explaining the interior workings of the tool.
Chat what?
Firstly, what is ChatGPT?
Essentially it is a “chatbot” – i.e. software which can take in text input and respond with text output, in such a way that it mimics a person “chatting”.
The “GPT” part stands for “Generative Pre-trained Transformer”, which is a fancy name for a language model which has been trained on a large volume of text data to generate human-like text.
ChatGPT is developed by a company called OpenAI (https://openai.com), whose mission statement says “OpenAI is an AI research and deployment company. Our mission is to ensure that artificial general intelligence benefits all of humanity.”
What Is a Language Model?
A model, at least in computer science terms, is some kind of procedure for computing an answer to a complex problem, rather than having to measure and remember each specific case. A language model is a probability distribution over sequences of words. Given any sequence of words, a language model assigns a probability to the whole sequence.
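To connect “a probability for the whole sequence” with the next-word probabilities discussed in this summary, it may help to write out the chain rule of probability. This is a standard identity rather than anything specific to ChatGPT, and the symbols (w_1 … w_n for the words of a sequence) are my own notation, not Stephen’s:

```latex
P(w_1, w_2, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
```

In words: the probability of a whole sequence is the product of the probability of each word given all the words before it, which is why a model that is good at predicting the next word also defines a probability for entire texts.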
Any model you use has some particular underlying structure—then a certain set of “knobs you can turn” (i.e. parameters you can set) to fit your data. And in the case of ChatGPT, lots of such “knobs” are used—actually, 175 billion of them.
But the remarkable thing is that the underlying structure of ChatGPT is sufficient to make a model that computes next-word probabilities “well enough” to give us reasonable essay-length pieces of text.
One of the most fascinating (and surprising) conclusions that arises from this article is that Stephen identifies the possibility that the success of ChatGPT implicitly reveals an important “scientific” fact: that there’s actually a lot more structure and simplicity to meaningful human language than we ever knew—and that in the end there may be even fairly simple rules that describe how such language can be put together.
Which means?
In order to write this summary I have been experimenting with ChatGPT to understand its capabilities and also limitations.
Here is a simple example of an interaction with ChatGPT:
Adding one word at a time is all it takes ...
The idea behind ChatGPT is pretty simple. Start with a lot of text written by people, taken from the Internet, books, etc. Then teach a neural network to make text "like this", and in particular, make it able to start with a "prompt" and then continue with text that's "like what it's been trained with".
The neural network inside ChatGPT is made up of billions of very simple elements. For every new word or part of a word it makes, it passes input derived from the text so far "once through its elements" (without any loops or other complicated steps).
But what's amazing and surprising is that this process can create text that is "like" what's on the web, in books, etc. Not only does it make sense, but it also "says things" that "follow its prompt" by using what it has "read." It doesn't always say things that "make sense globally" or match up with correct calculations, because it's just saying things that "sound right" based on how things "sounded" in its training material.
So let’s say we’ve got the text “What is ChatGPT?”. Imagine scanning billions of pages of human-written text (say on the web and in digitized books) and finding all instances of this text, then seeing what word comes next what fraction of the time.
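The thought experiment above can be sketched in a few lines of Python. This is only an illustration of literal counting on a made-up one-sentence corpus; the function name and the corpus are my own inventions, and it is not how ChatGPT itself arrives at its probabilities.

```python
from collections import Counter


def next_word_fractions(corpus, prompt):
    """Return {next_word: fraction} over every place `prompt` occurs in `corpus`."""
    words = corpus.split()
    target = prompt.split()
    n = len(target)
    # Count the word that follows each occurrence of the prompt.
    counts = Counter(
        words[i + n]
        for i in range(len(words) - n)
        if words[i:i + n] == target
    )
    total = sum(counts.values())
    if total == 0:
        return {}  # the prompt never appears (followed by a word) in the corpus
    return {word: count / total for word, count in counts.items()}


# Tiny made-up corpus standing in for "billions of pages".
corpus = "the cat sat on the mat and the cat ran to the door"
fractions = next_word_fractions(corpus, "the cat")
print(fractions)
```

With a real prompt of any length, exact matches quickly become too rare to count, which is one reason a model that generalises is needed instead.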
What ChatGPT effectively does is that it processes large amounts of text input and looks for things that in a certain sense “match in meaning”. But the end result is that it produces a ranked list of words that might follow, together with “probabilities”:
And the remarkable thing is that when ChatGPT does something like write an essay what it’s essentially doing is just asking over and over again “given the text so far, what should the next word be?”—and each time adding a word. (More precisely, it’s adding a “token”, which could be just a part of a word, which is why it can sometimes “make up new words”.)
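The “one word at a time” loop can be sketched as follows. Everything here is a toy assumption of mine: the probability table is made up, it looks only at the last word (whereas ChatGPT takes the whole text so far into account), and it works on whole words rather than tokens.

```python
# Hypothetical toy table: for each word, candidate next words with made-up probabilities.
NEXT_WORD_PROBS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 0.9, "the": 0.1},
}


def generate(prompt, max_new_words):
    """Repeatedly ask "what should the next word be?" and append the answer."""
    words = prompt.split()
    for _ in range(max_new_words):
        candidates = NEXT_WORD_PROBS.get(words[-1])
        if not candidates:
            break  # the toy table has nothing to say about this word
        # Always take the highest-ranked candidate.
        words.append(max(candidates, key=candidates.get))
    return " ".join(words)


text = generate("the", 5)
print(text)
```

Because this sketch always picks the top-ranked word, the same prompt always produces the same continuation.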
At each step it gets a list of words with probabilities, but which one should it actually pick to add to the essay (or whatever) that it’s writing? One might think it should be the “highest-ranked” word (i.e. the one to which the highest “probability” was assigned). But this is where a bit of voodoo begins to creep in. Because for some reason if we always pick the highest-ranked word, we typically get a very “flat” essay, that never seems to “show any creativity” (and even sometimes repeats word for word). But if sometimes (at random) we pick lower-ranked words, we get a “more interesting” essay.
The fact that there’s randomness here means that if we use the same prompt multiple times, we’re likely to get different essays each time. There is also a so-called “temperature” parameter that determines how often lower-ranked words will be used, and for essay generation, it turns out that a “temperature” of 0.8 seems best. (It’s worth emphasizing that there’s no “theory” being used here; it’s just a matter of what’s been found to work in practice.)
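Here is a minimal sketch of how a temperature can change which word gets picked. It assumes the commonly used formulation in which each probability is raised to the power 1/temperature and the results are renormalised; the article does not confirm that ChatGPT uses exactly this formula, and the word list and probabilities are made up.

```python
import math
import random


def apply_temperature(probs, temperature):
    """Rescale probabilities as p ** (1 / temperature), then renormalise.

    Assumes every probability is > 0 and temperature > 0.
    """
    weights = {w: math.exp(math.log(p) / temperature) for w, p in probs.items()}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


def sample_word(probs, temperature, rng):
    """Pick one word at random according to the rescaled probabilities."""
    scaled = apply_temperature(probs, temperature)
    words = list(scaled)
    return rng.choices(words, weights=[scaled[w] for w in words], k=1)[0]


# Made-up ranked list of candidate next words.
probs = {"learn": 0.5, "predict": 0.3, "make": 0.2}
rng = random.Random(0)  # fixed seed so repeated runs behave the same

print(apply_temperature(probs, 0.8))  # top word gains a little weight
print(apply_temperature(probs, 2.0))  # lower-ranked words gain weight
print(sample_word(probs, 0.8, rng))
```

In this formulation a temperature of 1 leaves the probabilities unchanged, values below 1 favour the top-ranked word more strongly, and values above 1 give lower-ranked words a better chance.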
Caveats and closing thoughts
ChatGPT and other similar tools are certainly interesting and potentially powerful tools.
Like any tool, they need to be used properly by someone with the skill, and possibly training, to be effective.
During my testing I received several responses from ChatGPT which, whilst reading like fluent English, were factually incorrect. I am sure the models will improve over time, but it is important to verify any output – after all, the old rule of “garbage in, garbage out” applies just as well to chatbots as it does to humans ;)
OpenAI themselves warn that the input data is not entirely current and there is the possibility that erroneous information (or indeed “fake news”) may have been inadvertently included in the text data used to “train” the model.
Also since ChatGPT is using the text it was given to derive the probabilities for response, I do have some concerns about unintended plagiarism. Where did the original information come from and how does the author (or authors) get properly recognised or compensated?
Plus, ChatGPT is Beta software. This means that you can’t expect it to perform perfectly all the time (although to be fair, that’s true of most software). Several times while writing this article I received errors like this one:
These kinds of issues certainly aren’t limited to ChatGPT, and of course OpenAI is working to improve their product – as reported in this recent post from MIT Technology Review: https://www-technologyreview-com.cdn.ampproject.org/c/s/www.technologyreview.com/2023/02/21/1068893/how-openai-is-trying-to-make-chatgpt-safer-and-less-biased/amp/
It is exciting to see what ChatGPT has been able to do so far. It is certainly a great example of how a lot of simple computational elements can do amazing and unexpected things when they work together.
I do hope you found this article useful and perhaps feel motivated to read more or do your own experimentation.
Sponsored by Imperial Twing
This article has been sponsored by Imperial Twing. Become the Vice Archmage of the Empire in this fast-paced strategic card and dice game suitable for families, kids from 8 years and adults of any age!
Coming soon to Kickstarter!
Follow us on Kickstarter:
Sign up for updates and special offers: https://imperialtwing.com/register/