ChatGPT - How it really works
Vivek Kumar
The GPT in ChatGPT stands for "Generative Pre-Trained Transformer" and is a language model that has gained widespread popularity for its ability to generate human-like text.
The story of ChatGPT is not just the story of generative AI; it's also a story of the "random walk" of technology and the progress of science and philosophy. It is the product of the fortuitous convergence of the latest neural network technology and the availability of zettabytes of data on the internet, culminating in a burst of sudden progress. But how does it really work? What's going on inside ChatGPT's mind?
ChatGPT is based on the concept of neural nets, originally invented in the 1940s as an idealization of the operation of the human brain. Birds inspired aeroplanes and burdock plants inspired velcro, so it's only natural that brain cells (neurons) would be the inspiration for "intelligent machines".
What makes them so useful is that they can, in principle, do all sorts of tasks, and can be incrementally trained from example data to do those tasks. For example, when we want a neural net to distinguish "cats" from "dogs", we don't have to write a program that explicitly looks for "whiskers" or "pointy ears"; instead, we just show it examples of cats and dogs and let the network "learn" how to distinguish them. The trained network then "generalizes" from the examples it is shown.
At its core, ChatGPT is just adding one word at a time. What it's doing is trying to produce a “reasonable continuation,” given the text so far.
Say we’ve got the text “The best thing about AI is its ability to”. Now imagine scanning billions of pages of text on the web, digitized books, etc. and finding all instances of this text, then seeing what word comes next what percentage of the time. In reality, ChatGPT doesn’t look at the literal text; it looks for things that in a certain sense “match in meaning”. But the end result is that it produces a ranked list of words that might follow, together with probabilities.
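The literal version of this idea can be sketched in a few lines of Python. The corpus below is a hypothetical toy example (a real system draws on billions of pages), but it shows how "what word comes next, what percentage of the time" becomes a ranked list with probabilities:

```python
from collections import Counter

# Hypothetical toy corpus; a real system scans billions of pages.
corpus = (
    "the best thing about AI is its ability to learn . "
    "the best thing about AI is its ability to adapt . "
    "the best thing about AI is its ability to learn . "
    "the best thing about AI is its ability to generalize . "
).split()

prompt_word = "to"
# Count every word that follows the prompt word in the corpus.
followers = Counter(
    corpus[i + 1] for i in range(len(corpus) - 1) if corpus[i] == prompt_word
)
total = sum(followers.values())
ranked = [(word, count / total) for word, count in followers.most_common()]
print(ranked)  # e.g. [('learn', 0.5), ('adapt', 0.25), ('generalize', 0.25)]
```

ChatGPT replaces this literal lookup with a learned model, but the output has the same shape: words paired with probabilities.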
At each step, it gets a list of words with associated probabilities. The question is: where do the probabilities come from? Let’s consider generating English text one letter (rather than one word) at a time. How can we work out what the probability for each letter should be? Take a sample of English text, and calculate how often different letters occur in it. For example, the image below displays letter counts based on articles on “cats” and “dogs” on Wikipedia.
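Counting letter frequencies is a one-liner with `collections.Counter`. The sentence below is a tiny stand-in for a Wikipedia article; a real estimate needs far more text:

```python
from collections import Counter

# A tiny stand-in for a Wikipedia article on cats; real estimates need far more text.
sample = "cats are small carnivorous mammals often kept as pets"
letters = [ch for ch in sample.lower() if ch.isalpha()]
counts = Counter(letters)
total = len(letters)

# Print the five most frequent letters with their empirical probabilities.
for letter, count in counts.most_common(5):
    print(f"{letter}: {count / total:.3f}")
```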
If we take a large enough sample of English text we can expect to eventually get at least fairly consistent results:
Now, instead of a single letter, here’s a plot that shows the probabilities of pairs of letters, a "2-gram", in typical English text. The possible first letters are shown across the page, the second letters down the page. Human language is not just a random jumble of words: it has grammatical rules, syntactical rules, and an underlying structure. For example, a "q" is generally followed by a "u". In the "2-gram" plot below we see that the “q” column is blank (zero probability) except on the “u” row.
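Building such a 2-gram table from text is straightforward. This toy sentence (a hypothetical example) already shows the "q is followed by u" regularity:

```python
from collections import Counter

# Toy 2-gram (letter-pair) counts; a real table is built from a large corpus.
text = "the quick question was quietly answered".replace(" ", "")
bigrams = Counter(text[i:i + 2] for i in range(len(text) - 1))

# In English text, 'q' is almost always followed by 'u'.
q_followers = {pair[1] for pair in bigrams if pair[0] == 'q'}
print(q_followers)  # {'u'}
```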
Ultimately, we have to formulate everything in terms of numbers. One way to do this is to assign a unique number to each of the 40,000 or so common words in English. For example, “the” might be 914, and “cat” might be 3542 (these are the numbers actually used by GPT-2). Here is what ChatGPT produces as the raw embedding vector for three specific words: cat, dog, and chair.
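A word-to-number table is just a dictionary lookup. The ids below for "the" and "cat" are the ones quoted above; the rest are made up for illustration:

```python
# Toy word-to-id table. "the"=914 and "cat"=3542 follow the text above;
# the other ids are hypothetical, for illustration only.
vocab = {"the": 914, "cat": 3542, "sat": 7300, "on": 319, "mat": 8810}

def encode(text):
    """Map each word to its numeric id, as a tokenizer would."""
    return [vocab[word] for word in text.lower().split()]

print(encode("the cat sat on the mat"))  # [914, 3542, 7300, 319, 914, 8810]
```

Real tokenizers work on sub-word pieces rather than whole words, but the principle is the same: text in, numbers out.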
With a sufficiently large corpus of English text, we can get pretty good estimates not just for probabilities of single letters or pairs of letters (2-grams), but also for longer runs of letters. And if we generate “random words” with progressively longer n-gram probabilities, they get progressively “more realistic". If we were able to use sufficiently long n-grams we’d basically “get a ChatGPT”, in the sense that we’d get something that would generate essay-length sequences of words with the “correct overall essay probabilities”.
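Here is a minimal sketch of generating text from a word-level 2-gram model, assuming a tiny hypothetical corpus. Longer n-grams would use more context per step but follow the same pattern:

```python
import random
from collections import defaultdict

# Build a word-level 2-gram model from a tiny hypothetical corpus.
corpus = "the cat sat on the mat and the dog sat on the rug".split()
model = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    model[prev].append(nxt)

def generate(start, n_words, seed=0):
    """Generate text by repeatedly sampling a follower of the last word."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(n_words - 1):
        followers = model.get(words[-1])
        if not followers:  # dead end: no observed continuation
            break
        words.append(rng.choice(followers))
    return " ".join(words)

print(generate("the", 8))
```

Because followers are drawn in proportion to how often they occur, frequent continuations dominate, which is exactly why longer n-grams produce progressively "more realistic" output.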
With approximately 40,000 common words in the English language, the number of possible 2-grams is 1.6 billion and the number of possible 3-grams is about 60 trillion. By 20-word sequences, the number of possibilities is greater than the total number of particles in the universe!
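The arithmetic is easy to check directly:

```python
# Combinatorial explosion of n-grams over a ~40,000-word vocabulary.
vocab_size = 40_000

print(f"2-grams:  {vocab_size ** 2:.2e}")   # 1.6 billion
print(f"3-grams:  {vocab_size ** 3:.2e}")   # ~6.4e13, i.e. tens of trillions
print(f"20-grams: {vocab_size ** 20:.2e}")  # ~1e92, vs. ~1e80 particles in the universe
```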
That’s the problem with this approach: there just isn’t anywhere close to enough English text ever written to be able to estimate those probabilities directly. Hence, we need to use models.
These models have parameters (set of “knobs" you can turn) to fit the model to the data. In the case of ChatGPT, lots of such knobs are used, actually, 175 billion of them. By comparison, the human brain has about 100 billion neurons.
How were all those 175 billion weights in its neural net determined? Basically, they are the result of very large-scale training based on a huge corpus of text on the web, in books, etc. written by humans.
Now the question is: from this list, which word should it actually pick? Going with the “highest-ranked” word makes sense, but this is where a bit of "voodoo" begins to creep in. If we always pick the highest-ranked word, we’ll typically get a very “flat” essay that never seems to “show any creativity” (and sometimes even repeats itself word for word). But if sometimes (at random) we pick lower-ranked words, we get a “more interesting” essay. The fact that there’s randomness here means that if we use the same prompt multiple times, we’re likely to get different essays each time. There’s a particular “temperature” parameter that determines how often lower-ranked words will be used, and for essay generation, it turns out that a temperature of 0.8 seems best. There’s no “theory” behind this; it’s just what’s been found to work in the real world.
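A minimal sketch of temperature sampling, assuming a hypothetical ranked word list: the probabilities are converted to logits, scaled by the temperature, and re-normalized before sampling. Low temperature concentrates mass on the top word; higher values let lower-ranked words through more often:

```python
import math
import random

def sample_with_temperature(word_probs, temperature=0.8, seed=None):
    """Re-weight a ranked word list by temperature, then sample one word."""
    rng = random.Random(seed)
    words = list(word_probs)
    # Convert probabilities to logits and scale by temperature.
    logits = [math.log(p) / temperature for p in word_probs.values()]
    # Subtract the max logit for numerical stability, then exponentiate.
    max_logit = max(logits)
    weights = [math.exp(l - max_logit) for l in logits]
    return rng.choices(words, weights=weights, k=1)[0]

# Hypothetical ranked list of next words with probabilities.
probs = {"learn": 0.5, "adapt": 0.3, "surprise": 0.2}
print(sample_with_temperature(probs, temperature=0.8, seed=42))
```

As the temperature approaches zero, this collapses to always picking the highest-ranked word, which is exactly the "flat essay" regime described above.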
Embeddings & Linguistic Feature Space:
Neural nets are based on numbers, so if we want them to work on text we need a way to represent our text with numbers. ChatGPT assigns a number to every word in the dictionary, but there is a central idea in ChatGPT that goes beyond this: the idea of "embedding". Think of an embedding as a way to represent the "essence" of something by an array of numbers, such that nearby things are represented by nearby numbers.
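The "nearby things get nearby numbers" idea can be made concrete with cosine similarity. The 4-dimensional vectors below are made up for illustration (real embeddings have hundreds of dimensions), but they show how "cat" lands near "dog" and far from "chair":

```python
import math

# Hypothetical 4-dimensional embeddings; real ones have hundreds of dimensions.
embeddings = {
    "cat":   [0.9, 0.1, 0.3, 0.0],
    "dog":   [0.8, 0.2, 0.3, 0.1],
    "chair": [0.0, 0.9, 0.1, 0.8],
}

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine(embeddings["cat"], embeddings["dog"]))    # high: similar meaning
print(cosine(embeddings["cat"], embeddings["chair"]))  # low: different meaning
```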
Inside ChatGPT, any piece of text is effectively represented by an array of numbers that we can think of as coordinates of a point in some kind of “linguistic feature space”. When ChatGPT continues a piece of text, this corresponds to tracing out a trajectory in that linguistic feature space. Here is an example of how words corresponding to different parts of speech get laid out if we project such a feature space down to just 2D.
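A crude version of such a projection: take dot products of each embedding with two hand-picked direction vectors. Everything below (the embeddings and the axes) is hypothetical; real visualizations use techniques like PCA or t-SNE to choose the projection, but the effect is the same, with words of the same part of speech clustering together:

```python
# Hypothetical 4-dimensional embeddings for two nouns and two verbs.
embeddings = {
    "cat": [0.9, 0.1, 0.3, 0.0],
    "dog": [0.8, 0.2, 0.3, 0.1],
    "sit": [0.1, 0.8, 0.0, 0.7],
    "run": [0.2, 0.9, 0.1, 0.6],
}
axis_x = [1.0, 0.0, 0.5, 0.0]  # hypothetical "noun-ness" direction
axis_y = [0.0, 1.0, 0.0, 0.5]  # hypothetical "verb-ness" direction

def project(vec):
    """Project a vector to 2D via dot products with the two axes."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return (dot(vec, axis_x), dot(vec, axis_y))

for word, vec in embeddings.items():
    print(word, project(vec))  # nouns cluster together, verbs cluster together
```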
Now let's look at the trajectory that a prompt follows in feature space as ChatGPT continues it: "The best thing about AI is its ability to learn".
What we see in this case is that there’s a “fan” of high-probability words that seem to go in a more or less definite direction in the feature space (bold black lines). Below is a 3D representation of what's going on over 40 steps. This looks like a mess, and it doesn't do much to encourage the idea that one can expect to identify “mathematical-physics-like” “semantic laws of motion” by empirically studying “what ChatGPT is doing inside”.
As of now, we are not ready to “empirically decode” from its internal behavior what ChatGPT has “discovered” about how human language is “put together”.
Conclusion:
The basic concept of ChatGPT is at some level rather simple: start from a huge collection of human-created text, then train a neural net on it to generate text; in particular, make it able to start from a “prompt” and then continue with text that's “like what it has been trained with”.
The actual neural net in ChatGPT is made up of very simple elements, billions of them. The basic operation is also very simple: input derived from the text generated so far is passed once through its elements, without any loops. But the remarkable and unexpected thing is that this process can produce text that is "human-like".
At some level, it's a great example of the fundamental scientific fact that large numbers of simple computational elements can do remarkable and unexpected things.