A 6th Grade Level Explanation of the Evolution of Generative AI
Wei Wen Chen
I write about data management, analytics, artificial intelligence and machine learning. Please connect with me and we will learn and grow together.
In 2017, a famous paper called "Attention is All You Need," written by some really smart people at Google, changed the dynamics of AI and led to the incredible innovations in Generative AI that we see today. I'm writing this because I have twin boys who just graduated from 6th grade, and although they are smarter than their dad, I also wanted to test how well ChatGPT 4 could simplify concepts such as its own creation! So a disclaimer to those of you in my network who are way beyond the basics and might find this analogy a bit insulting or inaccurate: I let ChatGPT have at it, recommended the dog-and-ball analogy, and post-edited for a bit of pun and humor, over 5 rounds of editing and re-prompting. But for the most part, you could say that this is a bit of an Ancestry.com post by ChatGPT.
Why Was This Paper Written?
Imagine you're playing fetch with your dog. You throw a ball, and your dog runs to get it. Now, imagine if your dog had to run a little bit, stop and look back at you, then run a bit more, stop and look back again, and keep doing this until it gets the ball. That would be slow and inefficient, right? This is similar to the problem that the authors of the paper were trying to solve.
Before this paper was written, computer programs that translated languages or generated text (like writing a story) had a problem. They worked a bit like our stop-and-look-back dog. These programs, which used something called Recurrent Neural Networks (RNNs), processed information in a sequence, like the words in a sentence, one after the other. But they had trouble carrying information from earlier in the sequence as they processed more and more words. This is a problem because, in language, what you say at the end of a sentence often depends on what you said at the beginning.
Introducing the Transformer
When you throw the ball, your dog pays "attention" to where the ball is going, how fast it's moving, and which direction it's headed. It uses all this information to decide the best way to fetch the ball. The Google paper introduces the Transformer, which does something similar with words. It pays "attention" to all the words in a sentence at once and uses this information to understand the sentence better.
"The dog that fetched the ball is very smart."
For example, consider the sentence, "The dog that fetched the ball is very smart." If a program forgets about "the dog" by the time it gets to "is very smart," it won't know who is smart. This is a big problem for understanding and generating language.
The "attention mechanism." allows the computer to pay attention to all parts of the sentence, no matter how long it is, just like how you can remember all the times you've thrown the ball for your dog, not just the last two throws.
Now that I have Your Attention, there's More
Let's switch to a variation of our example, the sentence "He threw the ball to his dog." The Transformer model has two main parts: the encoder and the decoder. The encoder is like your hand that throws the ball: it takes in all the words of a sentence and understands what each word means in the context of the sentence. The decoder is like your dog that fetches the ball: it uses the information from the encoder to translate the sentence into another language. The encoder pays attention to each word in the sentence and how it relates to every other word. This is known as "self-attention".
For example, the encoder understands that "he" is related to "his", and "ball" is related to "threw". It also understands the order of the words, thanks to "position encoding". So, it knows that "He threw the ball" comes before "to his dog".
Once the encoder has read the sentence, it passes this information to the decoder. The decoder then uses that context to decide what the resulting generated sentence or response should be. It also uses self-attention and position encoding, just like the encoder.
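Here is a rough sketch of the sinusoidal "position encoding" trick from the paper, which gives each word a hint about where it sits in the sentence. Again, this is just a toy, with a tiny model size chosen for illustration.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal position encoding from "Attention is All You Need":
    even dimensions get sin, odd dimensions get cos, at different frequencies."""
    positions = np.arange(seq_len)[:, None]   # 0, 1, 2, ... one per word
    dims = np.arange(d_model)[None, :]
    angles = positions / np.power(10000, (2 * (dims // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

sentence = "He threw the ball to his dog".split()
pe = positional_encoding(len(sentence), d_model=8)
# The encoder adds these to the word vectors, so "He" at position 0
# carries a different position signal than a word appearing later on.
print(pe.round(2))
```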
Parallel Processing and why GPUs are like Gold Right Now
One of the unique things about the Transformer is that it can process all the words in the sentence at the same time, aka "parallel processing".
Think of it like this: instead of reading the sentence word by word, like "He... threw... the... ball... to... his... dog...", the Transformer reads the entire sentence at once. This makes it much faster and more efficient.
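A rough way to picture the speed difference, again with made-up word vectors: an RNN has to walk through the words one at a time, each step waiting on the previous one, while the Transformer can transform the whole sentence with one matrix multiplication, which is exactly the kind of math that runs in parallel.

```python
import numpy as np

sentence = "He threw the ball to his dog".split()
np.random.seed(1)
X = np.random.randn(len(sentence), 8)  # one made-up 8-number vector per word
W = np.random.randn(8, 8)              # a stand-in for learned weights

# RNN-style: one word at a time, each step depends on the previous step
state = np.zeros(8)
for word_vector in X:
    state = np.tanh(word_vector @ W + state)

# Transformer-style: every word is transformed at once,
# the kind of operation GPUs are built to do in parallel
all_at_once = np.tanh(X @ W)
print(all_at_once.shape)  # (7, 8): all seven words processed together
```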
However, parallel processing requires a lot of computing power. That's where GPUs (Graphics Processing Units) come in. GPUs, like those made by Nvidia, are very good at handling many tasks at once, which makes them perfect for running Transformer models. It also explains why Nvidia stock has skyrocketed recently. That, plus the fact that GPUs are used for gaming and Bitcoin mining, makes the stock a triple threat, which explains its sky-high valuation.
Because of parallel processing and GPUs, tools like ChatGPT can generate responses in real time. This is important for applications like chatbots, where you want the AI to respond quickly.
Other Doggone Concepts
The Transformer model also uses a technique called "multi-head attention". This is like having multiple dogs playing fetch at the same time, each one paying attention to a different ball. This allows the Transformer to focus on different parts of the sentence at the same time, which helps it understand the sentence better.
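Sketching the "multiple dogs" idea: the word vectors are split into a few smaller chunks ("heads"), attention runs on each chunk separately, and the results are glued back together. This is a simplified toy; the real model also applies learned per-head projection matrices, which are left out here to keep it short.

```python
import numpy as np

def attention(Q, K, V):
    """Tiny softmax(Q K^T / sqrt(d)) V helper."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def multi_head_attention(X, num_heads):
    """Split the vectors into num_heads chunks, run attention on each chunk
    (each "dog" watches its own ball), then concatenate the results."""
    seq_len, d_model = X.shape
    head_dim = d_model // num_heads
    heads = []
    for h in range(num_heads):
        chunk = X[:, h * head_dim:(h + 1) * head_dim]
        heads.append(attention(chunk, chunk, chunk))
    return np.concatenate(heads, axis=-1)  # glue the heads back together

np.random.seed(2)
X = np.random.randn(9, 8)                 # 9 made-up words, 8 numbers each
out = multi_head_attention(X, num_heads=2)
print(out.shape)                          # (9, 8): same shape, two "dogs" at work
```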
Another important concept is "scaling the model dimensions". This is like teaching your dog to fetch more balls at once. By making the model bigger, the Transformer can represent and process more information at the same time, which makes it more capable.
Finally, the Transformer uses a "fixed window approach". This is like limiting how far your dog can run to fetch the ball. By focusing on a smaller part of the sentence, the Transformer can process the sentence faster.
Why did it take 5 years After This Paper for ChatGPT to be Released?
The "Attention is All You Need" paper was a game-changer. It led to the development of many advanced AI models, like BERT and GPT-4, which are used in Google Search and ChatGPT. However, it took almost five more years for models like ChatGPT to be unveiled. In between, there were many advancements in AI, like improvements in the Transformer model and better ways to train these models. Building something like ChatGPT isn't a walk in the park. It's like training a super-smart dog that can understand and generate human language - it takes a lot of time and effort. Plus, each version of GPT (the tech behind ChatGPT) got bigger and better, so that added to the wait. OpenAI also had to think about safety - they didn't want their AI to be misused. And let's not forget about setting up all the tech stuff to handle millions of users, dealing with legal stuff, and making sure the AI was really ready for showtime. So, while the paper was a big step, there was still a lot of work to do before ChatGPT could hit the stage. But that's a discussion for another article.
So what are Large Language Models (LLMs) and how can you and your company benefit?
Large Language Models (LLMs) are the super-smart dogs that can fetch not just one, but hundreds of balls at once! They can understand and generate human-like text, making them useful for things like writing articles, answering questions, and even assisting with, or outright writing, code.
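If you want to watch an LLM fetch some text yourself, here is a minimal sketch using the open-source Hugging Face transformers library and the small GPT-2 model, a much smaller cousin of the models behind ChatGPT. It assumes you have run `pip install transformers` (plus a backend like PyTorch) and will download the model weights the first time it runs.

```python
from transformers import pipeline

# GPT-2 is a small, freely available descendant of the Transformer paper;
# the big commercial LLMs work on the same principles at a far larger scale.
generator = pipeline("text-generation", model="gpt2")

result = generator(
    "The dog that fetched the ball is very smart because",
    max_new_tokens=30,        # how many extra tokens to generate
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```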
Imagine if your dog could fetch the ball, bring you your slippers, do your homework, and cook dinner all at once. That's what businesses can do with LLMs. They can use them to automate tasks, understand customer behavior, and even create content. And the best part? Businesses can train their own LLMs to do exactly what they need, with Google, Microsoft, Amazon, Databricks, and others all encouraging enterprises to build on their foundation LLMs.
As individuals, we are seeing the benefits of using tools like ChatGPT, Google Bard, and Anthropic Claude, as well as specialized tools for marketing and sales like Jasper AI, Notion, and many more.
Giving Credit where Credit is Due
The "Attention is All You Need" paper introduced a new way for computers to understand and generate language. It has led to the creation of powerful AI models like BERT and GPT-4. And thanks to parallel processing and GPUs, these models can generate human-like text in real time.
WDYT? Did ChatGPT do a good job explaining its Ancestry?
Appendix
For a more detailed and technical explanation of the "Attention is All You Need" paper, check out these resources, which I fed to ChatGPT as sources using the Link Reader plugin for ChatGPT 4 Plus while writing this post.