AI Lexicon: How ChatGPT Works

You are reading "Software, My Dream Job," a software engineering newsletter covering one person's perspective from working in the software industry. If you'd like to receive notifications for new editions, click or tap "Subscribe" above.


ChatGPT has quickly captured the attention of millions of people, but many are wary of it because they don't understand how it works. That's understandable, and this month's edition tries to break it down so it's easier to understand. That said, ChatGPT is a massively complex system at its core, so there's only so much simplification I can offer!

If you haven't yet played with ChatGPT or don't know what it is, the core interface is a chat window where you can ask questions or provide queries, and the AI responds. An important detail to remember is that context is retained within a chat: messages can refer to previous messages, and ChatGPT will understand them contextually.

What happens when you hit enter on that query in the chat box?


Neural Networks

Let's start with a step back, because there is a lot to uncover under ChatGPT's hood. Machine learning has been advancing rapidly over the past 10 years, and ChatGPT leverages a number of state-of-the-art techniques to achieve its results.

Neural network example, from https://en.wikipedia.org/wiki/Neural_network

Neural networks are layers of interconnected "neurons," each responsible for receiving input, processing it, and passing it along to the next neuron in the network. Neural networks form the backbone of AI today. The input is typically a set of numerical values called "features," representing some aspect of the data being processed. For example, in the case of language processing, the features might be the word embeddings that represent the meaning of each word in a sentence.
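To make that concrete, here is a tiny two-layer network in NumPy, with made-up sizes and random weights: each "neuron" computes a weighted sum of its inputs, applies a nonlinearity, and passes the result along to the next layer. This is only an illustration of the structure, nothing like GPT's actual scale or code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up sizes: 4 input features, a hidden layer of 8 "neurons", 2 outputs.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)

def forward(features):
    hidden = np.maximum(0, features @ W1 + b1)  # each hidden neuron: weighted sum + ReLU
    return hidden @ W2 + b2                     # output layer: another weighted sum

print(forward(np.array([0.5, -1.0, 0.3, 2.0])))
```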

Word embeddings are just a numerical representation of text that the neural network will use to understand the semantic meaning of the text, which can then be used for other purposes, like responding in a semantically logical way!
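As a toy sketch of the idea (the vocabulary and vectors below are hand-picked for illustration and bear no relation to the embeddings ChatGPT actually uses), words with related meanings get vectors that point in similar directions:

```python
import numpy as np

# Toy 3-dimensional embeddings, invented purely for illustration.
embeddings = {
    "cat":   np.array([0.9, 0.1, 0.0]),
    "dog":   np.array([0.8, 0.2, 0.1]),
    "stock": np.array([0.0, 0.1, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["cat"], embeddings["dog"]))    # high: related meanings
print(cosine(embeddings["cat"], embeddings["stock"]))  # low: unrelated meanings
```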

So upon hitting enter in ChatGPT, that text is first converted into word embeddings, which were trained on text from throughout the Internet. There is then a neural network that has been trained to, given the input word embeddings, output a set of word embeddings that are an appropriate response. These embeddings are then translated into human-readable words using the inverse operation to the one applied to the input query. That decoded output is what ChatGPT prints back out.
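Conceptually, the round trip looks something like the toy sketch below. The vocabulary, embeddings, and "model" are all invented stand-ins just to show the encode-transform-decode flow; the real system operates on tokens and on vastly larger networks.

```python
import numpy as np

# Toy vocabulary and 2-D embeddings, invented purely to show the shape of the pipeline.
vocab = ["hello", "world", "hi", "earth"]
emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]])

def encode(words):
    return np.stack([emb[vocab.index(w)] for w in words])

def toy_model(vectors):
    # Stand-in for the real network: a fixed linear map that turns each input
    # vector into a "response" vector. GPT-3 is vastly more complex than this.
    W = np.array([[0.0, 1.0], [1.0, 0.0]])
    return vectors @ W

def decode(vectors):
    # Inverse of encoding: pick the vocabulary word whose embedding is closest.
    return [vocab[int(np.argmin(np.linalg.norm(emb - v, axis=1)))] for v in vectors]

print(decode(toy_model(encode(["hello", "world"]))))  # -> ['world', 'hello']
```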


ChatGPT Model Size

The conversions and the output generation are computationally costly. ChatGPT sits on top of GPT-3, a large language model with 175 billion parameters. This means there are 175 billion weights in an extensive neural network that OpenAI has tuned on their large datasets.

So each query requires on the order of 175 billion computations, repeated for every token of the response, which adds up quickly. It's possible OpenAI has figured out a way to cache some of these computations to reduce compute costs, but I'm not aware of that information being published anywhere. Furthermore, GPT-4, which is supposed to be released early this year, supposedly has 1000x more parameters!
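As a very rough back-of-envelope estimate (using the common approximation of about 2 floating-point operations per parameter per generated token, and an assumed reply length; actual serving costs depend on hardware, batching, and optimizations OpenAI hasn't published):

```python
params = 175e9                  # GPT-3 parameter count
flops_per_token = 2 * params    # rough rule of thumb for one forward pass
tokens_in_reply = 200           # assumed length of a typical ChatGPT answer

total_flops = flops_per_token * tokens_in_reply
print(f"~{total_flops:.1e} floating-point operations per reply")  # ~7.0e+13
```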

Computational complexity leads to real costs here! It won't surprise me if ChatGPT becomes a paid product soon, as OpenAI is currently spending millions of dollars to operate it for free.


Encoders, Decoders, and RNNs

To understand what's happening inside the neural network I mentioned above, let's go back to the research.

One neural network architecture commonly used in natural language processing is the encoder-decoder network. These networks are designed to "encode" an input sequence into a compact representation and then "decode" that representation into an output sequence.

Traditionally, encoder-decoder networks have been paired with recurrent neural networks (RNNs) designed to process sequential data. The encoder processes the input sequence and produces a fixed-length vector representation, which is then passed to the decoder. The decoder processes this vector and produces the output sequence.
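A minimal sketch of that setup in PyTorch, with toy dimensions and no training loop: the encoder GRU compresses the input sequence into its final hidden state, and the decoder GRU unrolls that state into an output sequence. Real seq2seq systems are far larger and usually add attention on top.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, src_ids, tgt_ids):
        _, state = self.encoder(self.embed(src_ids))            # fixed-length summary of the input
        decoded, _ = self.decoder(self.embed(tgt_ids), state)   # output unrolled from that summary
        return self.out(decoded)                                # per-step scores over the vocabulary

model = Seq2Seq()
logits = model(torch.randint(0, 1000, (1, 7)), torch.randint(0, 1000, (1, 5)))
print(logits.shape)  # torch.Size([1, 5, 1000])
```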

Encoder-decoder networks have been widely used in tasks such as machine translation, where the input is a sentence in one language, and the output is the translation of that sentence into another. They have also been applied to summarization and image caption generation tasks.

Encoder/decoder architecture, from https://towardsdatascience.com/sequence-to-sequence-tutorial-4fde3ee798d8


Transformers and Attention

Similar to the encoder-decoder architecture, the transformer includes both components; however, the transformer differs in that it uses a self-attention mechanism to allow each element of the input to attend to all other elements, allowing it to capture relationships between elements regardless of their distance from each other.

The transformer also uses multi-headed attention, allowing it to attend to multiple parts of the input simultaneously. This allows it to capture complex relationships in the input text and produce highly accurate results.

Transformers replaced RNN-based encoder-decoder models as the state of the art in natural language processing when the "Attention Is All You Need" paper was published in 2017, because they perform much better over longer pieces of text.
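To make "attention" slightly more concrete, here is the scaled dot-product attention from that paper sketched in NumPy, with a single head, random toy inputs, and none of the learned projection matrices:

```python
import numpy as np

def attention(Q, K, V):
    # Each query scores every key; the scores become weights over the values.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over the keys
    return weights @ V                                    # weighted mix of value vectors

# 5 tokens, 8-dimensional vectors: every token attends to every other token.
rng = np.random.default_rng(1)
x = rng.normal(size=(5, 8))
print(attention(x, x, x).shape)  # (5, 8)
```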

Transformer architecture, from "Attention Is All You Need", https://arxiv.org/pdf/1706.03762.pdf


Generative Pre-Training

Generative pre-training is a technique that has been particularly successful in the field of natural language processing. It involves training an extensive neural network on a massive dataset in an unsupervised manner to learn a general-purpose representation of the data. This pre-trained network can then be fine-tuned on a specific task, such as language translation or question answering, leading to improved performance.
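A minimal sketch of the fine-tuning idea in PyTorch, with a made-up backbone standing in for the pre-trained model: the pre-trained weights are frozen and only a small task-specific head is trained on labeled examples.

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained language model backbone (in reality: something like GPT-3).
backbone = nn.Sequential(nn.Embedding(1000, 64), nn.Flatten(1), nn.Linear(64 * 16, 64))
for p in backbone.parameters():
    p.requires_grad = False            # keep the general-purpose representation fixed

head = nn.Linear(64, 2)                # small task-specific layer, e.g. a 2-way classifier
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

tokens = torch.randint(0, 1000, (8, 16))   # a toy batch of 8 sequences of 16 tokens
labels = torch.randint(0, 2, (8,))

loss = nn.functional.cross_entropy(head(backbone(tokens)), labels)
loss.backward()
optimizer.step()
print(float(loss))
```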

Generative pre-training architecture, from "Improving Language Understanding by Generative Pre-Training"


In the case of ChatGPT, this means fine-tuning the GPT-3 model toward the use case of answering questions in chat, a process that also leverages human labeling. A more detailed look at the fine-tuning that ChatGPT leverages can be seen in the image below:

Fine-tuning steps of ChatGPT, from https://arxiv.org/pdf/2203.02155.pdf
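The middle step in that diagram, training a reward model from human rankings, can be sketched compactly: given two candidate answers where labelers preferred one over the other, the reward model is nudged to score the preferred answer higher. The toy features and linear "reward model" below are stand-ins purely for illustration.

```python
import torch
import torch.nn as nn

reward_model = nn.Linear(64, 1)        # toy stand-in: maps an answer's features to a score
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

preferred = torch.randn(8, 64)         # features of answers labelers liked more
rejected = torch.randn(8, 64)          # features of answers labelers liked less

# Pairwise ranking loss: push preferred scores above rejected ones.
loss = -torch.log(torch.sigmoid(reward_model(preferred) - reward_model(rejected))).mean()
loss.backward()
optimizer.step()
print(float(loss))
```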


Bringing it all together

So there are many moving parts under the hood of ChatGPT, which will only grow. It will be very interesting to see how this continues to evolve, as advancements in many different areas will help GPT-like models gain further adoption.

Over the next year or two, we will likely see significant disruption from this new enabling technology, and I'm excited to get to watch! Get your popcorn ready...


Reference research papers:

Neural Machine Translation by Jointly Learning to Align and Translate, 2015

Attention Is All You Need, 2017

Training Language Models to Follow Instructions with Human Feedback, 2022

Improving Language Understanding by Generative Pre-Training, 2018
