课程: Introduction to Large Language Models
Transformer architecture
- [Narrator] Large language models are made up of a couple of components. We'll take a brief look at each of these in turn, starting off with the transformer architecture. The individual components of the transformer architecture are beyond the scope of this introductory course, so let's simplify the architecture by breaking it up into two components. The left half of the diagram is known as the encoder, and the right-hand side is known as the decoder. We'll look at what tasks each performs in this video. So we can feed an English sentence, such as "I like dogs," into the encoder at the bottom left of the diagram. The transformer can act as a translator from English to German, and so the output from the decoder at the top of the diagram is the German translation, "Ich mag Hunde." The transformer is not made up of a single encoder and decoder, but rather a stack of six encoders and six decoders. Each of these contains an attention mechanism, which allows the model to focus on different parts of the input text. So this means you feed in the input text at the bottom, the output from the first encoder is fed into the second encoder, and so on, all the way up to the topmost encoder. Now, by passing the data through these successive encoder layers, the model is able to capture a deeper and more complex understanding of language semantics. This is then fed into the decoder layers, which are on the right-hand side of the diagram. The encoder and decoder can be used independently or together, depending on the task. Encoder-decoder models are good for generative tasks, such as translation or summarization. Encoder-only models are good for tasks that require strong language understanding, such as sentence classification. Now, sentence classification is where you take some text, for example, a movie review, and label it as being either positive or negative. BERT is an acronym for Bidirectional Encoder Representations from Transformers, which sounds like quite a mouthful, but now that we know a little bit more about the transformer architecture and encoders, this acronym should make a bit more sense. Decoder-only models are good for generative tasks, such as text generation, like I've shown you in the OpenAI playground. Examples include ChatGPT, GPT-3, and GPT-4. Most of the recent research has been around decoder models, as these can generate text, which makes them more useful as chatbots or virtual assistants. All right, so transformers are made up of layers of encoders and decoders, and the kind of task we need to perform determines whether we need one or both components.
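To make the three model families from this video concrete, here is a minimal sketch using the open-source Hugging Face `transformers` library. This library and the specific model checkpoints ("t5-small", "distilbert-base-uncased-finetuned-sst-2-english", "gpt2") are my own illustrative assumptions; the course itself demonstrates the OpenAI playground rather than this code.

```python
# Sketch: the three transformer families discussed above, via Hugging Face pipelines.
# Assumes `pip install transformers torch`; model choices are illustrative, not from the course.
from transformers import pipeline

# 1. Encoder-decoder model (T5): sequence-to-sequence generation such as translation.
translator = pipeline("translation_en_to_de", model="t5-small")
print(translator("I like dogs."))
# e.g. [{'translation_text': 'Ich mag Hunde.'}]

# 2. Encoder-only model (DistilBERT, a smaller BERT variant): language understanding,
#    e.g. sentence classification of a movie review as positive or negative.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("This movie was a delight from start to finish."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]

# 3. Decoder-only model (GPT-2): open-ended text generation, like the playground demo.
generator = pipeline("text-generation", model="gpt2")
print(generator("Large language models are", max_new_tokens=20))
```

The pipeline API hides the encoder/decoder stacks, but the split matches the video: the translation model runs both an encoder and a decoder, the classifier runs only encoders, and the generator runs only decoders.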