Course: Introduction to Large Language Models


GPT-3

- [Instructor] GPT-3 is probably one of the most well-known large language models. Let's take a look at what the letters GPT represent in turn. G is for generative, as we are predicting a future token given past tokens. P is for pre-trained, as it's trained on a large corpus of data, including English Wikipedia, amongst several others; this involves significant compute time and cost. And finally, the T corresponds to transformer, and we're using the decoder portion of the transformer architecture. GPT-3's objective was simple: given the preceding tokens in an example, it needs to predict the next token. So this is like predictive text on your phone. If I gave it the phrase, "Once upon a," the most likely next token is "time": "Once upon a time." Remember that a token is a sub-word. Models like this are known as causal or autoregressive language models. For a couple of years, the focus of researchers was getting a large…
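To make the next-token objective concrete, here is a minimal sketch of causal (autoregressive) prediction using simple bigram counts over a tiny, made-up corpus. This is an illustrative stand-in for the training objective, not GPT-3 itself: real models use learned sub-word tokenizers and a transformer decoder rather than word splitting and frequency tables, and the corpus and function names here are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; GPT-3 trains on far larger data.
corpus = "once upon a time there was a model . once upon a time it learned"
tokens = corpus.split()  # real models use sub-word tokens, not whole words

# Count which token follows each token in the corpus.
bigrams = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    bigrams[prev][nxt] += 1

def predict_next(token):
    """Return the most frequently observed token after `token`."""
    return bigrams[token].most_common(1)[0][0]

# Causal / autoregressive use: each prediction conditions only on
# what came before it, never on future tokens.
print(predict_next("a"))  # prints "time" for this toy corpus
```

The key property mirrored here is the causal direction: the model only ever looks backwards at preceding tokens, which is exactly what the decoder-only transformer enforces with its attention mask.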
