Course: Introduction to Large Language Models



Scaling laws

- [Instructor] Imagine what things must have been like in the tech space in 2020. The transformer architecture, introduced in 2017, was proving to be better than anything before it. This was a time of significant experimentation. Some companies were focusing on the decoder portion, others on the encoder, and others were trying to figure out how they could make the models even better. It was at this time that the research team at OpenAI suggested that the performance of large models is a function of the number of model parameters, the size of the dataset the models are trained on, and the total amount of compute available for training. They performed several experiments on language models to back up their claim. Let's take a look at some of the results. On the y-axis is the test loss, and a lower test loss indicates that the model is performing better. Along the x-axis is the number of parameters in the model. So you can see that the…
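To make the idea concrete, here is a minimal sketch of the power-law relationship the scaling-laws work describes, where test loss falls as a power of the parameter count. The constant names and values below (ALPHA_N, N_C) are illustrative assumptions based on figures commonly cited for this form, not numbers taken from this course.

    # Minimal sketch: test loss as a power law in parameter count N.
    # ALPHA_N and N_C are illustrative constants (assumptions), roughly in the
    # range reported for this kind of fit; they are not from the course itself.
    ALPHA_N = 0.076      # how quickly loss falls as parameters grow
    N_C = 8.8e13         # scale constant, in units of parameters

    def predicted_test_loss(num_parameters: float) -> float:
        """Predicted test loss for a model with the given parameter count."""
        return (N_C / num_parameters) ** ALPHA_N

    # Larger models (more parameters) give a lower predicted test loss.
    for n in [1e6, 1e8, 1e10, 1e12]:
        print(f"{n:.0e} parameters -> predicted loss {predicted_test_loss(n):.2f}")

Running this prints a steadily decreasing loss as the parameter count grows, which is the trend the instructor describes on the plot: more parameters, lower test loss, better-performing model.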
