Course: Introduction to Large Language Models



Chinchilla


- [Instructor] Over the years, the trend has been to increase the model size. Although we won't look at any of these models in detail, I'll mention them briefly now because we'll be comparing them later. So Megatron-Turing was released by a collaboration between Microsoft and Nvidia in January of 2022 and had 530 billion parameters. The Google DeepMind team released details about Gopher, which had 280 billion parameters, and it was one of the best models out there at the time. You can see that the model sizes were getting very large, and this was because of the scaling laws. But what if the scaling laws didn't capture the entire picture? The DeepMind team's hypothesis was that large language models were significantly undertrained: you could get much better performance with the same computational budget by training a smaller model for longer. Now, the way you would try and test out a hypothesis is to do a whole lot of…
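
To make that compute trade-off concrete, here is a minimal illustrative sketch (not part of the course) using the common scaling-laws approximation that training compute is roughly 6 × parameters × training tokens. The budget figure below is an assumption chosen to resemble a Gopher-scale run; the exact numbers are for illustration only.

```python
# Illustrative sketch: with a fixed compute budget, a smaller model can be
# trained on proportionally more tokens. Uses the standard approximation
#   training FLOPs ≈ 6 * N * D
# where N = parameter count and D = number of training tokens.
# The budget below (a 280B-parameter model on ~300B tokens) is an assumed,
# Gopher-like figure, not an exact value from the course.

def tokens_for_budget(compute_flops: float, n_params: float) -> float:
    """Tokens a fixed compute budget covers for a model with n_params parameters."""
    return compute_flops / (6 * n_params)

budget = 6 * 280e9 * 300e9  # assumed Gopher-like training budget in FLOPs

for n_params in (280e9, 70e9):
    d = tokens_for_budget(budget, n_params)
    print(f"{n_params / 1e9:.0f}B params -> ~{d / 1e9:.0f}B tokens under the same budget")
```

Under this approximation, shrinking the model by 4x lets you train on 4x as many tokens for the same compute, which is the intuition behind the "undertrained" hypothesis.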
