Course: Generative AI: Working with Large Language Models


Megatron-Turing NLG Model

- [Instructor] A lot of the research after GPT-3 was released seemed to indicate that scaling up models improved performance. So Microsoft and Nvidia partnered to create the Megatron-Turing NLG model, with a massive three times more parameters than GPT-3. Architecturally, the model uses the transformer decoder just like GPT-3, but it has more layers and more attention heads. For example, GPT-3 has 96 layers, whereas Megatron-Turing NLG has 105. GPT-3 has 96 attention heads, while Megatron-Turing NLG has 128. And finally, Megatron-Turing NLG has 530 billion parameters versus GPT-3's 175 billion. Now, the researchers identified a couple of challenges in working with large language models. It's hard to train big models because their parameters no longer fit in the memory of a single GPU, and even when they do, the sheer number of compute operations required would make training take far too long. Efficient parallel…
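To make the layer and parameter comparison concrete, here is a minimal sketch (not part of the course) that estimates the parameter count of a GPT-style decoder from its depth and hidden size. The hidden sizes of 12,288 (GPT-3) and 20,480 (Megatron-Turing NLG) and the vocabulary size of 50,257 are assumptions taken from the published model descriptions, not from this video.

```python
def approx_decoder_params(n_layers: int, d_model: int, vocab_size: int = 50257) -> int:
    """Rough parameter count for a GPT-style decoder-only transformer.

    Each layer has ~4*d_model^2 weights in attention (Q, K, V, and output
    projections) plus ~8*d_model^2 in the feed-forward block (two matrices
    of size d_model x 4*d_model), i.e. ~12*d_model^2 per layer, plus the
    token embedding matrix. Biases and layer norms are ignored.
    """
    per_layer = 12 * d_model ** 2        # attention + feed-forward weights
    embeddings = vocab_size * d_model    # token embedding matrix
    return n_layers * per_layer + embeddings

# Assumed hidden sizes: GPT-3 uses 12,288; Megatron-Turing NLG uses 20,480.
print(f"GPT-3 estimate:  {approx_decoder_params(96, 12288) / 1e9:.0f}B parameters")   # ~175B
print(f"MT-NLG estimate: {approx_decoder_params(105, 20480) / 1e9:.0f}B parameters")  # ~530B
```

The estimates land close to the 175 billion and 530 billion figures quoted in the video, which shows that the extra parameters come mostly from the wider hidden dimension and the additional layers.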
