Course: Generative AI: Working with Large Language Models
Megatron-Turing NLG Model
- [Instructor] A lot of the research after GPT-3 was released seemed to indicate that scaling up models improved performance. So Microsoft and Nvidia partnered to create the Megatron-Turing NLG model, with roughly three times as many parameters as GPT-3. Architecturally, the model uses the transformer decoder, just like GPT-3, but you can see that it has more layers and more attention heads than GPT-3. So for example, GPT-3 has 96 layers, whereas Megatron-Turing NLG has 105. GPT-3 has 96 attention heads, while Megatron-Turing NLG has 128. And finally, Megatron-Turing NLG has 530 billion parameters versus GPT-3's 175 billion. Now, the researchers identified a couple of challenges with working with large language models. It's hard to train big models, first because their parameters no longer fit in the memory of a single GPU, and second because, even if they did, the sheer number of compute operations required would lead to very long training times. Efficient parallel…
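To make that scale concrete, here is a minimal back-of-the-envelope sketch in Python (not from the course) estimating how much memory the weights alone need and how many GPUs that implies. The fp16 weight size (2 bytes per parameter) and the 80 GB GPU capacity are illustrative assumptions; real training needs far more memory for gradients, optimizer state, and activations.

# Back-of-the-envelope sketch: why 530B parameters can't fit on one GPU.
# BYTES_PER_PARAM and GPU_MEMORY_BYTES are illustrative assumptions.
GPT3_PARAMS = 175e9          # GPT-3: 175 billion parameters
MT_NLG_PARAMS = 530e9        # Megatron-Turing NLG: 530 billion parameters
BYTES_PER_PARAM = 2          # assumption: fp16 weights, 2 bytes each
GPU_MEMORY_BYTES = 80e9      # assumption: one 80 GB GPU

def min_gpus_for_weights(num_params):
    # Lower bound: GPUs needed just to hold the weights, ignoring
    # activations, gradients, and optimizer state.
    return num_params * BYTES_PER_PARAM / GPU_MEMORY_BYTES

for name, params in [("GPT-3", GPT3_PARAMS), ("Megatron-Turing NLG", MT_NLG_PARAMS)]:
    weight_tb = params * BYTES_PER_PARAM / 1e12
    print(f"{name}: {weight_tb:.2f} TB of fp16 weights, "
          f"at least {min_gpus_for_weights(params):.1f} GPUs to hold them")

Running this shows that the Megatron-Turing NLG weights alone occupy about 1.06 TB in fp16, more than a dozen 80 GB GPUs just to hold them, which is why the researchers turned to efficient parallelism techniques.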
Contents
- GPT-3 (4 min 32 sec)
- GPT-3 use cases (5 min 27 sec)
- Challenges and shortcomings of GPT-3 (4 min 17 sec)
- GLaM (3 min 6 sec)
- Megatron-Turing NLG Model (1 min 59 sec)
- Gopher (5 min 23 sec)
- Scaling laws (3 min 14 sec)
- Chinchilla (7 min 53 sec)
- BIG-bench (4 min 24 sec)
- PaLM (5 min 49 sec)
- OPT and BLOOM (2 min 51 sec)
- GitHub models (2 min 43 sec)
- Accessing Large Language Models using an API (6 min 25 sec)
- Inference time vs. pre-training (4 min 5 sec)