Course: Generative AI: Working with Large Language Models
GLaM
- [Instructor] The Google research team noted that training large dense models requires a significant amount of compute resources, so they proposed a family of language models called GLaM, or Generalist Language Models. These models use a sparsely activated mixture-of-experts architecture to scale, and because the model is sparse, it has significantly lower training costs than an equivalent dense model. These models use only one-third of the energy needed to train GPT-3 and still have better overall zero-shot and one-shot performance across the board. The largest GLaM model has 1.2 trillion parameters, which is approximately seven times larger than GPT-3. Now the GLaM model architecture is made up of two components. The upper block is a transformer layer, so you can see the multi-head attention and the feed-forward network. And in the bottom block you have the mixture-of-experts layer. Again, you have a multi-head…
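To make the sparse-activation idea concrete, here is a minimal sketch of a top-2-routed mixture-of-experts layer in PyTorch. This is an illustration only, not GLaM's actual implementation: the dimensions, the expert count, and the simple feed-forward experts are assumptions chosen for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    # A toy sparsely activated mixture-of-experts layer. Each token is
    # routed to its top-k experts (GLaM routes to the top 2), so only a
    # small slice of the total parameters is active for any one token.
    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an independent feed-forward network, mirroring
        # the feed-forward block of a standard transformer layer.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                       # (tokens, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)          # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token (sparse activation).
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Route ten random token embeddings through the layer.
layer = MoELayer()
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])

Because only top_k experts run per token, per-token compute stays roughly constant as num_experts grows; adding experts increases capacity (GLaM reaches 1.2 trillion parameters this way) without a proportional increase in training cost.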
Contents
- GPT-3 (4 min 32 s)
- GPT-3 use cases (5 min 27 s)
- Challenges and shortcomings of GPT-3 (4 min 17 s)
- GLaM (3 min 6 s)
- Megatron-Turing NLG Model (1 min 59 s)
- Gopher (5 min 23 s)
- Scaling laws (3 min 14 s)
- Chinchilla (7 min 53 s)
- BIG-bench (4 min 24 s)
- PaLM (5 min 49 s)
- OPT and BLOOM (2 min 51 s)
- GitHub models (2 min 43 s)
- Accessing Large Language Models using an API (6 min 25 s)
- Inference time vs. pre-training (4 min 5 s)