Chinese Large AI models are much larger than ChatGPT and Google's Models

The number of parameters is not everything. Every time a huge model arrives, model compression and/or improved architectures usually follow that achieve similar performance with far fewer parameters. But an argument can be made that there is a certain correlation between the number of parameters and how intelligent we perceive a model to be. Note that I say perceive.

All the rage for the last couple of months in Conversational AI has been ChatGPT. This 175 Billion parameter model from OpenAI has single-handedly reinvigorated the hype around conversational AI. Even though it fails under close scrutiny, quite a number of use cases that were previously thought (by laypeople) to be far off are suddenly closer, if not already here.

As we discuss this hype, though, it is important to stress that there are a lot of models out there that are larger than ChatGPT in number of parameters. These models, which don't necessarily have an interface for you to play around with, may be just as capable. Looking at the number of parameters, some might even be vastly more capable.

What's more concerning is that the largest models are not even well known inside the AI and Conversational AI community. This is because they are Chinese. Tsinghua University, the Beijing Academy of Artificial Intelligence, Zhejiang Lab, Alibaba, Tencent, Baidu... these are just a small subset of very capable AI research hubs that are researching and creating large AI models.

Google's largest model is their Switch Transformer with 1.6 Trillion parameters. They have others like GLaM (1.2 T), Minerva (540 B), PaLM and derivatives (540 B), BERT-480 (480 B), Gopher (280 B) and BERT-200 (200 B). All larger than ChatGPT's 175 Billion parameters.

Then you have the Chinese models. Wu-Dao 2.0 clocks in at 1.75 Trillion parameters. That is 10 times as many as ChatGPT. Now, Wu-Dao 2.0 is multimodal, so it can take image input as well as textual input, which certainly requires some extra parameters. It also uses a Mixture of Experts approach, similar to the Google models, which can be argued to need more parameters than the more "pure" Transformer architecture of ChatGPT. But still, it's 10x the number of parameters.
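To make the Mixture of Experts point a bit more concrete, here is a minimal sketch of top-1 expert routing. It is an illustration of the general idea only, not the actual architecture of Wu-Dao 2.0, the Switch Transformer, or any other model mentioned here; the dimensions and expert count are made up. The point it shows: total parameters grow roughly linearly with the number of experts, while the parameters actually used per token stay constant.

```python
# Minimal sketch of a Mixture-of-Experts (MoE) feed-forward layer with top-1 routing.
# Illustrative only: dimensions and the number of experts are arbitrary choices,
# not the configuration of Wu-Dao 2.0, Switch Transformer, or BaGuaLu.
import numpy as np

rng = np.random.default_rng(0)

d_model = 64        # hidden size of each token vector (illustrative)
d_ff = 256          # feed-forward width inside each expert (illustrative)
num_experts = 8     # real MoE models use thousands of experts

# One small gating (router) matrix plus an independent feed-forward network per expert.
gate = rng.standard_normal((d_model, num_experts)) * 0.02
experts = [
    {
        "w1": rng.standard_normal((d_model, d_ff)) * 0.02,
        "w2": rng.standard_normal((d_ff, d_model)) * 0.02,
    }
    for _ in range(num_experts)
]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route each token to its single best expert (top-1, Switch-style routing)."""
    scores = x @ gate                     # (tokens, num_experts) router logits
    chosen = scores.argmax(axis=-1)       # one expert index per token
    out = np.zeros_like(x)
    for e in range(num_experts):
        mask = chosen == e
        if not mask.any():
            continue
        h = np.maximum(x[mask] @ experts[e]["w1"], 0.0)   # ReLU feed-forward
        out[mask] = h @ experts[e]["w2"]
    return out

tokens = rng.standard_normal((10, d_model))   # a batch of 10 token vectors
y = moe_forward(tokens)

total_params = gate.size + sum(e["w1"].size + e["w2"].size for e in experts)
active_params = gate.size + experts[0]["w1"].size + experts[0]["w2"].size
print(f"total parameters:       {total_params:,}")
print(f"active per token (~1/{num_experts}): {active_params:,}")
```

Run it and the total parameter count is roughly 8x the per-token active count. Scale the same idea to tens of thousands of experts and you see why headline parameter counts for MoE models balloon without the per-token compute growing at the same rate.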

However, that's nothing compared to the behemoth model called BaGuaLu, which so far is only referenced in a paper describing how to train it. BaGuaLu is claimed to have... hold on to your seats... 175 Trillion parameters. That's Trillion with a T, which is 1000 times the size of ChatGPT. It uses a Mixture of Experts architecture as well, with 90,000 experts!

So, when you see people touting the number of parameters of ChatGPT, keep in mind that parameters don't tell the whole story, and that there are 20+ models out there already with more parameters. Some models even exist with more parameters than the yet-to-be-published GPT-4.

There are many LLMs throwing their hats into the ring. Theoretically, the question boils down to compute: how much money are you willing to spend on compute to train how many billion or trillion parameters? In a few years, the arguable question is whether any LLM trained on "enough data sets" could theoretically zero in on a similar answer, especially if the content is open-source internet content, since there is no differentiation in the data set corpus.
