Why are even 1 trillion parameters not enough?

The need for models with fewer parameters arises because the largest models, while capturing more nuances and patterns with higher accuracy, are constrained by their computational cost. The Compound Annual Growth Rate (CAGR) of the parameter count in neural network (NN) models, from LeNet-5 in 1998 up to GPT-3, is approximately 97% per year. Plotted on a logarithmic scale, as in the figure, this growth follows a linear trend. A 97% CAGR means that, on average, the number of parameters in these NN models nearly doubled every year between 1998 and 2020. Extrapolating at this rate, the projected number of parameters in an NN model by 2040 would be approximately 132e15, or 132 quadrillion.

That figure is about 1,320 times the estimated 100 trillion synaptic connections in the human brain. The comparison highlights the immense scale at which NN models are growing, potentially surpassing the complexity of connections found in the human brain by a significant margin. To put 132 quadrillion in further perspective, it is approximately 3,500 times the number of cells in the human body, estimated at 37.2 trillion. Hopefully, by 2040, we will have NNs capable of extracting features that represent regularities among human body cells.

Training such immense models will require a profound understanding of how they behave, and that understanding is crucial for scaling up to sufficiently large networks. The most straightforward way to simplify them is sparsification, which reduces the number of parameters. In other words, the challenge is to find a sparse model within the dense structure; whether such a sparse network actually exists inside a dense one, however, remains an open question. A back-of-the-envelope version of the 2040 projection and a minimal sparsification sketch follow below.
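As a rough check on the numbers above, the sketch below recomputes the 97% CAGR and extrapolates it to 2040. The parameter counts used (roughly 60 thousand for LeNet-5 in 1998 and 175 billion for GPT-3 in 2020) are commonly cited figures assumed here for illustration, so the exact outputs should be read as approximations.

```python
# Back-of-the-envelope CAGR projection (assumed parameter counts:
# LeNet-5 ~60k params in 1998, GPT-3 ~175B params in 2020).
lenet_params, lenet_year = 60_000, 1998
gpt3_params, gpt3_year = 175e9, 2020

years = gpt3_year - lenet_year
cagr = (gpt3_params / lenet_params) ** (1 / years) - 1
print(f"CAGR 1998-2020: {cagr:.1%}")  # ~97% per year

# Extrapolate the same growth rate out to 2040.
target_year = 2040
projected = gpt3_params * (1 + cagr) ** (target_year - gpt3_year)
print(f"Projected parameters in {target_year}: {projected:.3e}")  # ~1.3e17 (~130 quadrillion)

# Compare with the biological reference points cited in the text.
brain_synapses = 100e12   # ~100 trillion synaptic connections
body_cells = 37.2e12      # ~37.2 trillion cells
print(f"x brain synapses: {projected / brain_synapses:,.0f}")  # ~1,300
print(f"x body cells:     {projected / body_cells:,.0f}")      # ~3,500
```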


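To make the sparsification idea concrete, here is a minimal sketch of magnitude pruning, one common way to obtain a sparse model from a dense one: the smallest-magnitude weights are set to zero and only a fraction of the parameters are kept. The layer size, sparsity level, and NumPy implementation are illustrative assumptions, not the method of any particular model; whether the resulting sparse subnetwork can match the dense model's accuracy is exactly the open question raised above.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries so that roughly `sparsity`
    fraction of the weights become exactly zero (unstructured pruning)."""
    k = int(sparsity * weights.size)            # number of weights to remove
    if k == 0:
        return weights.copy()
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    mask = np.abs(weights) > threshold          # keep only the larger weights
    return weights * mask

# Toy dense layer: a 256 x 256 weight matrix (hypothetical, for illustration).
rng = np.random.default_rng(0)
dense = rng.normal(size=(256, 256))

sparse = magnitude_prune(dense, sparsity=0.9)   # drop 90% of the parameters
kept = np.count_nonzero(sparse) / sparse.size
print(f"Fraction of weights kept: {kept:.1%}")  # ~10%
```

In practice, pruning like this is usually interleaved with training or followed by fine-tuning, since naively zeroing most of the weights of a trained network typically degrades accuracy.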