Why are even 1 trillion parameters not enough?
The need for models with fewer parameters arises because the largest models, which capture more nuances and patterns with higher accuracy, are constrained by computational complexity. The Compound Annual Growth Rate (CAGR) of parameters in neural network (NN) models from 1998, starting with LeNet-5, up to GPT-3 in 2020 is approximately 97% per year. This growth, as evident from the logarithmic scale in the figure, follows a linear trend. A 97% CAGR means that, on average, the number of parameters in these NN models nearly doubled every year between 1998 and 2020. At this growth rate, the projected number of parameters in an NN model by 2040 would be approximately 132e15, or 132 quadrillion (the arithmetic is reproduced in the short sketch at the end of this section).

A model with 132 quadrillion parameters would be about 1,320 times larger than the estimated 100 trillion synaptic connections in the human brain. This comparison highlights the immense scale at which NN models are growing: they may surpass the complexity of connections found in the human brain by a significant margin. To put the number in further perspective, 132 quadrillion is approximately 3,500 times the number of cells in the human body, estimated at 37.2 trillion. Hopefully, by 2040, we will have NNs capable of extracting features that represent regularities among human body cells.

Training such immense models will require a profound understanding of how they behave, and that understanding is a prerequisite for scaling up to sufficiently large networks. The most straightforward way to simplify these models is sparsification, which reduces the number of parameters. In other words, the challenge is to find a sparse model within the dense structure; whether such a sparse network actually exists inside the dense one remains an open question.
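The growth and projection figures above can be checked with a few lines of Python. The sketch below is illustrative: it assumes the commonly cited parameter counts of roughly 60 thousand for LeNet-5 (1998) and 175 billion for GPT-3 (2020), which are inputs chosen for this example rather than values read off the figure.

```python
# Sketch: reproduce the ~97% CAGR estimate and the 2040 projection.
# The parameter counts below are commonly cited approximations, used as rough inputs.

lenet5_params = 60_000           # LeNet-5, 1998 (~60k parameters)
gpt3_params = 175_000_000_000    # GPT-3, 2020 (~175B parameters)
years_observed = 2020 - 1998     # 22 years

# Compound Annual Growth Rate: (end / start)^(1 / years) - 1
cagr = (gpt3_params / lenet5_params) ** (1 / years_observed) - 1
print(f"CAGR 1998-2020: {cagr:.0%}")                       # ~97% per year

# Extrapolate the same growth rate another 20 years, to 2040.
params_2040 = gpt3_params * (1 + cagr) ** (2040 - 2020)
print(f"Projected parameters in 2040: {params_2040:.3g}")  # ~1.32e+17, i.e. ~132 quadrillion

# Put the projection in perspective.
brain_synapses = 100e12   # ~100 trillion synaptic connections in the human brain
human_cells = 37.2e12     # ~37.2 trillion cells in the human body
print(f"vs. brain synapses: {params_2040 / brain_synapses:,.0f}x")  # ~1,320x
print(f"vs. human cells:    {params_2040 / human_cells:,.0f}x")     # ~3,500x
```

Running the sketch recovers the numbers quoted in the text; changing the assumed parameter counts or the end year shows how sensitive the projection is to the extrapolation assumptions.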