Overparameterization: my debate with GPT4
Max Ma, PhD
AI Architect (Model & ML Eng.) with depth and breadth for real world ML solution in healthcare and life science, strong science and engineering discipline with creative and curious mind
Overparameterization
Two recent readings caught my attention:
- Foundation Models are Entering their Data-Centric Era (Stanford) says “Our views around overparameterization were misleading us and recent insights opened up a new direction”
- A Universal Law of Robustness via Isoperimetry (Microsoft and Stanford) says “Solving n equations generically requires only n unknowns. However, the revolutionary deep learning methodology revolves around highly overparametrized models”
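To make the "n equations, n unknowns" counting concrete, here is a minimal Python sketch that counts the weights and biases of a small fully connected network and compares the total with the number of training samples. The layer sizes and the sample count are made-up numbers for illustration, the helper name count_dense_params is hypothetical, and treating each sample as one equation assumes a single scalar output per sample.

```python
def count_dense_params(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the input width, the hidden widths, and the output
    width, e.g. [784, 512, 512, 1].
    """
    return sum(
        fan_in * fan_out + fan_out  # weight matrix plus bias vector
        for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:])
    )


# Hypothetical numbers, chosen only for illustration.
n_samples = 60_000                 # each sample is treated as one "equation"
layer_sizes = [784, 512, 512, 1]   # scalar output, so one equation per sample

p = count_dense_params(layer_sizes)  # number of "unknowns"
ratio = p / n_samples
print(f"unknowns p = {p}, equations n = {n_samples}, p/n = {ratio:.1f}")
```

With these made-up sizes the network has roughly 11 times more unknowns than equations, which is the situation the quoted paper labels overparametrized. The count alone says nothing about what the right number of parameters would be.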
I don't believe deep learning models are overparameterized, because:
- If overparameterization were real, half of mathematical theory would collapse.
- A deep learning model is a high-dimensional, highly nonlinear system, similar to the highly nonlinear systems I worked on in my PhD 30 years ago, where we did not have such issues.
The discussion about overparameterization is very relevant in the GPT4 era: it touches on overparameterization, overfitting, and the generalizability of GPT4-like systems. I started forming my own theory and discussed it with a few friends who work in theory, including one who is a legendary mathematician in manifold theory. With their initial confirmation and support, I decided to write a short paper detailing my theory. To do that, I needed to do a literature review, so I turned to GPT4 for help. Somehow my conversation with GPT4 turned into a debate. I share the entire lengthy debate with GPT4 below.
The debate with GPT4
It is a lengthy conversation/debate, so I highlight a few points first.
- GPT4 could not give a solid definition of overparameterization, which troubles me somewhat, but that is not GPT4's fault.
- My arguments ("you" here is GPT4):
  - Do you know what the right number of parameters is for a given deep learning network, before you call it overparameterized?
  - Why does overparameterization lead to overfitting? Is there any mathematical reasoning?
  - How does the model or the researcher know that "the model has more parameters than needed to represent the complexity of the underlying data or problem"?
  - In summary, your point is that it is very hard to estimate the right number of parameters, yet at the same time you call a large model with a high number of parameters overparameterized?
  - How do you determine "model capacity"? In fact, model capacity is determined to a large degree by the number of parameters.
  - Your statement "The understanding of overparameterization and overfitting is still evolving, and future research may further refine our knowledge of these concepts." is reasonable.
  - Your assumption is that more parameters will learn from noise, and therefore the model overfits. How do you know that is the case, that more parameters will learn from noise rather than filter out noise?
  - You say "It is not a strict rule that more parameters will always learn noise and lead to overfitting". If so, "overparameterization leads to overfitting" is a very weak argument, isn't it?
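One way to probe the last two questions empirically is a toy experiment: fit the same noisy data with a 2-parameter line and with a polynomial that has as many parameters as data points, then compare errors on the noisy training points and on noise-free test points. This is only a sketch under stated assumptions: the linear signal, the noise level of 0.3, the seed, and all function names are hypothetical choices, and a polynomial is not a deep network, so the result cannot settle the debate either way.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (2 parameters)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = my - b * mx
    return lambda x: a + b * x


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every point (len(xs) parameters)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of model predictions against targets ys."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def truth(x):
    """Hypothetical noise-free signal, assumed linear for this toy case."""
    return 2.0 * x + 1.0


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x = [i / 9 for i in range(10)]
train_y = [truth(x) + rng.gauss(0, 0.3) for x in train_x]  # noisy targets
test_x = [(i + 0.5) / 9 for i in range(9)]                 # between the nodes
test_y = [truth(x) for x in test_x]                        # noise-free targets

line = fit_line(train_x, train_y)
interp = fit_interpolant(train_x, train_y)

for name, model in [("line (2 params)", line), ("interpolant (10 params)", interp)]:
    print(f"{name}: train MSE = {mse(model, train_x, train_y):.4g}, "
          f"test MSE = {mse(model, test_x, test_y):.4g}")
```

The interpolant reproduces every noisy training point exactly (a train MSE of zero), so whether it learned the noise or filtered it out shows up only in the test MSE against the noise-free signal.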
Overall I had a good experience. It probably saved me about 30-50 hours of the 160-200 hours I had estimated for the literature review.
We closed our debate like this:
Please keep in mind that I did not plan to debate with GPT4; I had hoped GPT4 could help me with the literature review.