Neural Network Hidden Bottleneck, But
Max Ma, PhD
AI Architect (Model & ML Engineering) with depth and breadth in real-world ML solutions for healthcare and life sciences; strong science and engineering discipline with a creative and curious mind
Max Y. Ma and Gen-Hua Shi
Three basic questions
A neural network is a powerful numerical computation architecture. To evaluate its capacity, we need mathematical answers to three basic questions: its degree of freedom, its computation power, and its boundary condition (a crude sketch of the first question follows the list below).
The degree of freedom
The computation power
The boundary condition
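As a crude, purely illustrative proxy for the degree-of-freedom question (our own simplification, not a definition from this work), one can start from the number of trainable parameters in a network:

```python
# Hypothetical MLP widths, for illustration only: input, two hidden layers, output.
layer_widths = [784, 512, 256, 10]

def parameter_count(widths):
    """Count the weights and biases of a fully connected network with these widths."""
    total = 0
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total

print(parameter_count(layer_widths))  # 535818 raw parameters for this toy network
```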
Hidden bottleneck
Evidence
Mitigations
A “double-edged sword” problem: loss curve evidence
Root Cause
Fourier Features
For a deep manifold space with high-order nonlinearity, we should mathematically expect Fourier features (characteristics) within the space. The explicit manifestation of these Fourier features depends on the rigidity of the deep manifold space and the resilience of the neural network.
As the strength of the bottleneck increases, so does the rigidity of the deep manifold space. At this point, Fourier features within the space should become apparent. Fourier analysis helps in understanding the frequency components that constitute the deep manifold space response. In general, the more rigid a system is, the more it tends to exhibit high-frequency components in its response. In this context, Fourier analysis can be an effective method for measuring learning capacity during training.
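A minimal sketch of this kind of measurement, under our own assumptions (a synthetic loss curve and an arbitrary 0.1 cutoff), not the authors' actual procedure: detrend the training-loss curve and look at how much of the remaining power sits in high frequencies.

```python
import numpy as np

# Synthetic loss curve for illustration only; in practice this would be a logged curve.
steps = np.arange(4096)
loss = 2.0 * np.exp(-steps / 1500) + 0.02 * np.random.randn(steps.size)

trend = np.convolve(loss, np.ones(101) / 101, mode="same")  # crude moving-average trend
residual = loss - trend                                     # oscillatory part of the curve

spectrum = np.abs(np.fft.rfft(residual)) ** 2               # power spectrum of the residual
freqs = np.fft.rfftfreq(residual.size)                      # frequencies in cycles per step
high_freq_share = spectrum[freqs > 0.1].sum() / spectrum.sum()
print(f"share of power above 0.1 cycles/step: {high_freq_share:.2f}")
```

A rising high-frequency share during training would be one way to read growing rigidity from the curve.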
Robust and Resilient Neural Network
We are surprised by the prolonged slow decline, which is accompanied by a relatively high standard deviation (SD) in the loss values. This suggests that other factors may be at play.
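One simple way to quantify this observation (a sketch under our own assumptions, not the authors' analysis): compare the slope and the standard deviation of the loss inside sliding windows.

```python
import numpy as np

def window_stats(loss, window=500):
    """Per-window linear slope and standard deviation of a 1-D loss array."""
    stats = []
    for start in range(0, len(loss) - window + 1, window):
        chunk = loss[start:start + window]
        slope = np.polyfit(np.arange(window), chunk, 1)[0]  # average change per step
        stats.append((start, slope, chunk.std()))
    return stats

# 'loss' would normally be a logged per-step training loss; a flat, noisy stand-in
# is used here only so the snippet runs. A near-zero slope with a persistently
# high SD is the "prolonged slow decline" pattern described above.
loss = 1.0 + 0.05 * np.random.randn(5000)
for start, slope, sd in window_stats(loss):
    print(f"steps {start}-{start + 499}: slope={slope:.2e}, sd={sd:.3f}")
```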
The prolonged, slow decline in the loss curve suggests that the neural network is a robust and resilient system. We have concluded the following:
The neural network operates as a power-efficient system, with each node requiring minimal computational power, even when the deep manifold space becomes rigid and a bottleneck develops. Additionally, all foundation model pre-training is self-supervised. The neural network's self-progressing boundary condition imposes no restrictions on where incoming data is processed. Incoming data will be directed to whichever nodes are capable of processing it. This means that the neural network continues to learn even during the slow decline stage. In this sense, grokking and double descent are evidence of this continued learning.
It also means that the same token will be processed at different nodes. It is highly likely that many replicas of identical or near-identical feature bits (units of feature) are dispersed throughout the network. The mathematical inequality described in 'Contact Theory' (Shi, 2015) suggests that connections between nodes (pathways) are not equal. Our working theory proposes that feature bits propagate through the network, with their propagation distance determined by the computational capacity of each node. The pathway appears to be power-driven, prioritizing certain features or patterns during learning in a discriminatory manner. While this discriminative feature pathway (DFP) is mathematically plausible, the underlying theory remains unclear.
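The DFP theory itself is not something we can code up here, but a minimal sketch of one part of the claim, the dispersion of near-identical feature bits, could look like this (the 0.98 threshold and layer size are illustrative assumptions):

```python
import numpy as np

def near_duplicate_pairs(weight_matrix, threshold=0.98):
    """Index pairs of neurons whose weight vectors are nearly parallel (cosine similarity)."""
    w = weight_matrix / np.linalg.norm(weight_matrix, axis=1, keepdims=True)
    sim = w @ w.T                                    # cosine similarity between all neuron pairs
    pairs = np.argwhere(np.triu(sim, k=1) > threshold)
    return [tuple(p) for p in pairs]

# In practice the weight matrix would come from a trained model; random weights
# are used here only so the snippet runs on its own.
layer = np.random.randn(256, 512)
print(len(near_duplicate_pairs(layer)))
```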
The DFP picture prompts a fundamental question: how do we measure the completeness of training? The answer could have numerous implications for pre-training and post-training, such as fine-tuning, in-context learning, model compression, and model merging.
Classical Manifold
Classical manifold methods can handle high dimensionality but only low-order nonlinearity. The principle behind these attempts is to transform or map the data manifold onto a predefined manifold space, such as a Möbius strip or a Klein bottle. However, these targets are all smooth surfaces, which is low-order nonlinear.
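For concreteness, a Möbius strip is exactly such a predefined smooth target; the standard parameterization below (our illustration, not a method from this work) shows why the target surface itself is only low-order nonlinear.

```python
import numpy as np

def mobius_strip(u, v):
    """Standard Möbius strip: u in [0, 2*pi) runs around the strip, v in [-1, 1] across its width."""
    x = (1 + 0.5 * v * np.cos(u / 2)) * np.cos(u)
    y = (1 + 0.5 * v * np.cos(u / 2)) * np.sin(u)
    z = 0.5 * v * np.sin(u / 2)
    return np.stack([x, y, z], axis=-1)

u, v = np.meshgrid(np.linspace(0, 2 * np.pi, 200), np.linspace(-1, 1, 20))
points = mobius_strip(u, v)  # a smooth 3-D surface: the kind of target classical methods map data onto
```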
Numerical Manifold Method
The numerical manifold method (NMM) was developed by co-author Gen-Hua Shi under DoD sponsorship in the early 1990s. The motivation for NMM was to develop a single numerical method that handles linear, low-order nonlinear, and high-order nonlinear problems together. Based on the NMM principle, Deep Manifold applies only three very basic topology concepts: cover, dual pairing, and covering space.
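As a rough intuition for the "cover" concept only (our own simplified 1-D toy, not Shi's NMM formulation): split a domain into overlapping covers, give each cover a weight function, and blend the local approximations wherever covers overlap.

```python
import numpy as np

def hat_weight(x, center, half_width):
    """Piecewise-linear weight: 1 at the cover centre, falling to 0 at its edges."""
    return np.clip(1 - np.abs(x - center) / half_width, 0.0, 1.0)

x = np.linspace(0, 1, 101)
centers = np.linspace(0, 1, 5)             # five overlapping covers on [0, 1]
half_width = 0.25                          # neighbouring covers overlap

weights = np.array([hat_weight(x, c, half_width) for c in centers])
weights /= weights.sum(axis=0)             # normalise so the weights sum to one everywhere

# Each cover carries its own simple local function; the global field is their blend.
local_values = [np.full_like(x, float(i)) for i in range(len(centers))]
global_field = sum(w * f for w, f in zip(weights, local_values))
```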
Gen-Hua has nearly 50 years of experience in high-order nonlinearity modeling. He was, and still is, a mathematician: Peking University (MS, 1968), the Institute of Mathematics, Chinese Academy of Sciences (中科院数学所), UC Berkeley, Lawrence Livermore National Laboratory, US DoD, and independent researcher and consultant. He single-handedly developed the numerical manifold method described above, among other numerical methods.
Max studied under Gen-Hua for 10 years, from 1989 to 1999. He began to suspect the high-order nonlinearity of neural networks in 2017. In his 2018 LinkedIn post, he wrote "I give myself 5-7 years for this" under the code name "Kahlua" at the end of the post.
Reference