Neural Network Hidden Bottleneck, But

Max Y. Ma and Gen-Hua Shi

Three basic questions

The neural network is a powerful numerical computation architecture. Evaluating its capacity requires mathematical answers to three basic questions: its degree of freedom, its computation power, and its boundary condition.

The degree of freedom

  • This answers how many dimensions the architecture can compute.
  • Our finding: unbounded dynamism and a virtually limitless degree of freedom.

The computation power

  • This answers how complex a problem the architecture is able to solve.
  • Our finding: computation power scales exponentially with depth.

The boundary condition

  • This answers how flexibly the architecture can handle external constraints.
  • Our finding: fluidity in self-progressing boundary conditions for robust, high-dimensional redundancy.


Hidden bottleneck

Evidence:

  • A zig-zag, up-and-down loss curve during the slow-decline stage; grokking; and double descent.

Mitigations

  • Dropout, shortcut/skip connections, and data work, including the labeling function.
  • The Transformer owes its success to having the best mitigation strategy, not to its attention mechanism per se.

A “double-edged sword” problem: loss curve evidence

  • Initial rapid drop: the neural network is very capable, even for high-order nonlinear problems, as in any foundation model.
  • Struggling slow decline: the neural network struggles with high-order nonlinearity.

Root Cause

  • Neural networks do not handle high-order nonlinearity well (a minimal sketch of this behavior follows).
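
To make the root cause concrete, here is a minimal sketch (our illustration, not an experiment from this article): the same small MLP fits a low-frequency target quickly but converges far more slowly as the target frequency, a proxy for the order of nonlinearity, increases. The network size, learning rate, and targets are all assumed for illustration.

```python
# Minimal sketch (illustrative, not this article's experiment): a small MLP
# fits a low-frequency target quickly but converges much more slowly on a
# high-frequency one -- a proxy for "high-order nonlinearity".
import torch
import torch.nn as nn

def fit(freq, steps=2000):
    torch.manual_seed(0)
    x = torch.linspace(-1, 1, 256).unsqueeze(1)
    y = torch.sin(freq * torch.pi * x)              # target "order" grows with freq
    net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                        nn.Linear(64, 64), nn.Tanh(),
                        nn.Linear(64, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((net(x) - y) ** 2).mean()
        loss.backward()
        opt.step()
    return loss.item()

for freq in (1, 4, 16):
    print(f"freq={freq:2d}  final MSE={fit(freq):.5f}")
```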

Fourier Features

For a deep manifold space with high-order nonlinearity, we should mathematically expect Fourier features (characteristics) within the space. How explicitly these Fourier features manifest depends on the rigidity of the deep manifold space and the resilience of the neural network.

As the strength of the bottleneck increases, so does the rigidity of the deep manifold space. At this point, Fourier features within the space should become apparent. Fourier analysis helps in understanding the frequency components that constitute the deep manifold space response. In general, the more rigid a system is, the more it tends to exhibit high-frequency components in its response. In this context, Fourier analysis can be an effective method for measuring learning capacity during training.
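
One hedged way to act on this suggestion (our sketch; the article does not specify a protocol): take the FFT of the model's residual error on a uniform 1-D probe and watch which frequency bands remain unfit as training proceeds. The `residual_spectrum` helper and the toy model/target below are assumptions for illustration.

```python
# Sketch of the measurement suggested above (our illustration, not a method
# given in the article): the FFT of the residual y - f(x) on a uniform probe
# shows which frequency components the network has not yet captured.
import numpy as np

def residual_spectrum(model_fn, target_fn, n=512):
    x = np.linspace(-1.0, 1.0, n, endpoint=False)
    residual = target_fn(x) - model_fn(x)
    amp = np.abs(np.fft.rfft(residual)) / n        # relative amplitude spectrum
    freqs = np.fft.rfftfreq(n, d=x[1] - x[0])      # cycles per unit length
    return freqs, amp

# Toy example: a "model" that has only captured the low-frequency component.
target = lambda x: np.sin(2 * np.pi * x) + 0.3 * np.sin(24 * np.pi * x)
model = lambda x: np.sin(2 * np.pi * x)
freqs, amp = residual_spectrum(model, target)
print("dominant residual frequency:", freqs[np.argmax(amp)])   # ~12 cycles/unit
```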



Robust and Resilient Neural Network

We are surprised by the prolonged slow decline, which is accompanied by a relatively high standard deviation (SD) in the loss values. This suggests that other factors may be at play.
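
A hedged sketch of how this SD observation could be quantified (the window size and the synthetic curve are our assumptions): compute a rolling mean and standard deviation over the logged loss series and compare the late, slow-decline stage against the initial drop.

```python
# Illustrative sketch: quantify the zig-zag by the rolling standard deviation
# of the logged loss series; a persistently high SD during the slow decline
# is the signal discussed above. Window size and curve are assumed.
import numpy as np

def rolling_stats(losses, window=100):
    losses = np.asarray(losses, dtype=float)
    means, sds = [], []
    for i in range(len(losses) - window + 1):
        chunk = losses[i:i + window]
        means.append(chunk.mean())
        sds.append(chunk.std())
    return np.array(means), np.array(sds)

# Synthetic loss curve: rapid initial drop, then a noisy, slow decline.
steps = np.arange(5000)
rng = np.random.default_rng(0)
loss = (2.0 * np.exp(-steps / 200) + 0.5 * np.exp(-steps / 5000)
        + 0.05 * rng.standard_normal(steps.size))
mean, sd = rolling_stats(loss)
print(f"rolling SD: early ~ {sd[0]:.3f}, late ~ {sd[-1]:.3f}")
```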

The prolonged, slow decline in the loss curve suggests that the neural network is a robust and resilient system. We have concluded the following:

  1. Dynamic computation with an infinite degree of freedom.
  2. The fluidity in self-progressing boundary conditions.

The neural network operates as a power-efficient system, with each node requiring minimal computational power, even when the deep manifold space becomes rigid and a bottleneck develops. Additionally, all foundation model pre-training is self-supervised. The neural network's self-progressing boundary condition imposes no restrictions on where incoming data is processed. Incoming data will be directed to whichever nodes are capable of processing it. This means that the neural network continues to learn even during the slow decline stage. In this sense, grokking and double descent are evidence of this continued learning.

It also means that the same token will be processed in different nodes. It is highly likely that many replicas of identical or near-identical feature bits (units of feature) are dispersed throughout the network. The inequality in mathematics, as described in 'Contact Theory' (Shi, 2015), suggests that connections between nodes (pathways) are not equal. Our working theory proposes that feature bits propagate through the network, with their propagation distance determined by the computational capacity of each node. The pathway appears to be power-driven, prioritizing certain features or patterns during learning in a discriminatory manner. While this discriminative feature pathway (DFP) is mathematically plausible, the underlying theory remains unclear. One simple probe of the replica claim is sketched below.
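
As a first probe of the replica claim (our illustration, not the DFP formalism): if near-identical feature bits are replicated across nodes, pairs of hidden units should show activation profiles with cosine similarity near 1. The `duplicated_unit_pairs` helper and its threshold are assumptions.

```python
# One possible probe (our illustration, not the DFP formalism): if identical
# or near-identical "feature bits" are replicated across nodes, some pairs of
# hidden units will have activation profiles with cosine similarity near 1.
import numpy as np

def duplicated_unit_pairs(activations, threshold=0.98):
    """activations: (num_samples, num_units) hidden-layer outputs."""
    a = activations - activations.mean(axis=0)       # center each unit
    a = a / (np.linalg.norm(a, axis=0) + 1e-12)      # normalize each unit
    sim = a.T @ a                                    # unit-by-unit cosine matrix
    i, j = np.triu_indices_from(sim, k=1)
    mask = sim[i, j] > threshold
    return [(int(p), int(q)) for p, q in zip(i[mask], j[mask])]

# Toy check: unit 3 is an exact replica of unit 0.
rng = np.random.default_rng(0)
acts = rng.standard_normal((1000, 8))
acts[:, 3] = acts[:, 0]
print(duplicated_unit_pairs(acts))                   # expect [(0, 3)]
```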

This prompts a fundamental question: how do we measure the completeness of training? The answer could have numerous implications for pre-training and post-training, such as fine-tuning, in-context learning, model compression, and model merging.

Figure: Feature Bits & DFP (left) and bifurcation theory illustration (right)


Classical Manifold

Classical manifold methods can handle high dimensions and low-order nonlinearity. The principle of these attempts is to transform or map the data manifold onto a predefined manifold space, such as the Möbius strip or the Klein bottle. However, everything remains on a smooth surface, which is low-order nonlinear. A concrete example of such a predefined space follows.
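
For concreteness, here is the kind of predefined target space meant above: the textbook Möbius strip parameterization, a smooth embedding of a 2-D manifold in R^3, i.e., low-order nonlinear in this article's sense.

```python
# The textbook Mobius-strip parameterization: a smooth (low-order nonlinear)
# embedding of a 2-D manifold in R^3, of the kind classical manifold methods
# map data onto.
import numpy as np

def mobius(u, v, R=1.0):
    """u in [0, 2*pi): angle around the strip; v in [-0.5, 0.5]: width."""
    x = (R + v * np.cos(u / 2)) * np.cos(u)
    y = (R + v * np.cos(u / 2)) * np.sin(u)
    z = v * np.sin(u / 2)
    return np.stack([x, y, z], axis=-1)

u, v = np.meshgrid(np.linspace(0, 2 * np.pi, 60), np.linspace(-0.5, 0.5, 10))
points = mobius(u, v)        # (10, 60, 3) grid of points on the surface
print(points.shape)
```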


Numerical Manifold Method

The numerical manifold method (NMM) was developed by co-author Gen-Hua Shi under DoD sponsorship in the early 1990s. The motivation for NMM was to develop a single numerical method covering linear, low-order nonlinear, and high-order nonlinear problems all together. Based on the NMM principle, Deep Manifold applies only three very basic topology concepts: cover, dual pairing, and covering space. A minimal illustration of the cover concept follows.
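
A hedged 1-D illustration of the cover concept (our sketch, far simpler than NMM itself): overlapping covers carry local functions, blended by weight functions that form a partition of unity, so the global approximation is a weighted sum of local pieces. All function choices here are assumptions.

```python
# Hedged 1-D illustration of a "cover" (our sketch, not NMM itself):
# overlapping covers carry local functions, blended by hat weights that form
# a partition of unity; the global approximation is their weighted sum.
import numpy as np

def hat_weights(x, centers, half_width):
    """Overlapping hat functions, normalized so they sum to 1 at every x."""
    w = np.maximum(0.0, 1.0 - np.abs(x[:, None] - centers[None, :]) / half_width)
    return w / w.sum(axis=1, keepdims=True)

x = np.linspace(0.0, 1.0, 201)
centers = np.linspace(0.0, 1.0, 5)                  # 5 overlapping covers
w = hat_weights(x, centers, half_width=0.25)

# Each cover carries the tangent line of x**2 at its center; the blend
# reproduces x**2 closely even though every local piece is only linear.
local = centers[None, :] ** 2 + 2 * centers[None, :] * (x[:, None] - centers[None, :])
approx = (w * local).sum(axis=1)
print(np.allclose(w.sum(axis=1), 1.0))              # partition of unity holds
print(f"max blending error vs x**2: {np.abs(approx - x**2).max():.4f}")
```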

Gen-Hua has nearly 50 years of work in high-order nonlinearity modeling. He was and still is a mathematician: Peking University (MS, 1968), the Institute of Mathematics, Chinese Academy of Sciences (中科院数学所), UC Berkeley, Lawrence Livermore National Laboratory, US DoD, and independent researcher and consultant. He has solely developed:

  • “KeyBlock Theory”, 1970s
  • “Discontinuous Deformation Analysis” (DDA), 1980s
  • “Numerical Manifold Method” (NMM), 1990s
  • “Contact Theory” (EAB), 2000s

Max studied under Gen-Hua for 10 years, 1989-1999. He began to suspect the neural network's high-order nonlinearity problem in 2017. In his 2018 LinkedIn post, he wrote “I give myself 5-7 years for this” under the code name “Kahlua” at the end of the post.


References

  • Shi, G. Manifold Method of Material Analysis. Proceedings of the 9th Army Conference on Applied Mathematics and Computing, 1991.
  • Ma, M. Y. Single Field Manifold Method Using Fourier Function in Wave Propagation Analysis. Working Forum on the Manifold Method of Material Analysis, Volume I, U.S. Army Corps of Engineers, 1995.
  • Ma, M. Y., Zaman, M., and Zhu, J. H. Discontinuous Deformation Analysis Using the Third Order Displacement Function. Proceedings of the First International Forum on Discontinuous Deformation Analysis (DDA) and Simulations of Discontinuous Media, pages 383-394, 1996.
  • Shi, G. Contact Theory. Science China Technological Sciences, Volume 58, pages 1450–1496, 2015.


