How to Explain Deep Learning using Chaos and Complexity
Carlos E. Perez
Author of The Deep Learning Playbook, Artificial Intuition, Fluency & Empathy, A Pattern Language for Generative AI, and Long Reasoning AI
I want to talk to you today about Non-Equilibrium Information Dynamics and how an understanding of its features leads us to a better intuition about Deep Learning systems, or learning systems in general.
Just to recap my observation from a previous post, “Deep Learning in Non-Equilibrium Dynamics”: in our study of Deep Learning, practitioners derive their intuition from the mathematics of physical systems. However, since these are not physical systems that we study but rather information systems, we apply information-theoretic principles. Now, information theory also has its origins in the mathematics that describes physics (i.e. Thermodynamics). Both theories are essentially bulk observations of nature. What I mean by bulk is that they are aggregate measures of systems with a large number of interacting particles or entities.
Kieran D. Kelly, whose writing I only recently stumbled upon, has one of the better intuitions out there about non-equilibrium dynamics. His blog is a pleasure to read, and I recommend it highly for anyone interested in this kind of esoteric thing.
Wired posted an article the other day titled “Move Over Coders - Physicists Will Soon Rule Silicon Valley”. Now, we might make the observation that physicists in general have to have a decent IQ to do what they do and thus are able to handle computer science. We can also argue that the mathematics found in Deep Learning isn’t really that advanced compared to what’s found in a typical undergraduate physics curriculum (emphasis on undergraduate). However, there is something else that most people do not understand but that is generally understood by anyone studying physics.
What people can’t seem to comprehend, even among folks with a technical background like computer science and mathematics, is the relationship between math and reality. They don’t recognize that the math we use is just an approximation of reality, and that this math has serious limitations beyond certain dimensions. People doing physics know this because, despite using analytic forms, we are constantly performing hand-waving approximations (e.g. expand a function as a Taylor series and throw out every term beyond the quadratic). So when I write about the limits of math with respect to AI, I get a ton of outrage from math-inclined folk! The ignorance in this world, even among the learned, is really surprising.
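To make the Taylor-series remark concrete, here is a minimal sketch (my own illustrative example, not from the original post) showing how a quadratic truncation of cos(x) behaves: excellent near the expansion point, increasingly wrong away from it.

```python
import numpy as np

# cos(x) and its quadratic Taylor approximation around 0: cos(x) ~ 1 - x**2/2.
# The approximation is excellent near 0 and degrades rapidly away from it,
# which is the trade-off made whenever higher-order terms are thrown out.
x = np.linspace(0.0, np.pi, 7)
exact = np.cos(x)
approx = 1.0 - x**2 / 2.0

for xi, e, a in zip(x, exact, approx):
    print(f"x={xi:4.2f}  cos(x)={e:+.3f}  quadratic={a:+.3f}  error={abs(e - a):.3f}")
```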
Going back to Kelly, he echoes the same sentiment about math and reality:
Physics is, in a sense, a science of linear dynamics, a science of “dynamics without feedback”; such dynamics are indeed easily compressible, but the real world is a world that abounds with feedback, a “nonlinear” world full of “incompressible dynamics”.
For many, this statement may come as a shock, but it really shouldn’t: it is just the basic reality that there are limits to analytic forms. Another thing that seems to confuse people is the use of the words “linear” and “non-linear” by physicists. Most people think of “linear” as meaning a linear equation, and non-linear as anything that is not, so a quadratic equation qualifies as non-linear. What the physicist defines as linear and non-linear, however, is from the point of view of differential equations. Linear differential equations have a chance of being solvable in closed form. In contrast, with non-linear differential equations, almost all bets are off. The classic example is the Navier-Stokes equation for fluids, solvable analytically only up to 2 dimensions. Yes, 2 dimensions; that is an unrealistic flat-land world.
Basically, though, think of non-linear systems as systems that have feedback; in other words, most of our reality. So to understand a bit about our reality, we have to understand a bit about the nature of non-linearity. It turns out that, over the years, two features of feedback systems have been studied: chaos and complexity. Kelly has a whole set of articles about these two subjects, and I’ll redirect you there for an introduction.
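To get a feel for how a single feedback term produces chaos, here is a minimal sketch (my own illustration, not from Kelly’s articles) of the logistic map; the parameter r sets the strength of the feedback.

```python
# Logistic map: x_{n+1} = r * x_n * (1 - x_n), the textbook example of a
# feedback system. As r grows, the long-run behavior goes from a stable
# fixed point, to oscillation, to chaos.

def logistic_tail(r, x0=0.2, burn_in=200, show=5):
    x = x0
    for _ in range(burn_in):            # discard the transient
        x = r * x * (1.0 - x)
    tail = []
    for _ in range(show):               # record the long-run behavior
        x = r * x * (1.0 - x)
        tail.append(round(x, 4))
    return tail

for r in (2.5, 3.2, 3.9):
    print(f"r={r}: {logistic_tail(r)}")
```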
Now what I want to focus on is information systems (not physical systems), so what we are really looking for is chaos and complexity in the context of information systems. (Side note: Deep Learning systems are information systems, despite the poor association with the term Neural Networks.) So here is the very nice table from Kelly:
Source: https://www.kierandkelly.com/what-is-complexity/
Kelly writes:
What drives evolution’s spontaneous and progressive complexity is the interplay of insufficient negative feedback and strong positive feedback; or in other words what drives evolution is The Interplay of Random Innovation and Natural Reinforcement.
Negative feedback here is the natural tendency expressed in the Second Law of Thermodynamics (which really is the law of large numbers): systems tend towards maximum entropy. Positive feedback, however, is a mechanism that can lead to chaos. But in the upper right quadrant we discover emergent complexity. In other words, one has to embrace the existence of mutual feedback as well as randomness. Unfortunately, our mathematical legacy, that of assuming nice independent Gaussian distributions and favoring sparsity (or parsimony) over randomness, demands an unnatural constraint on the system.
An assumption of IID (i.e. Independent and Identically Distributed) features, and an assumption that sparsity is the favored solution, is walking every researcher in an entirely wrong direction! These assumptions are the equivalent of physicists making their equations linear. It is all so that our mathematics becomes convenient. Unfortunately, God did not mandate that reality be conveniently expressed in mathematics. We are pushing our researchers to buy into religion and not reality.
Now, before I completely forget, let me explain how chaos and complexity relate to explaining Deep Learning. Let’s start with randomness, or entropy; I wrote about this in “The Unreasonable Effectiveness of Randomness”. When we study Deep Learning, we simply can’t ignore the presence of randomness. It just seems to be an intrinsic feature of these systems. The simplest intuition I can offer here is that diversity leads to survivability. Monocultures tend toward less adaptability and possible extinction. In fact, the most counter-intuitive notion is that randomness leads to information preservation. An example of this in computer science is “Information Dispersal Algorithms”: you take information and scatter it among different storage nodes, and at massive scale you do it randomly. You basically build storage that is highly redundant. This is the same mechanism as you find in holographic memories. So here we establish the value of high entropy.
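Here is a toy sketch of that idea (random replication with redundancy, not Rabin’s actual Information Dispersal Algorithm, which uses erasure coding); all names and parameters are illustrative.

```python
import random

# Each chunk of the message is copied to several randomly chosen storage
# nodes. With enough redundancy, losing a few nodes rarely loses any chunk.

def disperse(chunks, num_nodes=10, copies=3, seed=0):
    rng = random.Random(seed)
    nodes = [dict() for _ in range(num_nodes)]
    for idx, chunk in enumerate(chunks):
        for node_id in rng.sample(range(num_nodes), copies):
            nodes[node_id][idx] = chunk
    return nodes

def recover(nodes, num_chunks):
    found = {}
    for node in nodes:                  # pool whatever the surviving nodes hold
        found.update(node)
    return "".join(found.get(i, "?") for i in range(num_chunks))

chunks = list("DEEP LEARNING")
nodes = disperse(chunks)
print(recover(nodes[:7], len(chunks)))  # simulate losing 3 of the 10 nodes
```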
Let’s examine the other axis, that of high mutual information, which can lead to unstable feedback and thus chaos. Mutual information is the antithesis of many probabilistic methods; that’s because the math simply can’t handle it. But should we shoehorn reality to fit the math? I think not. One of the better characterizations of how Deep Learning is able to work well in domains of high mutual information is the paper “Critical Behavior from Deep Dynamics: A Hidden Dimension in Natural Language”:
Source: https://arxiv.org/abs/1606.06737v2
How can we know when machines are bad or good? The old answer is to compute the loss function. The new answer is to also compute the mutual information as a function of separation, which can immediately show how well the model is doing at capturing correlations on different scales.
Deep Learning must be able to learn correlations at multiple scales to be of any use. To phrase it another way, Deep Learning must be able to understand the composition of language, from letters to words, to sentences, and eventually to complete texts. Deep Learning works because it captures this multi-scale structure of language.
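To make that measurement concrete, here is a minimal sketch (an assumed plug-in estimator, not the authors’ own code) of mutual information between characters as a function of their separation; on real natural-language text this curve decays slowly with distance.

```python
import math
from collections import Counter

def mutual_information(text, d):
    """Plug-in estimate of I(X; Y) between characters d positions apart."""
    pairs = [(text[i], text[i + d]) for i in range(len(text) - d)]
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(a for a, _ in pairs)
    py = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi

# Toy corpus (periodic, so correlations persist); substitute a real text
# file to see how correlations behave across scales.
text = "the cat sat on the mat and the dog ate the log " * 200
for d in (1, 2, 4, 8, 16, 32):
    print(f"d={d:2d}  I={mutual_information(text, d):.3f}")
```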
And what exactly is the learning mechanism for this? Jeremy England actually has a very compelling argument as to how life self-organizes; you can read about it at Quanta: “A New Physics Theory of Life”. We can take this idea and use it to explain how learning works in Deep Learning. I’ve written earlier about the 3 Ilities. Explaining “Trainability” is extremely important. A layered DL system builds a representation of language from the lower layers up to the more abstract higher layers. Each layer has its own mutual entanglement that is discovered through training. Over time, the entanglement gets reinforced such that breaking it becomes less likely. So, for example, if the network only sees Latin characters, then it never develops the ability to understand Arabic characters. Layers are also interconnected, so there is a constraint at the bottom (more fundamental concepts) and at the top (minimizing relative entropy). So, eventually, a language hierarchy is built.
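A minimal PyTorch sketch of that picture (purely illustrative; the layer sizes and the character-window setup are my own assumptions, not anything from the post): lower layers see raw characters, successive layers combine them into more abstract features, and the top layer minimizes relative entropy (cross-entropy) against the targets.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim, context = 64, 16, 32, 8

model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),          # characters -> low-level features
    nn.Flatten(),                                 # concatenate the context window
    nn.Linear(context * embed_dim, hidden_dim),   # combine into "word-like" features
    nn.ReLU(),
    nn.Linear(hidden_dim, hidden_dim),            # combine again into higher abstractions
    nn.ReLU(),
    nn.Linear(hidden_dim, vocab_size),            # predict the next character
)

x = torch.randint(0, vocab_size, (4, context))    # a batch of 4 character windows
y = torch.randint(0, vocab_size, (4,))            # the characters that follow them
loss = nn.CrossEntropyLoss()(model(x), y)         # the "top" constraint: relative entropy
loss.backward()                                   # training reinforces the learned couplings
print(loss.item())
```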
The objection here, though, is that it should take an infinite amount of time to arrive at a proper representation. That’s where the interplay with entropy comes into the picture. The basic theory is not unlike the holographic principle. Randomness begets robustness, while mutual information begets self-organization and compression. What begets generalization? Not sure, but something seems to emerge in the upper right quadrant!
To understand more, either keep reading this blog or head over and talk to us at “Intuition Machine”.