Future of Machine Learning and AI . . . the BIG opportunities.
Dr. PG Madhavan
Digital Twin maker: Causality & Data Science --> TwinARC - the "INSIGHT Digital Twin"!
[The views expressed in this article are those of the author alone and not of his employer]
LEARNING is a fundamental building block of ML & AI. But do we know what learning really is, what the different types are and how they are interrelated?
A characteristic of learning is the diminution of work required to process inputs as learning proceeds. We will habituate to a repeated auditory tone quickly but a surprising new sound will require more processing – this is a common human experience. Beyond this low-level learning example, we can agree that quicker processing of related information is a concomitant of ALL learning – it may even be a hallmark of learning! Lots of processing work means lots of learning effort . . . followed by less work indicating the impending completion of as much learning as possible for that task. This transition from high to low effort is IMPORTANT in this external measure of learning.
Initial learning from inputs can be thought of as “clustering” – to higher and higher levels. At the highest levels, clusters are “categories” with “meanings” related to what the cluster corresponds to in the physical world. As higher levels of clusters are formed, phase transitions can be expected to occur, which may give rise to unanticipated emergent properties (or “meanings”) in higher-level clusters. In the human brain, we have hardly any reliable information on how this higher-level learning (or clustering) occurs, largely because the necessary experiments cannot be performed on live human beings!
“A Cluster is a group of objects that have high intra-class similarity and low inter-class similarity. In other words, objects are similar to other objects in the same cluster, but dissimilar to objects in other clusters.” From this definition of a cluster, one can see that a “minimization of work” principle is at play in cluster learning: the “distances” to similar objects, which need to be traversed often, are shorter than those to dissimilar and presumably less frequently occurring (in that context) objects. This insight may provide some clues for neurophysiologists in their hunt for the substrates of learning in the brain, but it does not necessarily tell us how to find new learning *methods* for machine learning purposes!
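The cluster definition can be checked numerically. The following is a minimal sketch, assuming Euclidean distance and plain k-means (Lloyd's algorithm) as the clustering method; neither choice comes from the text, and the data are three hypothetical 2-D blobs. After clustering, it compares the mean distance between points in the same cluster with the mean distance between points in different clusters.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means: assign each point to its nearest centroid, then recompute centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Keep the old centroid if a cluster happens to become empty
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                              for j in range(k)])
    return labels, centroids

def intra_inter(X, labels):
    """Mean pairwise distance within clusters vs. between clusters."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(X), dtype=bool)          # ignore each point's zero distance to itself
    return D[same & off_diag].mean(), D[~same].mean()

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in ([0, 0], [5, 5], [0, 5])])
labels, _ = kmeans(X, k=3)
intra, inter = intra_inter(X, labels)
print(f"mean intra-cluster distance {intra:.2f}, mean inter-cluster distance {inter:.2f}")
```

Low intra-cluster and high inter-cluster distance is exactly the "high intra-class similarity, low inter-class similarity" property in the quoted definition.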
Any computer program that modifies its operations iteratively in runtime towards an objective can be considered a “learning machine”. Of the many antecedents, let us consider two main ones: (1) Adaptive filtering and (2) Artificial Intelligence. It is fair to say that the former arose from Optimization theory and the latter from a mix of Cybernetics, Pattern Recognition, etc. The impetus for Artificial Intelligence has always been the nirvana state of Artificial GENERAL Intelligence (“AGI”) that mimics Human Intelligence; but AGI has been the next decade’s killer solution for many decades now! In my opinion, it will continue to be so.
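Adaptive filtering is the simplest concrete "learning machine" in this sense, and it also shows the diminution of work described earlier. The following is a minimal sketch using the classic Least-Mean-Squares (LMS) adaptive filter of Widrow and Hoff, assuming a hypothetical unknown 4-tap system and white-noise input. The size of each weight update serves as a rough proxy for processing "work": it is large while the filter is learning and shrinks once the weights settle.

```python
import numpy as np

def lms(x, d, n_taps, mu):
    """Least-Mean-Squares adaptive FIR filter.

    Returns the final weights and the norm of each weight update (a proxy for 'work').
    """
    w = np.zeros(n_taps)
    update_sizes = np.zeros(len(x))
    for n in range(n_taps - 1, len(x)):
        x_vec = x[n - n_taps + 1:n + 1][::-1]   # [x[n], x[n-1], ..., x[n-n_taps+1]]
        e = d[n] - w @ x_vec                    # error between desired and filter output
        dw = mu * e * x_vec                     # stochastic-gradient step toward the objective
        w += dw
        update_sizes[n] = np.linalg.norm(dw)
    return w, update_sizes

rng = np.random.default_rng(0)
h = np.array([0.5, -0.3, 0.2, 0.1])            # hypothetical unknown system to be learned
x = rng.standard_normal(3000)
d = np.convolve(x, h)[:len(x)] + 0.01 * rng.standard_normal(len(x))
w, upd = lms(x, d, n_taps=4, mu=0.02)
print("learned weights:", np.round(w, 3))
print("mean update size, early vs late:", upd[3:103].mean(), upd[-100:].mean())
```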
However, even though our understanding of Human Intelligence is woefully inadequate, as scientific beings we ought to give it a shot based on current knowledge; I share some thoughts on AGI at the end of this article . . .
In the following sections, I take a broader view of Machine Learning and go beyond basic unsupervised and supervised learning algorithms. While we do not understand the higher learning “algorithms” of the human brain well enough to replicate them in a machine, engineers can still construct higher-level learning methods based on other scientific disciplines. I see some BIG opportunities here . . .
The approach is to go from model-free to model-based learning. There is a long history of model-based methods in all branches of Science and Engineering (for example, spectrum estimation); model-based methods appear later in research roadmaps and invariably produce better results when the model is appropriate.
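Spectrum estimation makes the model-free versus model-based contrast concrete. The following is a minimal sketch, assuming a hypothetical AR(2) process as the "true" signal. The periodogram (model-free) uses no assumptions about the signal, while the Yule-Walker method (model-based) fits an autoregressive model and derives a smooth spectrum from its few parameters. The AR order and coefficients are illustrative choices, not taken from the text.

```python
import numpy as np

def yule_walker(x, order):
    """Fit x[n] = sum_k a[k] x[n-k] + e[n] from sample autocorrelations."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    r = np.array([x[:n - k] @ x[k:] / n for k in range(order + 1)])             # biased autocorrelation
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])  # Toeplitz matrix
    a = np.linalg.solve(R, r[1:])
    sigma2 = r[0] - a @ r[1:]          # variance of the driving noise e[n]
    return a, sigma2

def ar_psd(a, sigma2, freqs):
    """Model-based spectrum sigma2 / |1 - sum_k a[k] exp(-j 2 pi f k)|^2, f in cycles/sample."""
    k = np.arange(1, len(a) + 1)
    denom = 1 - np.exp(-2j * np.pi * np.outer(freqs, k)) @ a
    return sigma2 / np.abs(denom) ** 2

def periodogram(x):
    """Model-free spectrum estimate |FFT|^2 / N on the rfft frequency grid."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.fft.rfftfreq(len(x)), np.abs(np.fft.rfft(x)) ** 2 / len(x)

# Simulate an AR(2) process with hypothetical coefficients
rng = np.random.default_rng(0)
a_true = np.array([0.75, -0.5])
N = 20_000
e = rng.standard_normal(N)
x = np.zeros(N)
for n in range(2, N):
    x[n] = a_true[0] * x[n - 1] + a_true[1] * x[n - 2] + e[n]

a_hat, s2 = yule_walker(x, order=2)                 # model parameters from the full record
freqs, pxx = periodogram(x[:256])                   # noisy, model-free, short record
a_short, s2_short = yule_walker(x[:256], order=2)
psd_model = ar_psd(a_short, s2_short, freqs)        # smooth, model-based, same short record
print("estimated AR coefficients:", np.round(a_hat, 3))
```

The model-based estimate is only as good as the model: fitting AR(2) works here because the data really are AR(2), which is the "when the model is appropriate" caveat in the text.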
Dynamical Machine Learning
DYNAMICAL Machine Learning (DML) is a formal framework for “continuous learning”. Current machine learning (*static* ML) is “Learn Once and Use for Ever” (or “LOUE”, as I like to call it).
While the best way to capture a moving scene is by “video” (which is DML), the current static ML (or “LOUE”) takes still pictures (top right)! Clearly, anything that moves (or that is dynamic) is blurred. In certain applications, a “video frame” (bottom left) can be used instead to get better results.
In current ML, the multiple linear regression model is the workhorse.
$$ y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_M x_M + w \qquad \text{(Static ML)} $$
Switch to a State-Space model and the dynamics are built right in.
$$ s[n] = A\, s[n-1] + B\, x[n] + D\, q[n-1] $$
$$ y[n] = H[n]\, s[n] + r[n] \qquad \text{(Dynamical ML)} $$
This is still a “model-free” approach, since the physical model of the underlying system is not incorporated faithfully but abstracted into the “state equation”. However, the State-Space model is the first step to incorporating “video” into ML.
We can use “Bayes Filter” algorithms to estimate E[s | y, x] WITHOUT explicitly obtaining the conditional pdf. The Bayesian estimate of the conditional expectation of the output then follows as E[y | x] = H[n] s[n], where s[n] is the estimated state and H[n] is known.
The most celebrated of these algorithms is the Kalman Filter.
Bayes Filter algorithms:
1. Linear Gaussian case – Kalman Filter.
2. Mildly non-linear Gaussian case – Extended Kalman Filter (EKF).
3. Non-linear Gaussian case – Cubature Kalman Filter (CKF), Unscented Kalman Filter (UKF).
4. Non-linear distribution-free case – Particle Filter, Markov Chain Monte Carlo (MCMC) Filter.
A full description of Dynamical Machine Learning methods, including an optimized Kalman Filter algorithm, is available in Systems Analytics (2016).
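To see "video" versus "still picture" in action, here is a minimal Kalman Filter sketch for the linear Gaussian case. It assumes one particular special case of the state-space model above (an assumption, not the method in Systems Analytics): the state s[n] is the vector of regression coefficients, A = I, B = 0, D = I (so coefficients drift as a random walk), and H[n] is the row of inputs at time n. The data are hypothetical, with a slope that flips sign halfway through.

```python
import numpy as np

def kalman_regression(X, y, q_var, r_var, p0=10.0):
    """Track time-varying regression coefficients with a Kalman Filter.

    Assumed special case of the state-space model:
      s[n] = s[n-1] + q[n-1]     (A = I, B = 0, D = I: coefficients drift as a random walk)
      y[n] = H[n] s[n] + r[n]    (H[n] = X[n], the row of inputs observed at time n)
    """
    N, M = X.shape
    s = np.zeros(M)                 # state estimate = current regression coefficients
    P = np.eye(M) * p0              # state covariance
    Q = np.eye(M) * q_var           # process-noise covariance
    estimates = np.zeros((N, M))
    for n in range(N):
        P = P + Q                               # predict: same state, more uncertainty
        H = X[n]
        innovation = y[n] - H @ s               # measured minus predicted output
        S = H @ P @ H + r_var                   # innovation variance (scalar)
        K = P @ H / S                           # Kalman gain
        s = s + K * innovation                  # update the state
        P = P - np.outer(K, H @ P)              # update covariance: (I - K H) P
        estimates[n] = s
    return estimates

rng = np.random.default_rng(1)
N = 1000
x1 = rng.standard_normal(N)
X = np.column_stack([np.ones(N), x1])                    # intercept + one input
a1_true = np.where(np.arange(N) < N // 2, 1.0, -1.0)     # hypothetical abrupt change halfway
y = 0.5 + a1_true * x1 + 0.1 * rng.standard_normal(N)

est = kalman_regression(X, y, q_var=1e-3, r_var=0.01)
static = np.linalg.lstsq(X, y, rcond=None)[0]            # "LOUE": one fit over all data
print("Kalman slope before/after change:", est[N // 2 - 1, 1], est[-1, 1])
print("static slope:", static[1])
```

The single static fit blurs the two regimes into a slope near zero, while the Kalman estimate follows the change. The values of q_var and r_var are tuning assumptions that trade tracking speed against noise.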
Causality Analysis Learning
The admonition, “correlation is not causation”, rings in our ears. While it is interesting to know what is correlated to what, what we really want to know is what *caused* what. Why do we want to know causation? The expectation is that with causal knowledge we can control the outcome variable by manipulating the causal variable. This is the true basis of “Prescriptive” Analytics. The major promise of Data Science in business is this ability to “turn knobs” of causal variables and achieve expected business outcomes.
Current ML methods are largely based on correlation, in one way or another. When we minimize any sort of least-squares curve-fitting error, we explicitly or implicitly use the so-called “normal equations” for the optimal solution (under some assumptions); these involve the correlations among the input variables and the “cross-correlations” between them and the output variable.
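The normal equations make the dependence on correlations explicit: the least-squares coefficients solve R a = p, where R holds correlations among the inputs and p holds cross-correlations between inputs and output. The following is a minimal sketch on hypothetical data. Because an intercept column is included, R and p here are raw second moments rather than centered correlations.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
# Hypothetical data: an intercept column plus two random inputs
X = np.column_stack([np.ones(N), rng.standard_normal((N, 2))])
a_true = np.array([1.0, 2.0, -3.0])
y = X @ a_true + 0.1 * rng.standard_normal(N)

R = X.T @ X / N                 # correlation (second-moment) matrix among the inputs
p = X.T @ y / N                 # cross-correlations between the inputs and the output
a_hat = np.linalg.solve(R, p)   # normal equations: R a = p
print("coefficients from the normal equations:", np.round(a_hat, 3))
```

Nothing in R or p says which variable drives which, and that is why a fit like this, however accurate, is still a statement about correlation.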
Judea Pearl at UCLA has been driving “Causality Analysis” virtually single-handedly for the last two decades. His pioneering efforts in this area were recognized with the 2011 Turing Award, the “Nobel Prize” of Computer Science. His lifelong work is nicely summarized in his recent book,
“The Book of Why: The New Science of Cause and Effect” (2018).
Causality Analysis is a clear example of “model-based” learning. Given coupled measurements of a set of variables, not much beyond the correlations among them can be studied. The moment you have a physical basis for a model that relates these variables, you can use the coupled measurements to “validate” your assumed model. In certain clinical studies, Pearl’s approach provides an alternative to Randomized Controlled Trials (the “cause-effect” test of the efficacy of drugs) when your “causal model” can be constructed from a small set of variables. Pearl and his colleagues have demonstrated the use of his methods in such medical and social research studies.
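A small example shows why the causal model matters. The following is a minimal sketch, assuming a hypothetical linear causal graph z → x, z → y, x → y, where z is a confounder. A plain regression of y on x mixes the causal effect with the confounding path. Conditioning on z (Pearl's back-door adjustment, valid here because the assumed graph says z blocks the only back-door path) recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000
# Hypothetical linear causal model: z -> x, z -> y, x -> y
z = rng.standard_normal(N)                      # confounder
x = 0.8 * z + rng.standard_normal(N)            # the "knob" we might turn
y = 1.5 * x + 2.0 * z + rng.standard_normal(N)  # true causal effect of x on y is 1.5

def ols_coefs(columns, target):
    """Ordinary least squares with an intercept; returns [intercept, coef_1, ...]."""
    design = np.column_stack([np.ones(len(target))] + columns)
    return np.linalg.lstsq(design, target, rcond=None)[0]

naive = ols_coefs([x], y)[1]        # correlation-based slope, biased by z
adjusted = ols_coefs([x, z], y)[1]  # back-door adjustment: condition on z
print(f"naive slope {naive:.2f}, adjusted slope {adjusted:.2f}")
```

With this setup the naive slope comes out near 2.48 while the adjusted slope is near 1.5. Turning the x "knob" would deliver only the latter, so a prescription based on the naive slope would overpromise.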
Extending this method to models with hundreds or thousands of factors in modern business applications has been difficult. The Causality Analysis methodology developed and extended by Prof. Shohei Shimizu, and later by NEC Labs, has democratized this process. The fundamental idea is as follows: the closer the “Causal Graph” model we create is to reality, the better the optimized Causal Network that drives Prescriptive Analytics. A prior approach was to use human knowledge (à la “expert systems”) to find the graph through “Structural Equation Modeling”. The NEC Labs approach derives the Causal Graph optimally from measured data; this has the advantage that the Causal Graph is not tainted by human input biases. Various information-theoretic criteria are used to iteratively optimize the graph. Once the graph is optimized, the “knobs” embedded in it can be tweaked to predict what effects will be caused by changes in the variables.
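How can data alone orient an edge in a Causal Graph? Shimizu's LiNGAM family of methods relies on non-Gaussian noise. The following is only a toy two-variable illustration of that idea, not LiNGAM itself and not NEC's algorithm. The functions `dependence`, `residual` and `pairwise_direction` are hypothetical helpers. Regressing in the true causal direction leaves a residual that is independent of the regressor, while the wrong direction leaves a residual that is uncorrelated but still dependent. The sketch detects that dependence with a simple covariance-of-squares statistic.

```python
import numpy as np

def dependence(u, r):
    """Covariance of squares of standardized u and r; near 0 when u and r are independent."""
    u = (u - u.mean()) / u.std()
    r = (r - r.mean()) / r.std()
    return abs(np.mean(u ** 2 * r ** 2) - np.mean(u ** 2) * np.mean(r ** 2))

def residual(target, regressor):
    """Least-squares residual of target after removing its linear dependence on regressor."""
    b = np.cov(target, regressor)[0, 1] / np.var(regressor, ddof=1)
    return target - b * regressor

def pairwise_direction(x, y):
    d_xy = dependence(x, residual(y, x))   # hypothesis x -> y
    d_yx = dependence(y, residual(x, y))   # hypothesis y -> x
    return "x -> y" if d_xy < d_yx else "y -> x"

rng = np.random.default_rng(0)
N = 50_000
x = rng.uniform(-1, 1, N)                  # non-Gaussian cause
y = 0.8 * x + rng.uniform(-1, 1, N)        # non-Gaussian noise
print(pairwise_direction(x, y))
```

With Gaussian noise, both statistics are near zero and the direction cannot be identified, which is why non-Gaussianity is central to this family of methods. Real systems with many variables need the full machinery and the scoring criteria described above.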
Simulation-based Learning
Once the study of Complex Systems matured in the 1990s, it became clear that many large-scale systems (global economy, human immunity, weather, etc.) do not lend themselves to description by a set of coupled differential equations; due to closed loops and non-linearities, they give rise to chaos and limit-cycle behavior that are essentially unpredictable.
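The unpredictability of chaotic systems can be seen in the simplest non-linear feedback loop. The following is a minimal sketch using the logistic map x[n+1] = r x[n] (1 - x[n]) with r = 4, a standard chaotic example chosen for illustration (not a model from the text). Two trajectories that start 1e-10 apart become completely different within a few dozen steps, so any tiny measurement error eventually swamps the forecast.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate the logistic map x -> r * x * (1 - x) starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)        # almost identical initial condition
gaps = [abs(u - v) for u, v in zip(a, b)]
print("initial gap:", gaps[0], "largest gap after step 40:", max(gaps[40:]))
```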
With increasing computing power, it was already becoming clear that one way forward is through computer modeling. The basic simulation methods are Physical, Discrete-event and Stochastic. By 2018, there is a wealth of such fine-grained simulations (for example, weather) that have proven effective, at least for medium-term forecasts.
I consider Simulation a “model-based” learning method. Typically, simulations can pose and answer “what-if” questions for a set of measured variables connected together in a simulation model. The level of detail and the verisimilitude of the simulation model to the relevant parts of reality dictate the quality of the “what-if” results. These results drive Prescriptive Analytics in the model’s domain.
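As a small illustration of a stochastic, discrete-event "what-if", here is a minimal sketch of a single-server first-in-first-out queue, simulated with Lindley's recursion. The arrival and service rates are hypothetical. The what-if question is: how much longer do customers wait if arrivals rise from 0.5 to 0.8 per unit time? For this simple case an exact formula exists (mean wait λ/(μ(μ − λ)) for exponential arrivals and service), which lets us check the simulation. Real systems are simulated precisely because no such formula exists.

```python
import numpy as np

def mean_wait(arrival_rate, service_rate, n_customers=200_000, seed=0):
    """Simulate a single-server FIFO queue with Lindley's recursion.

    W[n+1] = max(0, W[n] + S[n] - T[n+1]), where S is a service time and T an inter-arrival time.
    """
    rng = np.random.default_rng(seed)
    service = rng.exponential(1.0 / service_rate, n_customers)
    interarrival = rng.exponential(1.0 / arrival_rate, n_customers)
    w = 0.0
    total = 0.0
    for n in range(n_customers - 1):
        w = max(0.0, w + service[n] - interarrival[n + 1])   # waiting time of the next customer
        total += w
    return total / (n_customers - 1)

base = mean_wait(arrival_rate=0.5, service_rate=1.0)     # analytic value: 1.0
busier = mean_wait(arrival_rate=0.8, service_rate=1.0)   # analytic value: 4.0
print(f"mean wait now {base:.2f}, what-if busier {busier:.2f}")
```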
Back in 2000, NASA kicked off the Numerical Propulsion System Simulation, or “NPSS”. The objective was stated as follows: “The analysis is currently focused on large-scale modeling of complete aircraft engines. This will provide the product developer with a 'virtual wind tunnel' that will reduce the number of hardware builds and tests required during the development of advanced aerospace propulsion systems.”
In subsequent years, every jet engine manufacturer (such as GE and Rolls-Royce) and many other independent parties (research institutes and universities) became part of this open source effort. Today, manufacturers use the NPSS (“gas path”) model of a jet engine for ML-based fault detection, through proprietary methods that extend NPSS to suit their own “gas path” and combine it with deep learning.
Considering a generic example, let us explore how an NPSS simulation model and ML can be combined.
In traditional Systems Theory, we use the State Space data model mentioned in the DYNAMICAL Machine Learning section earlier. An additional model called the “Luenberger Observer” is added to “observe” the states of the system while in operation. In Simulation-based Learning, the external NPSS model is combined with the physical system measurements and a Luenberger Observer. This, then, is a systematic way of incorporating simulation into machine learning.
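A Luenberger Observer runs a copy of the state-space model alongside the real system and corrects it with the measured output error: ŝ[n+1] = A ŝ[n] + B u[n] + L (y[n] − C ŝ[n]). The following is a minimal sketch for a hypothetical two-state system (position and velocity, with only position measured). The gain L is hand-picked so that both eigenvalues of A − L C sit at 0.5; that is an illustrative pole-placement choice, not a design rule from the text. The estimation error then shrinks by roughly half each step.

```python
import numpy as np

# Hypothetical two-state system: position and velocity, sampled every 0.1 s
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([0.005, 0.1])        # the input u acts as an acceleration
C = np.array([1.0, 0.0])          # only position is measured
L = np.array([1.0, 2.5])          # observer gain: eigenvalues of A - L C are both 0.5

steps = 100
u = np.sin(0.05 * np.arange(steps))   # known input applied to both system and observer
s = np.array([1.0, -0.5])             # true (unknown to the observer) initial state
s_hat = np.zeros(2)                   # observer starts with no knowledge
errors = []
for n in range(steps):
    y = C @ s                                           # measurement from the "physical" system
    s_hat = A @ s_hat + B * u[n] + L * (y - C @ s_hat)  # model copy + output-error correction
    s = A @ s + B * u[n]                                # true system evolves
    errors.append(np.linalg.norm(s - s_hat))
print("error after 1 step:", errors[0], "after 100 steps:", errors[-1])
```

In the Simulation-based Learning setup, the role of the model copy is played by a physics simulator such as NPSS, and the correction comes from the difference between engine sensors and simulator outputs.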
The Natural System in the aviation example is the jet engine. Many hundreds of devices measure the inputs and outputs of an actual jet engine, both on the ground and in flight.
NPSS software is the middle block: the gas path model, developed from basic thermodynamics and physics equations and cast into a software simulation package. Given the inputs that the jet engine sees, NPSS is supposed to produce outputs similar to those of the real jet engine. The difference, or “error”, between the jet engine’s output sensor readings and the NPSS output readings is fed back to the Observer block. The “states” of the Observer are dynamically estimated using a Kalman Filter; these are related to the NPSS model parameters and are continually updated so that the jet engine outputs and NPSS outputs track each other closely. Now this specific instance of the NPSS model is a close approximation of a specific jet engine!
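The tuning loop just described can be sketched with a toy stand-in for NPSS. The following is a minimal sketch under loud assumptions: `simulator` is a hypothetical one-parameter non-linear model (not NPSS), its parameter theta plays the role of a slowly degrading engine characteristic, and a scalar Extended Kalman Filter treats theta as a random-walk state. At each step it nudges theta using the sensor-minus-model error, with a numerical derivative so the simulator can remain a black box.

```python
import numpy as np

def simulator(u, theta):
    """Hypothetical stand-in for a physics-based model such as NPSS (not the real NPSS)."""
    return 100.0 * (1.0 - np.exp(-theta * u))

def tune_parameter(u_seq, y_meas, theta0, q, r, p0=0.01, eps=1e-6):
    """Scalar extended Kalman filter that adjusts theta so the simulator tracks the sensors."""
    theta, p = theta0, p0
    history = []
    for u, y in zip(u_seq, y_meas):
        p += q                                               # predict: theta drifts as a random walk
        y_hat = simulator(u, theta)
        h = (simulator(u, theta + eps) - y_hat) / eps        # numerical sensitivity dy/dtheta
        k = p * h / (h * p * h + r)                          # Kalman gain
        theta += k * (y - y_hat)                             # correct with sensor-minus-model error
        p = (1.0 - k * h) * p
        history.append(theta)
    return np.array(history)

rng = np.random.default_rng(0)
n = 2000
u_seq = rng.uniform(0.5, 3.0, n)                  # operating inputs seen by engine and model
theta_true = np.linspace(0.5, 0.4, n)             # slow hypothetical degradation
y_meas = simulator(u_seq, theta_true) + 0.5 * rng.standard_normal(n)
theta_est = tune_parameter(u_seq, y_meas, theta0=0.6, q=1e-6, r=0.25)
print("true vs estimated theta at the end:", theta_true[-1], theta_est[-1])
```

The drift in the estimated parameter, rather than the raw sensor values, is what would be watched for faults. A real gas path model has many parameters, so the scalar update becomes a matrix one.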
The value of Simulation-based Learning in this case is that we have software models tuned to each individual jet engine (in some cases, a model of all jet engines is required – then, this is NOT the approach). Variations in the Observer states can then be used to (1) sensitively identify faults in the jet engine, (2) with experience, predict them, and (3) answer “what-if” questions (such as “how long will this fan blade last?”) by running the tuned NPSS model long into the future.
A subset of Simulation-based Learning has become popular over the last 5 years under the “Digital Twin” name. In some cases, a Digital Twin may just be a compelling way to display measurements, but more advanced versions that incorporate some or all of the quantitative analysis methods discussed in this section make it a true digital twin.
Towards AGI . . .
I am not optimistic that Artificial General Intelligence (AGI) is achievable (see my 2016 blog, “Scary "A.I."?”, for more) . . . but how would we even approach AGI as an “engineering problem”?
The best way I have found is to borrow Daniel Dennett’s formulation of how intelligent design emerged on Earth. The reference here is to his book, “From Bacteria to Bach and Back: The Evolution of Minds”. Dennett is the preeminent philosopher-scientist of our time. The book is dense, but here is a terse summary.
Two evolutions are at play in our development of intelligent design capabilities: biological and cultural.
Genes drive the biological evolution: “Evolution by natural selection is the change in a population due to (i) variation in the characteristics of members of the population, (ii) which causes different rates of reproduction, and (iii) which is heritable.”
Memes drive the cultural evolution: Meme is “a way of behaving that can be copied, transmitted, shunned, denounced, . . . such as wearing a baseball cap backwards”; words and pictures are more common examples. Memes evolve, some catch on and spread.
These two evolutions run on wildly differing timescales – biological evolution is the foundation; cultural evolution then got layered on in some species; language developed in humans and lateralized the brain. Comprehension arose during this time, and it enables humans to execute “intelligent design” (more on how Comprehension arose in a moment . . .). There . . . the whole history of life on Earth!
One factor that is common to life from bacteria to Bach is Competence. Even an amoeba has enough competence to get energy, reproduce, move about in *its environment*. All living forms display this “Competence without Comprehension”. This is the level of today’s Machine Learning . . .
How did Comprehension come about in humans? This is intricately related to language development, memes and cultural evolution. Just like natural selection in biological evolution, memes and “bottom-up” purposeless cultural variations yielded some golden nuggets during cultural evolution and they flourished – such as the ability to describe things that are not in front of us; language allowed humans to transcend time and space and opened up the window for myth-making, religion, exploring the mind-scape, etc.
Language also enables us to turn our attention to our own thoughts and develop them deliberately in the kind of top-down creativity characteristic of science, art, technology, and institutional design – what we sometimes call “intelligence”. Such *intelligent design* is the hallmark of Comprehension.
Intelligence is a loaded word, but consider it as “Comprehension”. As Dennett argues, the rise of Comprehension requires cultural evolution by memes that mutate, spread and reproduce purposelessly; I would add that this meme activity also modified the “wet-ware”, so that Comprehension was accelerated by lateralization and specialization of brain regions. This must have been accompanied by many learning strategies beyond simple Hebbian learning . . . whether these were the result of bottom-up, purposeless variations or of intelligent design, we do not know.
Now that humans have evolved the capacity for “intelligent design”, most of the evolutionary steps – both biological and cultural - in Comprehension can be short-circuited. How can that speed up the intelligent design of AGI?
At least today in 2018, Internet memes do not have the ability to influence software directly - much less affect the underlying electronics! However, bots that communicate directly, learn and modify themselves and neuromorphic computing hardware elements such as memistors that can be modified by software can take us to the edge of the AGI slippery slope. If these bots acquire the ability, just like humans did, to perform “intelligent design”, anything is possible!
By the way, today’s “deep learning” experts may claim that Deep Neural Network does all the things that are described in the last few paragraphs! Cat pictures are the memes, they spread in the neural network and weight adaptation is the analog of wet-ware modification . . . but then, there is no Comprehension! Remember, the hallmark of Comprehension is intelligent design (sorry, “generalization” ability does not count as intelligent design!). Startups with neuromorphic chips (which mimic the human connectome) also do not go beyond Competence - that is “Machine Learning” (which is amazingly useful but it is just not “general intelligence”!).
Recipe for Intelligent Design-AGI solutions:
· Create software bots that communicate in a specific domain, learn and modify themselves.
· When the bot wants to modify the underlying hardware, it gets permission from the host. Host is the human being who can pull the physical plug! It is possible that the underlying hardware is (memistor-type) analog rather than digital.
· Through evolution *directed* by the programmer, bots develop into systems that perform top-down intelligent design of special-purpose solutions - for chess-playing, airline routing, dinner menu planning, . . .
I expect the solutions created by these “ID-AGI” bots to be superior to the so-called “AI” solutions of 2018, at least in special-purpose narrow verticals.
In this article, I have focused on three types of learning for ML and AI that are currently NOT the center of activity. Dynamical ML is but a natural extension of the data model; outside engineering, multiple regression data models are so common that the State Space model is rarely exploited as an “advanced” data model. As that changes, Dynamical ML will take off, since all the tools for optimal estimation are already in place, such as the Kalman Filter in the linear Gaussian case and Particle Filters for nonlinear, non-Gaussian applications.
Causality Analysis is ready for a major hype-cycle! Causality Analysis is a clear case of what model-based methods can do for you. There is a certain ad hoc-ness in the way Causal Graphs are generated from data – I expect to see rapid progress in this area.
Simulation had its glory days back in the 1980s, when Complex Systems and Chaos were on people’s minds. Now, with even more awesome computing power and the ability to address “what-if” scenarios in the near and far future, interest is bound to grow again. Microsoft recently released “AirSim”, an open-source simulator for autonomous vehicles, much like NPSS for jet engines. I expect rapid growth in the use of Simulation-based Learning to solve many hard ML and AI problems.
Once you are open to the concept of "model-based" learning, a whole host of models will appear that are inspired by various verticals. Model-based ML & AI will take us far in the coming decade.
About the author:
Dr. PG Madhavan is the CXO of NEC X, Inc. After obtaining his Ph.D. in Electrical and Computer Engineering from McMaster University, Canada, and a Masters in Biomedical Engineering from IIT Madras, he pursued original research in Random Field Theory and Computational Neuroscience as a professor at the University of Michigan, Ann Arbor, and the University of Waterloo, Canada, among others. His next career, in corporate technology, saw him take on product leadership roles at Microsoft, Bell Labs, Rockwell Automation and GE Aviation. PG founded and was CEO of two startups (and CTO of two others), leading all aspects of startup life.
His recent major contribution in Data Science is the creation of “Systems Analytics”, a blend of Systems Theory and Machine Learning (book with the same title published in 2016; https://www.amazon.com/dp/1535541520/) providing a pathway to formally incorporate “dynamics” into Machine Learning.