An Open Question: Is Explainable Machine Learning attainable?
Michio Suginoo, CFA (He/Him)
CFA | Machine Learning | Data Science | Paradigm Shift | Technical Research Writer | Teleological Pursuit | UBA Postgraduate Student
The black box of Machine Learning has become a subject of daily criticism. In some cases, authorities have started demanding that Machine Learning be 'explainable' through their regulatory frameworks.
What is Explainable Machine Learning (EML)?
Is 'explainability' an attainable objective for Machine Learning/Deep Learning? Can traditional models claim solid explainability? Whether dealing with traditional models or Machine Learning models, the knowledge we obtain from any model might be more tenuous than we would like to assume.
An Architectural Limitation
By design, Machine Learning models lack explicit inductive and deductive formulations when discovering hidden relationships within a given dataset. The design was intentional: it was meant to address the limitations of the traditional algorithm paradigm, which explicitly specifies rules for mapping independent variables (the features) to the dependent variable (the label). As mapping tasks become increasingly complex, it becomes progressively difficult, or even impossible, to predetermine explicit rules to program under the traditional algorithm paradigm. This limitation of the traditional paradigm set the stage for the emergence of Machine Learning, which does not require explicit inductive or deductive formulations of mapping rules. (Suginoo, 2021)
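To make the contrast concrete, here is a minimal, hypothetical sketch: under the traditional paradigm the programmer writes the mapping rule explicitly, whereas a Machine Learning model infers a rule from labelled examples. The rule, the features, and the tiny dataset below are invented purely for illustration.

```python
# Hypothetical illustration: explicit rules vs. a learned mapping.
from sklearn.tree import DecisionTreeClassifier

# Traditional algorithm paradigm: the mapping rule is written explicitly.
def rule_based_label(income, debt):
    # A hand-crafted rule chosen by the programmer (invented for illustration).
    return 1 if income > 50_000 and debt < 10_000 else 0

# Machine Learning paradigm: the mapping is inferred from labelled examples.
X = [[60_000, 5_000], [30_000, 20_000], [80_000, 2_000], [25_000, 15_000]]
y = [1, 0, 1, 0]  # labels supplied with the data; no rule is specified by us
model = DecisionTreeClassifier().fit(X, y)

print(rule_based_label(55_000, 8_000))   # rule chosen by a human
print(model.predict([[55_000, 8_000]]))  # rule discovered from the data
```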
Liberated from the task of pre-specifying mapping rules, Deep Learning applications have in some cases demonstrated transformative success in discovering complex relationships that the conventional algorithm paradigm failed to capture. AlphaFold 2 is one such success. (Heaven, 2020)
Despite this success, Deep Learning has blind spots. While the architectural design grants Deep Learning the power to discover complex relationships in a given dataset, it also makes it difficult for us to establish a valid scientific explanation, in either an inductive or a deductive manner, of what is going on during the process. In this sense, the limit on 'scientific explainability' might be one of its architectural limitations. Deep Learning's freedom from predetermined mapping rules shapes both its strength and its weakness.
By the way, what is “Valid Scientific Explanation”?
Establishing Scientific Explanation
In order for us to establish the scientific validity of any model, the model needs to satisfy the most fundamental criterion: 'reproducibility'. (Nuzzo, 2015)
In practice, in an attempt to demonstrate the 'explainability' of their models, many developers of 'Explainable' Machine Learning (EML) present 'post-hoc' observations about the passages of their processes (either statistical or deterministic) in the development domain, and then offer their interpretations of those 'post-hoc' observations. Yet interpretation is different from explanation.
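As a concrete, purely hypothetical illustration of such a post-hoc observation, the sketch below computes permutation importances for a toy model; the dataset, model choice, and parameters are all assumptions made for illustration. It summarizes how a fitted predictor behaves, but it does not explain why the underlying mapping holds.

```python
# Hypothetical sketch of a 'post-hoc' observation in the development domain:
# permutation importance on a toy dataset and model (both invented here).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Post-hoc: observe how shuffling each feature degrades held-out accuracy.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature {i}: importance {score:.3f}")
```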
Now, let's contemplate 'explainability' scientifically in the context of 'reproducibility'. To qualify as a 'scientifically explainable model', a predictor generated by a Machine Learning model in the development domain, when presented with a totally new dataset in the deployment domain, needs to yield an a-priori expected result and demonstrate an a-priori expected passage through the process (either statistically or deterministically). If there is no guarantee that the a-priori expected passage(s) can be reproduced (statistically or deterministically) on a totally untested new dataset, the predictor, once validated in the development domain, would not qualify as 'scientifically explainable'.
Deep Learning models often demonstrate unexpected behaviours in the deployment domain. According to a famous paper by a group of Google researchers, "Underspecification Presents Challenges for Credibility in Modern Machine Learning", Machine Learning's lack of an inductive formulation in its valuation mechanism causes the problem of 'underspecification' and its consequence, model instability. (Google, 2020)
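To give the flavour of underspecification, here is a minimal, hypothetical toy; the data, model family, and the notion of 'shift' are all arbitrary assumptions. Two models that differ only in random seed can score almost identically on held-out development data and still disagree noticeably on perturbed, deployment-like inputs.

```python
# Hypothetical toy: equally 'valid' models in development can diverge in deployment.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Two runs differing only in random seed (an arbitrary, 'underspecified' choice).
m1 = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=1).fit(X_tr, y_tr)
m2 = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=2).fit(X_tr, y_tr)

print("development accuracy:", m1.score(X_te, y_te), m2.score(X_te, y_te))  # typically similar

# A crude stand-in for a deployment-domain shift: perturb the held-out inputs.
X_shift = X_te + np.random.default_rng(0).normal(scale=2.0, size=X_te.shape)
disagreement = np.mean(m1.predict(X_shift) != m2.predict(X_shift))
print("disagreement under shift:", disagreement)  # often non-trivial
```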
As of today, is there any scientifically qualified, reproducible, 'Explainable' Machine Learning (EML) model? This is one of my open questions.
Language Ambiguity: Explainability or Interpretability?
Although it is not impossible for us to observe and interpret the process, the 'explainability' of Deep Learning appears tenuous, or even an unattainable dream, at least as of today. In addition, we have the issue of 'reproducibility' with Deep Learning.
What is called EML today might need to be renamed "Observable Machine Learning" or "Interpretable Machine Learning". And the question remains whether those models are reproducible enough to qualify as 'scientific' before they can qualify as 'explainable'. Overall, if the predictor in question does not demonstrate 'reproducibility', it would fail to qualify as a scientific method: its 'explainability' would become 'scientifically' irrelevant.
Here is an insightful perspective from Cynthia Rudin: "Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead". (Rudin, 2019) The title is self-explanatory. Here are some excerpts:
“Let us stop calling approximations to black box model predictions (Machine Learning predictors) explanations.”
“Calling these “summaries of predictions,” “summary statistics,” or “trends” rather than “explanations” would be less misleading.”
“Since the definition of what constitutes a viable explanation is unclear, even strong regulations such as “right to explanation” can be undermined with less-than-satisfactory explanations.”
At the very least, we need to reach a consensus on the definitions of 'explainability' and 'interpretability' to avoid unnecessary confusion.
Business as Usual: Engineering Achievements without Scientific Consensus
To avoid confusion, I would like to clarify my position: I advocate exploring the use of Machine Learning as an engineering tool, not as a scientific method.
Appreciating the benefit of engineering achievements without scientific validity, however eccentric it may sound, we are surrounded by many such cases. Take airplanes as an example. There is no consensus among scientists on the explanation of why airplanes fly. (Regis, 2020) Despite the absence of scientific consensus on how they work, our civilization embraces airplanes.
Tenuous Scientific Validity in Traditional Model Use Cases
Furthermore, if the 'explainability' of Deep Learning is questionable in the new algorithm paradigm space, our conventional models might also be more questionable than we would like to assume from the perspective of 'scientific explainability'. One example can be observed in our frequent use of the 'p-value'. The British statistician Ronald Fisher, the inventor of the 'p-value', did not intend to promote it as a scientifically reproducible metric of significance.
Fisher "intended it simply as an informal way to judge whether evidence was significant in the old-fashioned sense: worthy of a second look."
"Fisher intended it to be just one part of a fluid, non-numerical process that blended data and background knowledge to lead to scientific conclusions." (Nuzzo, 2014)
Against the inventor's intention, the 'p-value' seems to have been misused in practice, treated as a scientifically reproducible metric of significance, more frequently than we would imagine. (Baker, 2016) In 2016, the American Statistical Association publicly issued a statement of caution on the use of the 'p-value'. (American Statistical Association, 2016)
Given that the 'p-value' is embedded in the 'confidence interval', how solid would the reproducibility of 'hypothesis testing' be?
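As a hint at how fragile this can be, the toy simulation below (effect size, sample size, and number of repetitions are arbitrary assumptions) draws repeated samples from the same two populations and shows that the resulting p-values, and therefore the 'significant / not significant' verdict, scatter widely from one sample to the next.

```python
# Hypothetical toy: p-values scatter widely across repeated samples
# drawn from the very same populations, undermining their reproducibility.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p_values = []
for _ in range(20):
    # Two groups with a small true difference in means (arbitrary choice).
    a = rng.normal(loc=0.0, scale=1.0, size=30)
    b = rng.normal(loc=0.4, scale=1.0, size=30)
    p_values.append(stats.ttest_ind(a, b).pvalue)

print([round(p, 3) for p in p_values])
print("share 'significant' at 0.05:", np.mean(np.array(p_values) < 0.05))
```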
If the scientific explanation of Deep Learning, an example of the new model paradigm, is tenuous, we also have to maintain vigilance regarding the scientific validity of the traditional model paradigm.
Now, let's return to the famous mantra of George Box, a British statistician of the 20th century (Unknown, N.D.):
“All models are wrong, but some are useful”.
To some extent, our civilization might be driven by wrong but useful tools. Whether that is good or not is another question. Nevertheless, it certainly embeds some inherent risk within the construct of our civilization.
Regulatory Implication
Is ‘explainability’ an unattainable dream for Machine Learning by its design?
More critically, should Machine Learning's 'explainability' be an unattainable dream, then when a government demands 'explainability' in the use of Machine Learning, would the policy only give participants an incentive to deceptively game the term? If that is the case, authorities should refrain from requiring something unattainable.
Vigilance Required
Whether dealing with the conventional model paradigm or Machine Learning models, we need to maintain vigilance when we question model 'explainability' scientifically. The knowledge we obtain from any model might be more tenuous than we would like to assume.
Let me close my note with another reminder, often attributed to Stephen Hawking:
“(the worst fallacy of the human nature) is not our ignorance, but our delusion of knowledge”