Artificial Intelligence No 52: An introduction to causal machine learning
UPDATE
Thanks again for the responses and insightful comments on this article.
As some of you said, Pearl is not the only person formulating the maths behind causality, but for me, Pearl's framework is the most mature, especially in terms of implementation.
To summarise, there are three stages to building a causal model:
a) Causal discovery (via data, surveys, or otherwise understanding the distribution)
b) Causal model building (e.g. creation of the DAGs)
c) Causal inference (e.g. via the do-operator)
In terms of deployment, it's no different from any other model (that's the easy part).
Which libraries to use is also a rapidly evolving question.
At Oxford, we are exploring TF Probability and pgmpy.
If you are using others and would recommend them, please let me know.
The original article follows.
I have been a fan of causal machine learning, and I believe it will increasingly impact AI / ML techniques.
In this post, I explain the basics of causal machine learning. The post is based on three posts by Shawhin Talebi, which I found very useful in explaining these concepts.
Causality is not familiar to most machine learning developers. The field builds on the work of Judea Pearl (notably 'The Book of Why: The New Science of Cause and Effect').
The ideas themselves have been around for a while, but they are making a comeback with the recognition that current machine learning and deep learning techniques do not address a class of problems: cause-and-effect problems.
Causality is concerned with the question of 'Why.'
There are many ways/stories to explain why something has happened, along with related questions: what is the reason for a phenomenon, where is this phenomenon going next, and so on.
We can also think of causality in terms of the limits of current statistical thinking. These include:
1) Spurious correlation (correlation does not imply causation)
2) Simpson's paradox: i.e., the same data gives contradictory conclusions depending on how you look at it (see the sketch after this list)
3) Symmetry: correlational measures are symmetric, but causation is not; 'A causes B' does not imply that 'B causes A'
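To make Simpson's paradox concrete, here is a small Python illustration using the classic kidney-stone treatment numbers (Charig et al., 1986): within each stone size, treatment A has the higher success rate, yet aggregated over both sizes, treatment B looks better.

```python
import pandas as pd

# Classic kidney-stone treatment numbers (Charig et al., 1986),
# a standard illustration of Simpson's paradox.
df = pd.DataFrame({
    "treatment":  ["A", "A", "B", "B"],
    "stone_size": ["small", "large", "small", "large"],
    "successes":  [81, 192, 234, 55],
    "patients":   [87, 263, 270, 80],
})

# Within each stone size, treatment A has the higher success rate...
df["rate"] = df["successes"] / df["patients"]
print(df)

# ...but aggregated over stone sizes, treatment B looks better.
overall = df.groupby("treatment")[["successes", "patients"]].sum()
print(overall["successes"] / overall["patients"])
```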
Thus, causality goes beyond correlation. Causality describes the cause and effect of elements in a system. A variable X can be said to cause another variable Y if an intervention in X results in a change in Y, but an intervention in Y does not necessarily result in a change in X (assuming all confounders have been adjusted for).
Correlation, in contrast, is symmetric: if X correlates with Y, then Y correlates with X. Causation is not: if X causes Y, Y may not cause X.
Note that in statistics, a confounder (also confounding variable, confounding factor, extraneous determinant or lurking variable) is a variable that influences both the dependent variable and the independent variable, causing a spurious association. Confounding is a causal concept and, as such, cannot be described in terms of correlations or associations. The existence of confounders is an important explanation of why correlation does not imply causation.
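As a quick illustration, here is a minimal simulation (the variables and coefficients are made up) in which a common cause induces a strong correlation between two variables that have no causal link to each other:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# A common cause (temperature) drives both variables;
# neither variable causes the other.
temperature = rng.normal(20, 5, n)
ice_cream_sales = 3.0 * temperature + rng.normal(0, 5, n)
drownings = 0.5 * temperature + rng.normal(0, 2, n)

# The two effects are strongly correlated purely through their common cause.
print(np.corrcoef(ice_cream_sales, drownings)[0, 1])
```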
Causality is expressed as a Directed Acyclic Graph (DAG) and also as a Structural Equation Model (SEM).
A DAG is a special kind of graph for which all edges are directed (information flows in one direction) and no cycles exist (information that leaves a vertex cannot return to it). The vertices (circles) in a causal DAG represent variables, and edges (arrows) represent causation, where a variable is directly caused by its parents.
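As a minimal sketch (with made-up variables, using the networkx library), a causal DAG is just a directed graph whose parents are direct causes:

```python
import networkx as nx

# A minimal causal DAG: arrows point from cause to effect.
g = nx.DiGraph()
g.add_edges_from([
    ("season", "marketing"),  # season influences the marketing budget
    ("season", "sales"),      # ...and also directly influences sales
    ("marketing", "sales"),   # marketing directly causes sales
])

assert nx.is_directed_acyclic_graph(g)  # no cycles, so it is a valid DAG
print(list(g.predecessors("sales")))    # direct causes (parents) of sales
```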
SEMs represent relationships between variables and have two characteristics. First, SEM equations are asymmetric, meaning equality only works in one direction; hence, SEMs cannot be inverted. Second, the equations can be non-parametric, meaning the functional form need not be known.
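Here is a toy SEM written as a generative sketch (the functional forms are made up): each variable is assigned from its parents plus independent noise, and the assignments run one way only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# A toy structural equation model. Each variable is *assigned* from
# its parents plus independent exogenous noise; the assignments are
# one-way and cannot be algebraically inverted.
u_x = rng.normal(size=n)  # exogenous noise for X
u_y = rng.normal(size=n)  # exogenous noise for Y

x = u_x                   # X := U_X
y = 2.0 * x + u_y         # Y := f(X) + U_Y (here f happens to be linear)
```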
Image source: Shawhin Talebi
Causal inference: now that we have formulated the causal structure (model) of the problem as a DAG/SEM, we can perform causal inference, which uses the causal structure to answer causal questions, such as the effect of an intervention on an outcome.
Causal inference estimates causal effects using a technique called the do-calculus, applied via the do-operator.
In simple terms, under the do-calculus, X causes Y if an intervention in X results in a change in Y, while an intervention in Y does not necessarily result in a change in X.
Thus, the do-operator is a mathematical representation of a physical intervention. The power of the do-operator is that it allows us to simulate experiments, given that we know the details of the causal connections. For example, suppose we want to ask: will increasing the marketing budget boost sales? With a causal model, we can simulate what would happen if we were to increase marketing spend. In other words, we can evaluate the causal effect of marketing on sales.
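As a minimal sketch of this marketing example using pgmpy (one of the libraries mentioned above; the structure and the probability numbers are invented for illustration, and class names may vary across pgmpy versions):

```python
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import CausalInference

# Causal DAG: season confounds marketing and sales; marketing causes sales.
model = BayesianNetwork([("season", "marketing"),
                         ("season", "sales"),
                         ("marketing", "sales")])

model.add_cpds(
    TabularCPD("season", 2, [[0.5], [0.5]]),
    # P(marketing | season)
    TabularCPD("marketing", 2, [[0.7, 0.4], [0.3, 0.6]],
               evidence=["season"], evidence_card=[2]),
    # P(sales | marketing, season)
    TabularCPD("sales", 2,
               [[0.8, 0.5, 0.6, 0.2],
                [0.2, 0.5, 0.4, 0.8]],
               evidence=["marketing", "season"], evidence_card=[2, 2]),
)

# Simulate the intervention do(marketing=1) rather than merely
# conditioning on marketing=1: the do-operator cuts the
# season -> marketing edge before computing P(sales).
ci = CausalInference(model)
print(ci.query(variables=["sales"], do={"marketing": 1}))
```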
While causal inference is useful, how do we build a causal model?
For that, we need causal discovery.
Causal discovery aims to infer causal structure from data. In other words, given a dataset, derive a causal model that describes it.
There are four common assumptions that causal discovery algorithms make about the data (reference: 'Structural Agnostic Modeling: Adversarial Learning of Causal Graphs'):
1. Acyclicity: the causal structure can be represented by a DAG, G
2. Markov property: all nodes are independent of their non-descendants when conditioned on their parents (see the sketch after this list)
3. Faithfulness: all conditional independences in the true underlying distribution p are represented in G
4. Sufficiency: any pair of nodes in G has no common external cause
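To make assumption 2 (the Markov property) concrete, here is a small sketch using pgmpy to read the local independences off a made-up three-node chain:

```python
from pgmpy.base import DAG

# A toy causal chain: marketing -> ad_views -> sales.
g = DAG([("marketing", "ad_views"), ("ad_views", "sales")])

# Local Markov property: a node is independent of its non-descendants
# given its parents. For this chain, sales is independent of
# marketing given ad_views.
print(g.local_independencies("sales"))
```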
If all this sounds a bit abstract, it is!
However, much of it is being implemented in Python libraries like the Causal Discovery Toolbox. One problem is that this is a fairly new field, and the libraries are not yet mature. So, if you are working in this field and have some recommendations, I would welcome them.
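For a flavour of the Causal Discovery Toolbox (cdt), here is a minimal sketch that runs the PC algorithm on a benchmark dataset shipped with the library; note that cdt's PC implementation calls out to R's pcalg package, so this assumes the R dependencies are installed.

```python
import networkx as nx
from cdt.data import load_dataset
from cdt.causality.graph import PC

# Load a benchmark dataset (observations plus the known ground-truth graph).
data, true_graph = load_dataset("sachs")

# Infer a causal graph from the observational data with the PC algorithm.
learned_graph = PC().predict(data)
print(list(nx.edges(learned_graph)))
```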
Sources: the following three posts by Shawhin Talebi.
One reader commented: 'When you say "been around for a while": I was first introduced to Dr. Pearl's work around 1988, when, as the only Bayesian at Palo Alto Research Labs, I was asked to help the AI group understand a new concept: Bayesian Neural Networks. And then winter came.'
Another reader commented: 'D-separation plays a huge role in isolating causal influence from associational influence in causal graphs. Pearl's do-calculus is designed to provide sufficient conditions for isolating causal effects where possible. One example is the front-door adjustment; another is the back-door adjustment. For a more formal treatment of Pearl's atomic intervention concept, I recommend Shachter and Heckerman's paper on "causal influence diagrams", which formalises the atomic intervention through the introduction of responsiveness/limited responsiveness.'