Causality for Engineers

Here is a “pragmatic” explanation of Causality: I aim to explain the concepts accurately, at the expense of (sometimes unnecessary!) mathematical rigor. If Causality were a topic in my course for senior Engineering undergrads today, this would be my first lecture introducing the subject . . .

Where there is Causation, there is Correlation. The action of the person pushing from the very back is of course correlated with the truck’s movement. But Correlation does NOT imply Causation: the push by the guy on the truck bed has no causal effect on the truck’s forward movement.

Therefore, Causation implies Correlation.
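A toy simulation can make the truck picture concrete. This is only a sketch under assumptions of my own: the shared "go" signal, the coefficients and the noise levels are all hypothetical, chosen so that both people push at the same moments while only the push from behind enters the equation for movement.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical DGP: both people push whenever a shared "go" signal is called.
go = rng.normal(size=n)
push_back = go + 0.3 * rng.normal(size=n)  # person pushing from behind the truck
push_bed = go + 0.3 * rng.normal(size=n)   # person pushing while standing on the truck bed

# Only the push from behind appears in the equation for movement.
movement = 2.0 * push_back + 0.3 * rng.normal(size=n)

# Both pushes are strongly correlated with movement ...
corr_back = np.corrcoef(push_back, movement)[0, 1]
corr_bed = np.corrcoef(push_bed, movement)[0, 1]

# ... but a regression on both pushes gives push_bed a coefficient near zero.
X = np.column_stack([push_back, push_bed])
coef, *_ = np.linalg.lstsq(X, movement, rcond=None)

print(f"corr(push_back, movement) = {corr_back:.2f}")
print(f"corr(push_bed,  movement) = {corr_bed:.2f}")
print(f"regression coefficients   = {coef.round(2)}")
```

The regression only separates the two pushes here because both were measured and the structure of the toy DGP is simple; with real data that structure is exactly what has to be estimated.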

[Digression: A well-regarded book on Causation states that Causation does NOT imply Correlation; an extreme corner case of poor data selection is used as an example to make the author’s point – quite unnecessary and confusing . . .]

So Causation is Correlation PLUS something else! What is it?

Causation is Correlation PLUS something Else

Meinolf Sellmann has noted that “what we call "causal" relationships are actually also correlations that are just grounded in way more evidence” . . . TRUE! This “evidence” is the data generating process (DGP) that one has gleaned from observational data. The Directed Acyclic Graph (DAG), whether you provide it as an initial guess from expert knowledge or refine it from data, captures the structure of the DGP ...

Let us not dismiss an initial guess of the DGP as a sophomoric approach! In my domain of application, Industrial IoT, machinery experts have a great deal of insight into which part of a machine can cause issues in another part (or among connected equipment), gained from decades of experience listening to, smelling and touching these machines. I use that information (= candidate DAG) to prime my Causal discovery from sensor data all the time. Of course, this knowledge may not be available in some other domains such as the social sciences . . .

The holy grail is the DGP, the data generating process. When we do Causal Analysis, we are performing “inverse modeling” on observed data to estimate the structure and parameters of the DGP.

The crux of Causal Discovery and Estimation is the inverse modeling process. It turns out that estimating the parameters of DGP from observed data is a highly ill-posed problem. Simple regularization based on norms, etc., won’t do to solve this ill-posed problem – the parameters so obtained may have no basis in reality.
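One way to see the ill-posedness is with a two-variable linear model. The sketch below uses coefficients and noise variances that I picked for illustration. It builds two candidate DGPs with opposite causal directions and shows that they imply exactly the same covariance matrix. If the noise is Gaussian, the covariance is all the data can tell us, so no amount of data can choose between the two.

```python
import numpy as np

def implied_cov(B, noise_var):
    """Covariance of x for the linear model x = B x + e with independent noise e.

    B[i, j] is the effect of variable j on variable i (hypothetical convention).
    """
    A = np.linalg.inv(np.eye(B.shape[0]) - B)
    return A @ np.diag(noise_var) @ A.T

# Variables are ordered (x, y).
# Candidate 1: x causes y, i.e. y = 0.8 x + e_y
B_xy = np.array([[0.0, 0.0],
                 [0.8, 0.0]])
cov_xy = implied_cov(B_xy, [1.0, 0.36])

# Candidate 2: y causes x, i.e. x = 0.8 y + e_x
B_yx = np.array([[0.0, 0.8],
                 [0.0, 0.0]])
cov_yx = implied_cov(B_yx, [0.36, 1.0])

same = np.allclose(cov_xy, cov_yx)
print(cov_xy)
print(cov_yx)
print("identical covariance:", same)
```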

Now things get interesting . . .

Unrelated to the philosophical or even Econometric work on Causality of the distant past, another estimation method was brewing in the second half of the 20th century (at least according to my reading), finding applications in Telecommunications (starting in the 1950s with a theorem of Bussgang), Audio Signal Processing, etc. The overall field can be called Blind Source Separation or BSS. BSS is a method to solve the so-called “cocktail-party problem”. My version of the history till now is available in a short note and hence I will not elaborate here.

The key idea behind BSS is to exploit statistical INDEPENDENCE in the data – going beyond Uncorrelated to Independent makes a usable distinction only if the data is NON-Gaussian. The data model of BSS and that of Structural Causal Model (SCM) are IDENTICAL.
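For the linear case, the equivalence of the two data models can be checked numerically. In this sketch (the 3-variable matrix B and the uniform noise are hypothetical choices of mine), the SCM is written as x = B x + e and evaluated equation by equation in causal order. The BSS view writes the same observations as a mixture x = A s of independent sources, with A = (I - B)^-1 and s = e.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Hypothetical linear SCM: x = B x + e, where B[i, j] is the effect of x_j on x_i.
B = np.array([[0.0,  0.0, 0.0],
              [0.7,  0.0, 0.0],
              [0.2, -0.5, 0.0]])
e = rng.uniform(-1.0, 1.0, size=(3, n))  # independent, non-Gaussian noise terms

# SCM view: evaluate the structural equations one variable at a time, in causal order.
x_scm = np.zeros((3, n))
for i in range(3):
    x_scm[i] = B[i] @ x_scm + e[i]

# BSS view: observations are a linear mixture of independent sources.
A = np.linalg.inv(np.eye(3) - B)  # mixing matrix
x_bss = A @ e

identical = np.allclose(x_scm, x_bss)
print("same observations from both views:", identical)
```

Recovering A from x alone is the job of ICA; recovering B from A is what turns a source-separation result into a causal one.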

The heavy machinery developed for BSS (a popular one is called ICA – Independent Component Analysis) has been brought to bear on Causal Analysis since 2006 (“A Linear Non-Gaussian Acyclic Model for Causal Discovery”) with excellent results especially for multivariate timeseries observations that fit the SCM model (with Non-Gaussian and Independent noise terms and a few other assumptions such as “Markov” and “causally sufficient”).

So as Engineers, we can think of Causal Analysis as an ill-posed inverse modeling problem whereby DGP is estimated from observed data; this is solvable if the “system noise” is constrained – constraints being non-Gaussian and independent.
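The effect of these constraints shows up even with two variables. What follows is a rough sketch of the idea, not the ICA-based LiNGAM algorithm itself, and the dependence measure (correlation between squared values) is a crude stand-in that I chose for brevity. We regress in both directions and check whether the residual is independent of the regressor. With non-Gaussian noise, only the true direction leaves an independent residual; with Gaussian noise, both directions look equally good.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
half_width = np.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has unit variance

def uniform(size):
    return rng.uniform(-half_width, half_width, size=size)

def gaussian(size):
    return rng.normal(size=size)

def simulate(noise):
    """Hypothetical DGP in which x causes y: y = 0.8 x + 0.6 e."""
    x = noise(n)
    y = 0.8 * x + 0.6 * noise(n)
    return x, y

def residual_dependence(cause, effect):
    """Regress effect on cause; return (linear, nonlinear) dependence of the residual."""
    cause = cause - cause.mean()
    effect = effect - effect.mean()
    slope = (cause @ effect) / (cause @ cause)
    resid = effect - slope * cause
    linear = np.corrcoef(cause, resid)[0, 1]            # ~0 by construction
    nonlinear = np.corrcoef(cause**2, resid**2)[0, 1]   # ~0 only if independent
    return linear, nonlinear

x, y = simulate(uniform)
lin_xy, dep_xy = residual_dependence(x, y)  # true direction
lin_yx, dep_yx = residual_dependence(y, x)  # reversed direction

xg, yg = simulate(gaussian)
_, dep_xy_gauss = residual_dependence(xg, yg)
_, dep_yx_gauss = residual_dependence(yg, xg)

print(f"uniform noise : x->y {dep_xy:+.3f}, y->x {dep_yx:+.3f}")
print(f"gaussian noise: x->y {dep_xy_gauss:+.3f}, y->x {dep_yx_gauss:+.3f}")
```

In both directions the residual is uncorrelated with the regressor, which is why going from Uncorrelated to Independent is what carries the information.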

As mentioned at the top of this section, in many practical applications (especially in Industrial IoT), a domain-expert-informed initial guess of the DAG provides an additional constraint that aids the convergence of the ICA algorithm.

It is like anything else in Science: you have to bring in constraints to go from one level to another, from Correlation to Causation. In that spirit, Correlation plus the constraints identified above gives us Causation!

Why DAG?

Other than causality arguments, a Directed Acyclic Graph is an awesome constraint. The corresponding Adjacency Matrix has a highly exploitable form: with the nodes listed in causal order, it is TRIANGULAR!
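A small sketch shows where the triangular form comes from. The 4-node graph is hypothetical, and the convention that adj[i, j] marks an edge from node i to node j is my choice. Sorting the nodes into a causal (topological) order and permuting rows and columns accordingly leaves every edge above the diagonal.

```python
import numpy as np

def topological_order(adj):
    """Kahn's algorithm; adj[i, j] != 0 means an edge from node i to node j."""
    adj = np.asarray(adj)
    n = adj.shape[0]
    indegree = (adj != 0).sum(axis=0)
    ready = [i for i in range(n) if indegree[i] == 0]
    order = []
    while ready:
        i = ready.pop()
        order.append(i)
        for j in np.flatnonzero(adj[i]):
            indegree[j] -= 1
            if indegree[j] == 0:
                ready.append(int(j))
    if len(order) != n:
        raise ValueError("graph has a cycle, so it is not a DAG")
    return order

# Hypothetical DAG with edges 2 -> 0, 2 -> 3, 0 -> 1, 3 -> 1
adj = np.zeros((4, 4), dtype=int)
for parent, child in [(2, 0), (2, 3), (0, 1), (3, 1)]:
    adj[parent, child] = 1

order = topological_order(adj)
permuted = adj[np.ix_(order, order)]  # reorder rows and columns together

was_triangular = np.array_equal(adj, np.triu(adj, k=1))
is_triangular = np.array_equal(permuted, np.triu(permuted, k=1))
print("causal order:", order)
print(permuted)
print("triangular before:", was_triangular, "after:", is_triangular)
```

With the opposite convention (rows as effects, columns as causes) the same permutation gives a strictly lower-triangular matrix instead. A graph with a cycle has no such ordering, which is why the function raises an error in that case.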


PS: You may be surprised that there is no mention here of the “ladder of causation”, intervention, Do-Calculus, Judea Pearl, etc. That is another major domain of Causality! In Industrial IoT, an Intervention or A/B test is almost never possible on a functioning plant floor. However, within the framework described in this note, Counterfactual simulations using observed data alone can be performed to identify root causes when something fails, and “what-if” experiments can be performed to identify performance improvement options.

#causality #iiot #twinarc #ICA #BSS #DGP #SCM


Divya Atre

Building brand & demand through content marketing, social media marketing and campaigns

12 months ago

It's great to see your passion for causality! The connection between prediction, optimization, and generation is indeed fascinating.

Nice article! This is my passion. Can you say a bit about the connection between prediction, optimization, and maybe generation from a causality perspective? Is the problem setting the same as optimal policy search, where we try to find the best action for a given state to maximize the expected future cumulative reward? Or do we try to quantify the effect (on the future cumulative reward) of taking an action at a given state relative to taking other actions?

More articles by Dr. PG Madhavan