Reflections on The Book of Why
I recently finished reading "The Book of Why" by Judea Pearl, a Turing Award winner. The book provides a history of causal reasoning and a framework for reasoning about causal relationships in data. In this article, I will share the lessons I learned from the book (and the ones I did not!), but before I do that, I will zoom out and set the scene first.
The New Archeology
We live in a world of data, a stark contrast to the past, when the traces of our behavior might have been lost to time, only to be unearthed centuries later by archaeologists striving to understand our (past) behaviors. Today, however, our digital footprints are immediate and permanently accessible. This shift towards digital permanence in the information age not only poses challenges, for example around privacy, but also opens up unprecedented opportunities for knowledge discovery. It allows us to analyze, learn from, and leverage the traces of our lives, be it in the digital products we interact with or the knowledge we encapsulate in digital (or digitizable) records.
Navigating this world requires a profound enhancement of our data literacy, not just as individuals or professionals but across industries and as humanity at large. While the distant past relied on learning predominantly through human observation, imitation, and reasoning, the modern era introduces us to new means of transforming digital data into knowledge. The evolution of machine learning and, more recently, the rise of large language models exemplify our advancing tools, offering an increasingly sophisticated array of techniques to interpret our continuously expanding digital artifacts.
Causality
A longstanding fascination for many, including myself (albeit not as a scientist but as a software product manager), has been deciphering the complex relationship between cause and effect from digital traces. Understanding causality is considered a key ingredient of artificial (general) intelligence, as it would offer insights that predictive or generative AI models cannot provide, breaking through the stochastic ceiling of the training data.
Applied to product management, this would help answer questions like "What caused this new user behavior: the feature we released, the marketing campaign, or something else?"
Why Causality?
One may wonder why this question is even relevant if the outcome is favorable. Understanding causality in any domain turns guesswork into precise measures, improving both the efficiency and the effectiveness of the work of anybody who aims to solve a problem. In physics, for instance, understanding the causal relationship between force and motion allows engineers to design more efficient and safer structures and machines by precisely predicting how they will behave under various conditions. Similarly, in product development, comprehending the causal links between product updates, marketing activity, and user engagement can lead to more targeted and successful product improvements. This approach not only minimizes wasted resources but also ensures that efforts are directed towards the interventions most likely to yield positive outcomes.
But isn't correlation all we need? If traditional statistics alone can tell us that conditions A and B (e.g. a feature update and a marketing campaign) need to be satisfied to get the desired outcome (e.g. user activity), why bother with causal relationships at all? To use Judea Pearl's words: Isn't "curve fitting" (i.e. finding patterns for A and B in the data) all we need? According to Pearl, it is not. Curve fitting will never answer questions like "What if I do ...?" or "What if I had done ... differently?", which are very much the type of questions humans ask (intuitively or consciously) all the time. Only a thorough understanding of cause-and-effect links can tell us about the deterministic mechanics of our actual or hypothetical actions (i.e. interventions and counterfactuals), and thus let us learn from things we did not do (I'll come back to this below).
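To make the distinction concrete, here is a small Python simulation (my own illustration, not from the book, with hypothetical variable names): a hidden common cause drives both a "campaign exposure" signal and "user activity", so curve fitting reports a strong relationship even though intervening on the exposure changes nothing.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

z = rng.normal(size=n)                 # hidden common cause (e.g. seasonality)
x = 2.0 * z + rng.normal(size=n)       # "campaign exposure", driven only by z
y = 3.0 * z + rng.normal(size=n)       # "user activity", also driven only by z

# Curve fitting: regress y on x in the observational data.
slope_observational = np.polyfit(x, y, 1)[0]
print(f"observed slope of y on x: {slope_observational:.2f}")   # ~1.2, looks like an effect

# Intervention: set x ourselves, independently of z (as an experiment would).
x_do = rng.normal(size=n)
y_do = 3.0 * z + rng.normal(size=n)    # y still does not depend on x at all
slope_interventional = np.polyfit(x_do, y_do, 1)[0]
print(f"slope under do(x): {slope_interventional:.2f}")          # ~0.0, the true causal effect
```

Both data sets are perfectly consistent with the same fitted curve; only knowledge of how the data was generated separates "What do I see?" from "What if I do?".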
Is it Even Possible?
Traditionally, unveiling the connection between cause and effect necessitated experiments (i.e. to do things) designed to reveal those relationships (e.g. A/B tests). But what if all we have is historical data about associations that were not generated in an experiment conducted with the intention of revealing causal effects? Karl Pearson, who allegedly coined the mantra "correlation does not imply causation" in "The Grammar of Science," articulates a skepticism that still echoes today:
“Science in no case can demonstrate any inherent necessity in a sequence, nor prove with absolute certainty that it must be repeated.”
So, put differently, we can observe the sequence of events and calculate their correlation coefficients (read: find the variables that fit the curve), but we cannot unveil the "inherent necessity", which would be a cause-effect relationship between the events.
Can We Learn From Nature?
Many discoveries in science were made by imitating nature. So, reflecting on how children understand causality provides fascinating parallels to this discussion. Research in child cognitive development has proposed two primary theories. One theory posits that children understand causality through association, where stronger correlations lead them to infer a cause-and-effect relationship (which is somewhat like curve fitting). The other suggests that children can construct a causal model of their environment even in the absence of direct observational data or interaction, indicating a potential innate ability to perceive beyond mere observed correlations and experimentation. This insight into human cognition not only enriches our understanding but also prompts us to consider the potential (as well as the challenge) for machines to replicate such intuitive leaps in understanding causality.
The Book
The pursuit of comprehending causality solely from historical data, without direct experimentation, hints at a fundamental aspect of human cognition. This prompts an intriguing question: Could we transform the processes of hypothesis generation and testing—the creation of a causal model—into a computational problem, enabling machines to discern causality from the digital imprints we leave?
Prompted by these reflections and the emergence of ‘causal AI’, I delved into “The Book of Why”, seeking insights into these very questions. Judea Pearl, renowned for his pioneering work in artificial intelligence and causal inference, offers a framework for understanding causality that goes beyond traditional statistical methods. His book presents the “Causal Ladder,” a conceptual framework for understanding different levels of causal inference.
Admittedly, climbing this "Causal Ladder" was not without its challenges (not the easiest read).
The book is dense with insights that require time to digest and fully appreciate if you don't happen to have a statistics or data science background. However, the journey was intellectually rewarding for me, as it offered a deeper understanding of the intricate relationship between data and causality.
“[...] causal questions can never be answered from data alone”
This quote is probably the most important statement in the book. Pearl argues that causal inquiries cannot rely on data alone, cautioning against the prevalent overemphasis on Big Data in nascent fields. His method advocates for creating a "model of the world" through causal diagrams that include assumptions about the data generation process, followed by analyzing the observational data to validate and refine this model.
He remains vague about how this model is to be developed, referring only to sources such as "scientific knowledge".
This approach, mirroring the scientific method of hypothesis creation and testing, emphasizes the indispensability of human intuition and expertise alongside algorithms.
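As a rough illustration of what such a "model of the world" buys you (my own sketch with an assumed toy diagram, not an example from the book): once the diagram is written down, it implies testable conditional independencies that the observational data can confirm or refute.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Assumed diagram (a hypothesis about how the data came to be):
#   campaign -> feature_adoption -> user_activity
campaign = rng.normal(size=n)
feature_adoption = 1.5 * campaign + rng.normal(size=n)
user_activity = 2.0 * feature_adoption + rng.normal(size=n)

def partial_corr(a, b, given):
    """Correlation of a and b after removing the linear influence of `given`."""
    resid_a = a - np.polyval(np.polyfit(given, a, 1), given)
    resid_b = b - np.polyval(np.polyfit(given, b, 1), given)
    return np.corrcoef(resid_a, resid_b)[0, 1]

# The diagram implies: campaign and user_activity become independent
# once we condition on feature_adoption. We can check that in the data.
print(f"corr(campaign, activity):            {np.corrcoef(campaign, user_activity)[0, 1]:.2f}")
print(f"partial corr given feature_adoption: {partial_corr(campaign, user_activity, feature_adoption):.2f}")
# The first value is clearly nonzero, the second is ~0, consistent with the assumed diagram.
```

The diagram itself still has to come from somewhere (intuition, domain expertise, "scientific knowledge"); the data can only tell us whether it is consistent with what we assumed.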
Admittedly, I was disappointed that the book does not chart a course for deriving causal models from historical data alone and offers a rather balanced view of AI's capabilities and limitations in understanding causality. Still, it very much refined my perspective on the matter. My journey highlighted two significant "aha moments", which I share below.
Aha Moment #1: No(t always) Need to Experiment
On the second rung of the causal ladder, addressing interventional questions ("What if I do X?"), Pearl introduces the "do-calculus". Given the right conditions in the causal model (however that was discovered) and the available historical data at hand, it is possible to answer such questions without actually conducting experiments. This is indeed a powerful technique, as it offers a fresh perspective on already available historical data, enabling us to infer the effects of hypothetical interventions and thus avoid the costs of experimentation: finding cause and effect by using the "do-calculus" instead of "doing".
This approach is particularly valuable in product management for several reasons. First, where economic or time constraints preclude experimentation, establishing a causal diagram with an (ideally) minimal set of assumptions allows us to estimate effects by leveraging existing data, saving both the direct costs of experimentation and the cost of delay in waiting for experimental answers. Second, the approach requires the precise articulation (or perhaps better, "formulation") of key assumptions in a diagram, which makes them transparent and thus debatable. Third, the calculated (quantified!) effects of the hypothetical interventions can indicate whether the debate is even worth having. All of this fosters more transparent and productive decision-making about "What if we do X?".
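Here is a minimal sketch of the back-door adjustment formula, one of the identification results behind the do-calculus. The scenario and variable names are hypothetical, not taken from the book: a user segment confounds both feature exposure and activity, and adjusting for it recovers the true interventional effect that a naive comparison overstates.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

z = rng.binomial(1, 0.5, n)                          # confounder, e.g. "power user" segment
x = rng.binomial(1, np.where(z == 1, 0.8, 0.2))      # power users get the feature more often
y = rng.binomial(1, 0.2 + 0.1 * x + 0.5 * z)         # true effect of x on y is +0.10

# Naive "curve fit": compare active rates of exposed vs. unexposed users.
naive = y[x == 1].mean() - y[x == 0].mean()

# Back-door adjustment: P(y | do(x)) = sum_z P(y | x, z) * P(z)
def p_y_do_x(x_val):
    return sum(y[(x == x_val) & (z == zv)].mean() * (z == zv).mean() for zv in (0, 1))

adjusted = p_y_do_x(1) - p_y_do_x(0)
print(f"naive difference:   {naive:.2f}")     # inflated by the confounder (~0.40)
print(f"adjusted do-effect: {adjusted:.2f}")  # close to the true +0.10
```

The catch, of course, is that the adjustment is only valid under the assumed diagram; the formula comes from the model, not from the data.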
Aha Moment #2: No Need to Travel Through Time
On the third rung of the ladder, addressing counterfactual queries ("Given we did X and we saw the outcome Y, what would have happened if we had done X' instead?"), Pearl demonstrates with his counterfactual analysis how, given a causal diagram, one can explore these questions using available historical data only. This method opens avenues for integrating data science into retrospectives in product management, offering insights into alternative scenarios that might have resulted from different decisions. Leveraging purely historical, observational data from user behaviors, initially gathered for other purposes, can unveil insights into questions such as "What would the impact on metric Y have been if we hadn't released feature X?" (or vice versa). The realization here is that despite our inability to rewind time or access a parallel universe to observe the outcomes of different decisions (if you are thinking of the movie Everything Everywhere All at Once, you are not alone), counterfactual analysis can, under certain conditions of the causal model, unearth the answers within the existing data. These answers can inform future decisions. This is fascinating indeed.
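To give a flavor of how this works, here is a toy sketch of Pearl's three-step counterfactual recipe (abduction, action, prediction) on a hypothetical linear structural causal model; the model and its coefficient are assumed known purely for illustration.

```python
# Assumed structural model: activity = 2.0 * feature_released + noise
feature_released = 1.0      # what we actually did
activity_observed = 2.7     # what we actually saw

# Step 1, abduction: infer the unobserved noise consistent with the observation.
noise = activity_observed - 2.0 * feature_released      # = 0.7

# Step 2, action: change the decision in the model, i.e. do(feature_released = 0).
feature_counterfactual = 0.0

# Step 3, prediction: recompute the outcome while keeping the same noise term.
activity_counterfactual = 2.0 * feature_counterfactual + noise
print(f"Had we not released the feature, activity would have been {activity_counterfactual:.1f}")
# -> 0.7 instead of the observed 2.7 for this particular case.
```

The noise term is what carries everything else that happened in the one history we actually lived through, which is why the counterfactual answer can differ from a plain prediction for an average user.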
Both methods, the do-calculus and counterfactual analysis, show the huge potential of product telemetry beyond A/B testing. Given the right data set, we can answer questions we had not even thought to ask when the data was collected.
Summary
So, coming to an end: while "The Book of Why" has broadened my understanding of data's potential beyond mere correlation, it leaves the process of discovering the initial causal model somewhat ambiguous, which was the question that led me to the book in the first place.
This only underscores the inherent challenge of discovering the initial causal models purely from observation. This endeavor, especially in complex (as opposed to merely complicated) domains, presents a significant intellectual challenge that pushes the boundaries of our current methodologies. However, as active research in causal AI and inference continues to evolve, I remain optimistic about our capacity to uncover deeper truths from the digital imprints of our existence. Pearl's work refined my perspective on data's potential and reignited my commitment to exploring causality and AI and their potential application in the product management domain.