Understanding the difference between correlation and causation - of shark attacks and ice cream sales
Candela Iglesias Chiesa, MPH, PhD
Global Health Specialist | Social Entrepreneur | Advisor and Analyst | Science Communicator | Author and speaker
Authors: Candela Iglesias Chiesa, Joanne Veyret
The graph above shows that there are more shark attacks when more ice cream is sold, so to stop the attacks, let's stop eating ice cream.
Sounds preposterous? It is. It is also a very useful example of when correlation is not causation.
With so much #research and information being spread about #COVID19, we at @GHA thought it would be important to revisit the difference between #correlation and #causation. Mistaking a correlation for a causal relationship is a common source of confusion and of data misinterpretation.
As in the shark and ice-cream example, we humans naturally tend to interpret correlation as causation. That is, we tend to think that when two variables (for example ice-cream sales and shark attacks) change in relationship to each other (e.g. shark attacks increase when ice-cream sales increase), it is because one is causing the other (ice-cream eating is somehow causing the shark attacks).
Correlation is about how strongly two variables are related and how they vary together (e.g. when one increases, the other also increases, or when one increases, the other decreases). But correlation doesn't tell you anything about the WHY or HOW of the relationship. It only expresses that a relationship exists. It could even be due to pure chance, and in many cases it is. (If you want to see some funny spurious (i.e. chance) correlations, check out this website.)
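For readers who like to see the numbers, here is a minimal sketch of how the strength of a correlation is usually measured (the Pearson coefficient r, which runs from -1 to +1) and of how easily chance alone produces a "strong" r when there are only a few data points. The function names, the 10-point series and the 0.5 threshold are illustrative assumptions, not taken from the article.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences (-1 <= r <= 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term: do the two values sit above/below their means together?
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def share_of_strong_chance_correlations(n_points=10, trials=10_000, threshold=0.5, seed=1):
    """Fraction of pairs of *independent* random series whose |r| exceeds `threshold`.

    The two series have no relationship at all, so every "strong" r counted here
    is a spurious correlation produced purely by chance.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n_points)]
        ys = [rng.random() for _ in range(n_points)]
        if abs(pearson_r(xs, ys)) > threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(pearson_r([1, 2, 3], [2, 4, 6]))  # prints 1.0 (perfectly positive)
    print(pearson_r([1, 2, 3], [3, 2, 1]))  # prints -1.0 (perfectly negative)
    print(share_of_strong_chance_correlations())
```

The last line shows that a noticeable fraction of completely unrelated 10-point series still reach |r| > 0.5, which is why a single correlation on a small dataset says little on its own.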
Causation goes a step further in analysing the relationship: it says that a change in one value will CAUSE a change in the value of the other (for example, a higher number of bathers will result in more shark attacks). In other words, one variable directly makes the other happen.
To prove a causal relationship, we need very well-designed studies (such as randomized controlled trials, or RCTs), and we need to check the Bradford Hill criteria (for example: is it plausible that one variable causes the other, is there a biological gradient, are the results reproducible, etc.).
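Why do randomized designs help? The toy simulation below is a rough sketch, with every number invented and no claim to model real beaches or real trials. In this made-up world, bathers drive shark attacks and ice cream has no effect at all. Comparing high-sales days with low-sales days (an observational comparison) shows a large difference in attacks. Splitting days by a coin flip, as randomization does, leaves the crowd size balanced between groups, so the difference comes out close to the true effect of zero. The function name `compare_designs` and the hypothetical ice-cream "kiosk" are illustrative assumptions.

```python
import math
import random
from statistics import mean


def compare_designs(n_days=2000, seed=7):
    """Return (observational difference, randomized difference) in mean shark attacks.

    Toy world (invented numbers): bathers drive attacks; ice cream has NO effect,
    so the true causal effect of ice cream on attacks is zero.
    """
    rng = random.Random(seed)
    obs_high, obs_low = [], []      # observational: days grouped by high/low sales
    rct_open, rct_closed = [], []   # randomized: days grouped by a coin flip
    for day in range(n_days):
        # Seasonal crowd size with a one-year cycle, plus day-to-day noise
        bathers = 1000 + 800 * math.sin(2 * math.pi * day / 365) + rng.gauss(0, 100)
        attacks = 0.005 * bathers + rng.gauss(0, 1)  # no ice-cream term at all
        # Observational data: sales simply follow the crowd
        sales = 2 * bathers + rng.gauss(0, 200)
        (obs_high if sales > 2000 else obs_low).append(attacks)
        # Randomized design: a coin flip, independent of bathers, decides
        # whether a hypothetical ice-cream kiosk opens that day
        (rct_open if rng.random() < 0.5 else rct_closed).append(attacks)
    return mean(obs_high) - mean(obs_low), mean(rct_open) - mean(rct_closed)


if __name__ == "__main__":
    observational, randomized = compare_designs()
    print(f"observational difference: {observational:.2f}")  # large, driven by bathers
    print(f"randomized difference:    {randomized:.2f}")     # close to the true effect, 0
```

This only illustrates the balancing role of random assignment; a real RCT also needs proper sample-size planning, blinding where possible, and statistical testing, and it is still one piece of evidence to weigh alongside criteria such as Bradford Hill's.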
In the shark and ice-cream sales example, we are seeing a correlation, not a causal relationship (i.e. an increase in ice-cream sales is associated with, but DOES NOT CAUSE, increased shark attacks). It is possible that both increase at the same time because of a third variable (a confounder), namely the larger number of bathers on the beaches during summer weather.
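The "third variable" explanation can be checked with a small simulation. The sketch below assumes a made-up linear world in which summer crowds (bathers) push up both ice-cream sales and a continuous "attack level", with no direct link between ice cream and sharks. The raw correlation between ice cream and attacks comes out high, but the partial correlation, which removes the linear influence of bathers, drops to roughly zero. All names and numbers (`simulate_beach_days`, `partial_r`, the coefficients) are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_r(xs, ys, zs):
    """Correlation between xs and ys after removing the linear influence of zs."""
    r_xy, r_xz, r_yz = pearson_r(xs, ys), pearson_r(xs, zs), pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate_beach_days(n_days=365, seed=42):
    """Toy model (all numbers invented): the season drives bathers, and bathers
    drive BOTH ice-cream sales and shark attacks. There is deliberately no
    direct effect of ice cream on sharks."""
    rng = random.Random(seed)
    bathers, ice_cream, attacks = [], [], []
    for day in range(n_days):
        # Seasonal crowd: more bathers in the warm part of the year, plus noise
        b = 1000 + 800 * math.sin(2 * math.pi * day / n_days) + rng.gauss(0, 100)
        bathers.append(b)
        ice_cream.append(2 * b + rng.gauss(0, 200))  # sales depend only on bathers
        attacks.append(0.005 * b + rng.gauss(0, 1))  # toy "attack level", depends only on bathers
    return bathers, ice_cream, attacks


if __name__ == "__main__":
    bathers, ice_cream, attacks = simulate_beach_days()
    print(f"raw r(ice cream, attacks)       = {pearson_r(ice_cream, attacks):.2f}")
    print(f"r(ice cream, attacks | bathers) = {partial_r(ice_cream, attacks, bathers):.2f}")
```

Adjusting for a measured confounder like this is only possible when we know about it and have data on it; confounders nobody thought to measure are one reason randomized studies remain the stronger evidence.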
So next time you see an article about #COVID19 out there and some condition or drug that seems to be associated with it, pause to think about whether there is enough data to prove causality or whether it is just shark attacks and ice-cream sales.
The best education in the world, for you :: Contributor to Forbes.com on international business education
4 yr · Yes to this, Candela Iglesias Chiesa, MPH, PhD.
Correct by Construction Advocate, Weird Machine Watchman
4 yr · Plenty more Spurious Correlations here: https://tylervigen.com/spurious-correlations
Postdoctoral Researcher, Journalist, Lecturer, Spanish Translator and Interpreter
4 yr · A flaw in quantitative Economics that led people like me to forget anything we knew about that so-called Science, and Maths, as early as 1995.
Manager, Risk and Capability Assessment at Public Health Agency of Canada | Agence de la santé publique du Canada
4 yr · Pirates and climate change. A famous example in environmental engineering. Also quite confused about the concept of inferred causation. I’m having a hard time wrapping my head around it...
A commentator on Japanese politics, law and history. Retired Board Director, Executive Officer at US/Japan Multinationals, & Int'l Business Attorney. Naturalized Japanese 2015 (Born Edward Neiheisel) A member of the LDP.
4 yr · Feeling a bit cynical today, so one could create a chart showing that the more testing you do, the more actual deaths per capita you have, as the countries with the most deaths are the ones also doing the most testing.