More I Study, More India Wins Cricket Matches: Correlation or Causality?

Even though correlation and causality are two of the most basic and common topics in Statistics, people still tend to confuse the two and use the terms interchangeably. Let’s understand these two topics with an example and small datasets.

According to the Australian Bureau of Statistics, “Correlation is a statistical measure (expressed as a number) that describes the size and direction of a relationship between two or more variables.” Does that mean one variable is causing the change in the other variable? The answer is NO. It’s a mere ‘correlation’, which shows how strongly, and in which direction, two variables move together: a positive value means they rise and fall together, while a negative value means one rises as the other falls.
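To make the "size and direction" idea concrete, here is a minimal sketch of the Pearson correlation coefficient, one common way to measure correlation (the article does not say which measure it uses, so that choice is an assumption). The function name `pearson_r` and the toy numbers are purely illustrative and are not the article's datasets.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together around their means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return co_variation / math.sqrt(spread_x * spread_y)


# Hypothetical toy data, chosen only to show the sign of the result.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # moves in the same direction as hours
falling = [10, 8, 6, 4, 2]   # moves in the opposite direction

print(pearson_r(hours, rising))   # 1.0: perfect positive correlation
print(pearson_r(hours, falling))  # -1.0: perfect negative correlation
```

The result always lies between -1 and 1. Nothing in the calculation knows or cares whether one variable actually influences the other.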

Then, how do we identify whether one variable is actually causing the change in another? For that, we have another concept called causality. Causality implies that a change in one variable brings about a change in the other. The Australian Bureau of Statistics defines causation as “Causation indicates that one event is the result of the occurrence of the other event; i.e. there is a causal relationship between the two events. This is also referred to as cause and effect.”

Let’s understand the two concepts with two small datasets.

[Figure: Dataset 1 and Dataset 2 for “More I Study, More India Wins Cricket Matches: Correlation or Causality?”]

Dataset 1 compares the number of hours practised by the Indian cricket team with the number of matches won by the team, while dataset 2 compares the number of hours I studied with the number of matches won.

Both datasets show a high correlation: 0.85 and 0.78, respectively. Now, let’s relate our definitions of correlation and causation to the examples at hand. A correlation of 0.85 between hours practised and matches won indicates a strong positive linear relationship, but does that also mean that the number of hours practised is the reason for the matches won? Yes? No? Maybe?

If we think intuitively, it makes sense that the more the team practises, the more it wins. In the real world, that may or may not be true. Can we say so in our case based on correlation alone? If the answer is Yes, then the same reasoning should hold for the second dataset as well. Since the correlation in the second dataset is also high, does it imply that the number of hours I study is the reason for India’s wins?

The answer in both scenarios is NO. Even though practice may be one of the causes of the wins, we cannot conclude that from correlation alone. Other factors, such as the weather, the toss, or the pitch, may be contributing to India’s wins. To establish a causal relationship, we need to conduct experiments in which we keep all other factors constant and vary only the one variable of interest (in this case, the number of hours of practice).
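The point about other factors can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis of the article's data: the hidden factor `form`, the function `simulate`, and all the numbers are hypothetical. In this made-up world practice has no effect on wins at all (`practice_effect=0.0`), yet a hidden factor drives both. In the "experiment", practice hours are assigned at random, so they no longer depend on the hidden factor. Random assignment is one standard way to approximate keeping all other factors constant.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)


def simulate(n, practice_effect, randomized, seed):
    """Generate n hypothetical (practice, wins) scores in arbitrary units.

    form            : hidden factor that influences wins (assumption)
    practice_effect : true causal effect of practice on wins
    randomized      : if True, practice is assigned independently of form
    """
    rng = random.Random(seed)
    practice, wins = [], []
    for _ in range(n):
        form = rng.gauss(0, 1)
        if randomized:
            p = rng.gauss(0, 1)           # experiment: we set practice ourselves
        else:
            p = form + rng.gauss(0, 0.5)  # observation: practice follows form
        w = practice_effect * p + form + rng.gauss(0, 0.5)
        practice.append(p)
        wins.append(w)
    return practice, wins


# Practice has NO causal effect on wins in this simulated world.
observed = simulate(5000, practice_effect=0.0, randomized=False, seed=42)
experiment = simulate(5000, practice_effect=0.0, randomized=True, seed=42)

print("observational correlation:", round(pearson_r(*observed), 2))   # strongly positive
print("experimental correlation: ", round(pearson_r(*experiment), 2))  # close to zero
```

The observational data show a strong correlation even though the true effect is zero, while the randomized version does not. A high correlation on its own cannot distinguish between these two worlds.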

So, coming back to our initial question: are correlation and causality the same? NO. Correlation is merely an indication of how two variables (related or unrelated) move with respect to each other, while causality means that a change in one variable brings about a change in the other.










