Data Science Series- 1. ANOVA for Business Decisions
Amit Jaiswal
McKinsey & Company I Finance Digital Transformation | Fintech Product Management |
Introduction
These are tough times for us, especially in the wake of a pandemic where most of us are impacted, directly or indirectly. I have shelved all negativity and decided to shift my focus on learning and sharing some key concepts which can help professionals from various functional domains. These can surely benefit the ones looking to explore and add “Data Science” as new-age skills into their profiles.
In this article, I will pick up the concept of Analysis of Variance (ANOVA) and will try to explain it in simple business terms, without going deep into formulas and technicalities. As for any functional leaders’ or business managers’ understanding, the concept and its relevance in solving business problems is more important than breaking their head on mathematical models. The best part is that you can perform ANOVA analysis both in excel or any advanced statistical programming language like R etc.
Business Scenarios best suited for ANOVA
ANOVA is widely used across businesses and industries for a variety of purposes and is a technique that enables companies to identify problems, trends, risks, and opportunities that impact both short and long-term viability. Listed below are a few of its many applications within business scenarios:
Many a times, an investigator would like to compare more than two means in a problem scenario. For example, a business manager might be interested in hiring a sales executive for a critical region that is underperforming.
The Hiring manager would be interested in comparing the aptitude test scores from a test designed for sales executives coming from different backgrounds.
Manufacturing companies can use ANOVA when purchasing materials to compare the quality of the material to the costs so that they know which supplier to buy from.
Cosmetics companies can use ANOVA to test the safety and effectiveness of certain makeup or sunscreen products. They can evaluate these products across different groups of people and then choose to use the ingredients that provide the desired outcomes, while minimizing health risks.
ANOVA is extremely popular in entertainment and media, as they leverage it to test different locations for filming an upcoming movie and determine the best suited site based on the time of the year, the material cost required for building a set, etc..
Understanding about ANOVA
Analysis of variance, as known by acronym ANOVA, are methods that allow you to compare multiple populations. Or groups. In ANOVA, you take samples from each group to examine the effects of differences among two or more groups. The criteria that distinguish the groups are called factors and sometimes are called factors of interest.
There are two elements of ANOVA:
· Variation within each group
· Variation between groups
When comparing two groups, t-test is preferred over ANOVA. However, when we have more than two groups, the t-test is not the optimal choice because separate t-tests are required to perform comparisons in each pair, thereby making it an inefficient approach to solve the business problem.
These tests start by creating a null hypothesis (Ho), which states that there is no significant difference between the variables being measured. If the test yields statistically significant results, then the tester can reject the null hypothesis, and accept the alternative hypothesis (H1), stating that the interaction between variables is significant.
Building the Hypothesis
We set the null and alternative hypothesis as below:
· Null hypothesis (H0): The average weights of two groups are not different.
· Alternative hypothesis (H1): The average weights of the two groups are different.
The F statistic
F statistic is the ANOVA coefficient which tells us whether the results are significant.
An F value around 1 denotes little to no difference in values; meaning there is not a significant variance between the groups.
The formula to find the F statistic is taking the mean squared error of the data set and dividing it by the mean sum of squares of the data set.
I recommend using a software system, as this can all be done in seconds.
Conclusion
ANOVA is a method to determine if the mean of groups is different. In inferential statistics, we use samples to infer properties of populations. Statistical tests like ANOVA help us justify if sample results are applicable to populations.
ANOVA can also be used in feature selection process of machine learning. The features can be compared by performing an ANOVA test and similar ones can be eliminated from the feature set.
ANOVA is also related to regression. Because of ANOVA’s relationship with both hypothesis testing and regression, understanding the concepts of ANOVA can be important in understanding the methods for advance concepts related to Machine Learning algorithm.
Director - Finance at EY
3 年Very nice Amit..