A Deep Dive into ANOVA (Part 2)
Vadim Tyuryaev
Data Scientist | PhD Candidate in Statistics | Executive MBA candidate | ML & AI Expert | Digital Innovation Advocate | International Educator |
In part 1 of the ANOVA series, we covered the principles of one-way ANOVA, along with its implementation details and logic. Part 2 discusses two-way ANOVA with interaction, which extends one-way ANOVA and provides a more nuanced understanding of the sources of variability within a dataset. In this article, we will discuss the underlying model, notation, and derivations, and develop R code from scratch.
Model
The model for two factor ANOVA with interactions can be written as:
Factors A and B are assumed to be fixed factors, i.e. independent variables with specific, predetermined levels whose effects are of primary interest to the researchers. Errors are assumed to be independent and normally distributed with mean zero and constant variance.
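For reference, the model described above is usually written as follows in standard textbook notation. The symbols (a levels of A, b levels of B, n replicates per cell) and the sum-to-zero constraints are assumptions of this sketch and may differ from the notation used elsewhere in the series.

```latex
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad i = 1,\dots,a,\quad j = 1,\dots,b,\quad k = 1,\dots,n,

\varepsilon_{ijk} \overset{\text{iid}}{\sim} N(0, \sigma^2),
\qquad
\sum_{i} \alpha_i = \sum_{j} \beta_j = 0,
\qquad
\sum_{i} (\alpha\beta)_{ij} = \sum_{j} (\alpha\beta)_{ij} = 0 .
```

Here \mu is the overall mean, \alpha_i and \beta_j are the main effects of factors A and B, and (\alpha\beta)_{ij} is the interaction effect for cell (i, j).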
A schematic representation of the factorial experiment is presented below, assuming for simplicity that every combination of factor levels (cell) has the same number of repeated measures, n.
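A balanced layout of this kind can also be sketched in R. The sizes (a = 2, b = 3, n = 4) and the level labels below are hypothetical and chosen only for illustration.

```r
# Hypothetical balanced factorial layout: a levels of A, b levels of B,
# n replicates in every cell (all values are illustrative assumptions)
a <- 2
b <- 3
n <- 4
layout <- expand.grid(
  replicate = seq_len(n),
  A = paste0("A", seq_len(a)),
  B = paste0("B", seq_len(b))
)
# Cross-tabulate the two factors: every cell should contain n rows
cell_counts <- table(layout$A, layout$B)
print(cell_counts)
```

Each row of `layout` stands for one observation, so the table has a * b cells with n observations each.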
Formulas
The formulas for the sums of squares below use dot notation, in which a dot replaces a subscript that has been summed or averaged over. This gives the average response at a specific level of one factor across all levels of the other factor, the cell average, or the overall average.
Assuming an equal number of repeated measures (n) in every cell, the following formulas are used in ANOVA:
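For a balanced design, the standard textbook decomposition is given below as a reference sketch. The notation (a and b levels, n replicates per cell, bars for averages) is an assumption here; it matches what the R code later in this article computes.

```latex
\begin{aligned}
SS_T    &= \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} \left(y_{ijk} - \bar{y}_{\cdot\cdot\cdot}\right)^2, \\
SS_A    &= bn \sum_{i=1}^{a} \left(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}\right)^2, \\
SS_B    &= an \sum_{j=1}^{b} \left(\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot}\right)^2, \\
SS_{AB} &= n \sum_{i=1}^{a}\sum_{j=1}^{b} \left(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot}\right)^2, \\
SS_E    &= \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} \left(y_{ijk} - \bar{y}_{ij\cdot}\right)^2, \\
SS_T    &= SS_A + SS_B + SS_{AB} + SS_E .
\end{aligned}
```

The corresponding degrees of freedom are a - 1, b - 1, (a - 1)(b - 1), and ab(n - 1), which sum to abn - 1.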
Note that only four of the five sums of squares need to be computed directly: since SST = SSA + SSB + SSAB + SSE, the fifth follows by subtraction.
Hypotheses
A number of hypotheses can be tested. For example:
Alternative hypotheses are:
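In the usual fixed-effects notation (an assumption of this sketch, with \alpha_i, \beta_j and (\alpha\beta)_{ij} denoting the main and interaction effects), the three null hypotheses and their alternatives are typically stated as:

```latex
\begin{aligned}
H_0^{A}  &: \alpha_1 = \alpha_2 = \dots = \alpha_a = 0
  &\text{vs.}\quad H_1^{A}  &: \alpha_i \neq 0 \ \text{for at least one } i, \\
H_0^{B}  &: \beta_1 = \beta_2 = \dots = \beta_b = 0
  &\text{vs.}\quad H_1^{B}  &: \beta_j \neq 0 \ \text{for at least one } j, \\
H_0^{AB} &: (\alpha\beta)_{ij} = 0 \ \text{for all } i, j
  &\text{vs.}\quad H_1^{AB} &: (\alpha\beta)_{ij} \neq 0 \ \text{for at least one pair } (i, j).
\end{aligned}
```

Each null hypothesis is tested with an F statistic, the ratio of the corresponding mean square to the mean square error.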
Interaction
The interaction term in a two-way ANOVA reveals whether the influence of one independent variable on the dependent variable remains consistent across all levels of the other independent variable, and vice versa. When the interaction term is significant in a two-way ANOVA, it suggests that the combined effect of the two independent variables on the dependent variable is not additive. In other words, the impact of one variable on the dependent variable is influenced by the presence or level of the other variable. For instance, consider a specific medicine type that interacts with gender, leading to different effects on males compared to females. The recommended course of action in case of significant interaction includes running simple effects and post hoc analysis which will be discussed in part 3 of the ANOVA series.
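To make the idea of non-additivity concrete, here is a minimal sketch using R's built-in CO2 dataset. It compares the effect of chilling within each plant origin; the variable names are illustrative and not part of the original code.

```r
data(CO2)
# Mean uptake in each Type x Treatment cell (rows: Type, columns: Treatment)
cell_means <- tapply(CO2$uptake, list(CO2$Type, CO2$Treatment), mean)
# Simple effect of chilling within each plant origin
chilling_effect <- cell_means[, "chilled"] - cell_means[, "nonchilled"]
# Interaction contrast: difference between the two simple effects.
# Under a purely additive model this would be close to zero.
interaction_contrast <- chilling_effect["Mississippi"] - chilling_effect["Quebec"]
print(cell_means)
print(chilling_effect)
print(interaction_contrast)
```

If the chilling effect is roughly the same for both origins, the factors act additively. A contrast far from zero hints at an interaction, although only the F test on the interaction term tells us whether it is statistically significant.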
Implementation
Note that the code provided below calculates the so-called Type I sums of squares. We will use the CO2 (Carbon Dioxide Uptake in Grass Plants) dataset to test our custom function.
# Two-way ANOVA function with interaction
# (assumes a balanced design: the same number of observations in every cell)
two_way_anova <- function(data, response_col, factor1_col, factor2_col) {
  # Extract data; coerce to factors so that levels(), tapply() and table()
  # all use the same level ordering
  response <- data[[response_col]]
  factor1 <- droplevels(as.factor(data[[factor1_col]]))
  factor2 <- droplevels(as.factor(data[[factor2_col]]))
  # Levels of factors (in factor-level order, not order of appearance)
  levels_factor1 <- levels(factor1)
  levels_factor2 <- levels(factor2)
  # Calculate grand mean and marginal means
  grand_mean <- mean(response)
  means_factor1 <- tapply(response, factor1, mean)
  means_factor2 <- tapply(response, factor2, mean)
  # Cell counts, computed once and reused below
  cell_counts <- table(factor1, factor2)
  # Cell means (preallocate, then fill)
  means_interaction <- matrix(0, nrow = length(levels_factor1),
                              ncol = length(levels_factor2))
  for (i in seq_along(levels_factor1)) {
    for (j in seq_along(levels_factor2)) {
      in_cell <- factor1 == levels_factor1[i] & factor2 == levels_factor2[j]
      means_interaction[i, j] <- mean(response[in_cell])
    }
  }
  # Calculate sums of squares
  ss_total <- sum((response - grand_mean)^2)
  ss_factor1 <- sum((means_factor1 - grand_mean)^2 * table(factor1))
  ss_factor2 <- sum((means_factor2 - grand_mean)^2 * table(factor2))
  # Interaction sum of squares, cell by cell (preallocate, then fill)
  ss_interaction_mat <- matrix(0, nrow = length(levels_factor1),
                               ncol = length(levels_factor2))
  for (i in seq_along(levels_factor1)) {
    for (j in seq_along(levels_factor2)) {
      deviation <- means_interaction[i, j] - means_factor1[i] -
        means_factor2[j] + grand_mean
      ss_interaction_mat[i, j] <- cell_counts[i, j] * deviation^2
    }
  }
  ss_interaction <- sum(ss_interaction_mat)
  # Note that SST = SSA + SSB + SSAB + SSE, so SSE follows by subtraction
  ss_error <- ss_total - ss_factor1 - ss_factor2 - ss_interaction
  # Calculate degrees of freedom
  df_factor1 <- length(levels_factor1) - 1
  df_factor2 <- length(levels_factor2) - 1
  df_interaction <- df_factor1 * df_factor2
  df_error <- length(response) - (length(levels_factor1) * length(levels_factor2))
  # Calculate mean squares
  ms_factor1 <- ss_factor1 / df_factor1
  ms_factor2 <- ss_factor2 / df_factor2
  ms_interaction <- ss_interaction / df_interaction
  ms_error <- ss_error / df_error
  # Calculate F-statistics
  f_factor1 <- ms_factor1 / ms_error
  f_factor2 <- ms_factor2 / ms_error
  f_interaction <- ms_interaction / ms_error
  # Calculate p-values (upper tail of the F distribution)
  p_factor1 <- pf(f_factor1, df_factor1, df_error, lower.tail = FALSE)
  p_factor2 <- pf(f_factor2, df_factor2, df_error, lower.tail = FALSE)
  p_interaction <- pf(f_interaction, df_interaction, df_error, lower.tail = FALSE)
  # Create results data frame
  results <- data.frame(
    Factor = c(factor1_col, factor2_col, "Interaction", "Error"),
    Df = c(df_factor1, df_factor2, df_interaction, df_error),
    SumSq = c(ss_factor1, ss_factor2, ss_interaction, ss_error),
    MeanSq = c(ms_factor1, ms_factor2, ms_interaction, ms_error),
    Fvalue = c(f_factor1, f_factor2, f_interaction, NA),
    Pval = c(p_factor1, p_factor2, p_interaction, NA)
  )
  return(results)
}
Test the custom function:
data(CO2)
result <- two_way_anova(CO2, "uptake", "Type", "Treatment")
print(result)
Custom results:
Compare to the results produced by built-in R function:
print(summary(aov(uptake~Type*Treatment, data=CO2)), digits = 8)
As you can see, the results are identical.
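The agreement can also be checked programmatically rather than by eye. The sketch below is a self-contained illustration, not part of the original code: it recomputes the four sums of squares for CO2 directly from the balanced-design formulas and compares them with the table returned by the built-in function. All variable names are illustrative.

```r
data(CO2)
# Reference ANOVA table from the built-in function
# (the first element of the summary is the table itself)
aov_table <- summary(aov(uptake ~ Type * Treatment, data = CO2))[[1]]
# Hand-computed means and counts
grand_mean <- mean(CO2$uptake)
means_type <- as.numeric(tapply(CO2$uptake, CO2$Type, mean))
means_trt <- as.numeric(tapply(CO2$uptake, CO2$Treatment, mean))
cell_means <- tapply(CO2$uptake, list(CO2$Type, CO2$Treatment), mean)
cell_counts <- table(CO2$Type, CO2$Treatment)
# Sums of squares from the balanced-design formulas
ss_type <- sum(as.numeric(table(CO2$Type)) * (means_type - grand_mean)^2)
ss_trt <- sum(as.numeric(table(CO2$Treatment)) * (means_trt - grand_mean)^2)
ss_int <- sum(cell_counts *
  (cell_means - outer(means_type, means_trt, "+") + grand_mean)^2)
ss_err <- sum((CO2$uptake - grand_mean)^2) - ss_type - ss_trt - ss_int
manual_ss <- c(ss_type, ss_trt, ss_int, ss_err)
# Compare with the "Sum Sq" column, allowing for floating-point tolerance
ss_match <- isTRUE(all.equal(manual_ss, aov_table[["Sum Sq"]]))
print(ss_match)
```

Using `all.equal()` instead of `==` avoids spurious mismatches caused by floating-point rounding.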
Bonus
Try the following:
library("ggpubr")
ggline(CO2, x = "Treatment", y = "uptake", color = "Type",
add = c("mean_se", "dotplot"),
palette = c("red", "green"))
What do you observe?
Conclusion
In conclusion, two-way analysis of variance (ANOVA) extends the one-way model to two categorical factors and a continuous response variable, covering both additive and non-additive (interaction) models. A significant interaction effect indicates that the joint impact of the factors is not simply the sum of their individual effects. The choice among Type I, Type II, and Type III sums of squares depends on the experimental design and research objectives. The R code developed from scratch in this article exposes the mathematics and logic behind the two-way ANOVA model with interaction.
In part 3 of the ANOVA series, we will discuss various post hoc procedures.
Stay tuned!