Lord's paradox in R
Andrés Gutiérrez
ECLAC Regional Adviser on Social Statistics - Vicepresident of the International Association of Survey Statisticians (2023 - 2025) - Elected Member of the International Statistical Institute
In an article entitled "A Paradox in the Interpretation of Group Comparisons" published in Psychological Bulletin, Lord (1967) made famous the following controversial story:
A university conducted a study to investigate the impact of the nutritional diet provided at their campus restaurant on student weight. Two statisticians analyzed the data collected, comparing the weights of male and female students in January and June. The first statistician observed that, on average, both men and women maintained the same weight throughout the semester, with women starting at a lower average weight. He concluded that there was no significant evidence of the diet or any other factor affecting student weight, and no differential effect on genders.
However, the second statistician delved deeper into the data and discovered a subgroup of thin men and overweight women who started the semester with the same weight. He found that these men gained more weight on average, while these women lost more weight relative to the average. By controlling for the initial weight, the second statistician concluded that the university diet had a positive differential effect on men compared to women.
This suggests that for individuals with the same initial weight, men tended to gain more weight while women tended to lose weight when exposed to the campus diet.
The following chart illustrates the reasoning of both statisticians when addressing the problem. Please note that the black line represents a 45-degree line, the green points represent the data from men, and the red points represent the data from women.
The reasoning of the first statistician focuses on the expectations of both distributions, particularly at the coordinates (x = 60, y = 60) for females and (x = 70, y = 70) for males, where the black, red, and green lines appear to coincide. The second statistician's reasoning is limited to the continuum created by the overlap of the red and green dots, specifically in the space defined by x = (60, 70) and y = (60, 70).
Let's suppose we have access to this dataset, as shown in the following illustration, where the first column represents the initial weight of the students, the second column indicates the final weight, the third column describes the weight difference, and the last column indicates the gender of the student.
The findings of the first statistician are derived from a simple regression analysis. With the weight difference as the response variable, the regression coefficient for the gender variable is not statistically significant. This indicates that there are no significant differences in weight change between men and women.
The findings of the second statistician are obtained through an analysis of covariance (ANCOVA), where the response variable is the final weight and the covariates are gender and the initial weight of the individuals. This method yields a regression coefficient of 5.98 for gender, which implies a significant difference in final weight between men and women who started with the same initial weight.
According to Imbens and Rubin (2015), both statisticians are correct in describing the data, but neither provides sound reasoning for establishing causality between the university diet and the weight changes in students. Despite this limitation, I find the analysis that focuses on the comparison between men and women who started with the same weight (restricted to x = (60, 70) and y = (60, 70)) more interesting.
R workshop
Lord's paradox is a phenomenon that arises from the analysis of average weight conducted by two statisticians in a university setting. At the end of the semester (June), the average weight of male students remains the same as their initial weight in January. This pattern is also observed among female students. The only distinction is that women started the year with a lower average weight, which is attributed to their natural body structure. On average, neither men nor women experienced any significant weight gain or loss during the semester.
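During the simulation, it was assumed that there is a linear relationship between the final weight and the initial weight for both men and women. Let $W_1$ and $W_2$ denote the initial and final weight of a woman, and $M_1$ and $M_2$ those of a man. For women, the final weight is equal to the intercept plus the product of the initial weight and the regression coefficient, plus a random error:
$$W_2 = \beta_0^W + \beta_1 W_1 + \varepsilon$$
Similarly, for men, the relationship can be expressed with the same rationale and the same slope:
$$M_2 = \beta_0^M + \beta_1 M_1 + \varepsilon$$
Taking into account their natural body structures, men are expected to have a higher average weight than women. Let's assume that, on average, the weight of men is equal to the weight of women plus a constant value $b$, so that $E(M_1) = E(W_1) + b$, and that the mean weight within each group remains the same over time, so that $E(W_2) = E(W_1)$ and $E(M_2) = E(M_1)$. From this, we can deduce that:
$$E(W_1) = \beta_0^W + \beta_1 E(W_1), \qquad E(M_1) = \beta_0^M + \beta_1 \left( E(W_1) + b \right)$$
After some algebra, it is found that:
$$\beta_0^W = (1 - \beta_1) E(W_1), \qquad \beta_0^M = E(M_1) - \beta_1 \left( E(W_1) + b \right)$$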
The following code replicates a dataset that follows the relationship proposed by Lord.
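N <- 100    # students per group
b <- 10     # average weight gap between men and women
l <- 50     # lower bound of women's initial weight
u <- 70     # upper bound of women's initial weight
Woman1 <- runif(N, l, u)    # initial weights (January)
Man1 <- Woman1 + b
beta1 <- 0.4    # common slope
# Intercepts that keep each group's mean weight unchanged
Womanb0 <- (1 - beta1) * mean(Woman1)
Manb0 <- mean(Man1) - beta1 * (mean(Woman1) + b)
sds <- 1    # standard deviation of the random error
Woman2 <- Womanb0 + beta1 * Woman1 + rnorm(N, sd = sds)    # final weights (June)
Man2 <- Manb0 + beta1 * Man1 + rnorm(N, sd = sds)
# Assemble the data frame used by the plot (column order as in the illustration;
# the column name diff is an assumption)
data <- data.frame(
  start = c(Woman1, Man1),
  end = c(Woman2, Man2),
  diff = c(Woman2 - Woman1, Man2 - Man1),
  gender = rep(c("Woman", "Man"), each = N)
)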
The graph can be done as follows:
library(ggplot2)
ggplot(data = data, aes(start, end, color = factor(gender))) +
  geom_point() + stat_smooth(method = "lm") +
  geom_abline(intercept = 0, slope = 1) +
  ggtitle("Lord's Paradox") + theme_bw()
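The two analyses described earlier, the simple regression on the weight difference and the covariance analysis on the final weight, could be fitted along the following lines. This is a minimal sketch rather than the exact code behind the reported results: it re-creates the simulated data so that it runs on its own, and the seed, the data frame name `lord`, the column name `diff`, and the factor level order are all assumptions made for illustration.

```r
set.seed(123)  # arbitrary seed (assumption), only for reproducibility

# Re-create the simulated data with the same settings as above
N <- 100
b <- 10
beta1 <- 0.4
Woman1 <- runif(N, 50, 70)
Man1 <- Woman1 + b
Woman2 <- (1 - beta1) * mean(Woman1) + beta1 * Woman1 + rnorm(N, sd = 1)
Man2 <- (1 - beta1) * mean(Man1) + beta1 * Man1 + rnorm(N, sd = 1)

# Hypothetical data frame; "Woman" is set as the reference level
lord <- data.frame(
  start = c(Woman1, Man1),
  end = c(Woman2, Man2),
  gender = factor(rep(c("Woman", "Man"), each = N),
                  levels = c("Woman", "Man"))
)
lord$diff <- lord$end - lord$start

# First statistician: weight change regressed on gender only
fit_diff <- lm(diff ~ gender, data = lord)

# Second statistician: final weight regressed on gender,
# adjusting for the initial weight
fit_ancova <- lm(end ~ gender + start, data = lord)

# Coefficient tables (estimate, standard error, t value, p-value)
print(coef(summary(fit_diff)))
print(coef(summary(fit_ancova)))
```

With these settings, the gender coefficient in the first model should be close to zero, while in the second model it should be close to (1 - beta1) * b = 6, which is in line with the value of 5.98 reported above.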