Model-based hypotheses testing, part 2: the 2-sample Wald’s z-statistic for proportions with pooled variances IS the Rao test of logistic regression
Adrian Olszewski
Clinical Trials Biostatistician at 2KMM (100% R-based CRO) | Frequentist (non-Bayesian) paradigm | NOT a Data Scientist (no ML/AI/Big data) | Against anti-car/-meat/-cash and C40 restrictions
In my two previous articles, “Is logistic regression a regression? It has been a regression since its birth, and is used this way every day” and “Logistic regression can replicate multiple parametric and non-parametric tests of proportions”, I explained why the common misconception that “logistic regression is not a regression” is fundamentally flawed on statistical grounds, and showed several examples of model-based testing in which logistic regression allows one not only to replicate a variety of classic statistical tests of proportions (%), but also to extend them in many ways.
Showing the direct relationships for all the listed tests would be rather challenging and would make this article extremely long. But some of the relationships are really cool! Previously I proved the equivalence of the 2-sample Wald’s z-statistic for comparing proportions with non-pooled variances and the average marginal effect of the logistic regression with a single binary predictor distinguishing the compared samples.
Today I will prove the equivalence between the 2-sample Wald’s z-statistic for comparing proportions with pooled variances and the Rao score test of the logistic regression with a single binary predictor.
This will be a bit more difficult (46 formulas) than previously, so hold tight! The result is just beautiful, at least to me.
The Wald’s z-statistic for difference in 2 proportions with pooled variances is of the following form:
where pA stands for the estimated probability (sample proportion, %) in the 1st group, pB is the estimated probability in the 2nd group, nA and nB denote the respective group sizes, and p is the pooled probability p = (xA + xB)/(nA + nB).
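Written out from the definitions above (a reconstruction of the formula image, in case it does not render here), the statistic reads:

```latex
z = \frac{p_A - p_B}{\sqrt{p(1-p)\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)}},
\qquad p = \frac{x_A + x_B}{n_A + n_B}
```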
Traditionally the Wald’s statistic is expressed in the squared form, becoming: z^2~χ^2(df=1).
Both forms yield the same p-value. For convenience, I will show that this χ^2(df=1) statistic is 1:1 equivalent to the Rao score test over the logistic regression with a single binary predictor playing the role of an indicator of the compared samples.
To simplify calculations, I will derive the formula for pooled probability p and the overall statistic form separately.
The equation of logistic regression
Let’s start from the equation of the logistic regression with a single binary predictor:
or equivalently, after applying the inverse logit, i.e. the sigmoid function (let’s also simplify X1 to X):
where Yi are independent Bernoulli random variables with probabilities pi. The second, simpler form with η=β0+Xβ1 will facilitate later calculations.
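In symbols (reconstructing the formula images, consistently with the definitions just given), the two equivalent forms are:

```latex
\log\frac{p_i}{1-p_i} = \beta_0 + \beta_1 X_{1i}
\qquad\Longleftrightarrow\qquad
p_i = \frac{e^{\eta_i}}{1+e^{\eta_i}},
\quad \eta_i = \beta_0 + X_i\,\beta_1
```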
Let’s also encode the two levels {A, B} using a single binary predictor X such that A: X=0 and B: X=1. Let’s simplify the notation and express pA and pB in terms of the beta coefficients:
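Under this coding the two probabilities become (writing out the formula image):

```latex
p_A = P(Y=1 \mid X=0) = \frac{e^{\beta_0}}{1+e^{\beta_0}},
\qquad
p_B = P(Y=1 \mid X=1) = \frac{e^{\beta_0+\beta_1}}{1+e^{\beta_0+\beta_1}}
```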
Let’s also skip the hat notation for the estimated p and simply use pA and pB until stated otherwise.
Pooled probability
Under H0, i.e. β1=0, pA = pB, and the best estimator of this common proportion is the pooled proportion pP. Let’s find its form:
We need to assume that the data consists of two independent binomial samples:
So the likelihood function is:
where xA and xB are the observed number of successes in groups A and B, respectively, and nA and nB are the total sample sizes in each group.
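Writing the model and its likelihood out (a reconstruction of the formula images): with XA ~ Bin(nA, pA) and XB ~ Bin(nB, pB) independent,

```latex
L(p_A, p_B) = \binom{n_A}{x_A} p_A^{x_A}(1-p_A)^{n_A - x_A}
\cdot \binom{n_B}{x_B} p_B^{x_B}(1-p_B)^{n_B - x_B}
```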
Under the null hypothesis H0, pA = pB = p (pooled), of a form yet unknown.
Knowing that:
we obtain a single binomial likelihood where the total number of successes is xA+xB and the total number of trials is nA+nB, i.e.
Note: the binomial coefficients are multiplied rather than pooled, as they originally come from two independent samples; the truth of H0 is an assumption, not a property of the data. Actually, the choice doesn’t matter, as this term vanishes when taking the derivative.
Let’s simplify notation and replace p_pooled with just p. Now, the log-likelihood, log(L(p))=l(p), is defined as:
(I wrote “const” to highlight that this term will disappear after taking the derivative). Now, by taking dl(p)/dp and setting it to 0, we obtain (provided that p ∉ {0, 1}):
Or, alternatively, since pi=xi/ni
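Written out (reconstructing the formula images from the definitions above), the log-likelihood, its derivative, and the resulting pooled estimate are:

```latex
\ell(p) = (x_A + x_B)\log p + (n_A + n_B - x_A - x_B)\log(1-p) + \mathrm{const}
\\[6pt]
\frac{d\ell(p)}{dp} = \frac{x_A + x_B}{p} - \frac{n_A + n_B - x_A - x_B}{1-p} = 0
\;\Longrightarrow\;
p = \frac{x_A + x_B}{n_A + n_B} = \frac{n_A p_A + n_B p_B}{n_A + n_B}
```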
Log-likelihood of the logistic regression with a single binary predictor
The log-likelihood function is of the form:
where β is the vector of estimated coefficients, i.e. (β0, β1). Let’s express p and 1-p in terms of η=β0+Xβ1
Then
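Substituting p and 1−p expressed through η gives (a reconstruction of the formula images):

```latex
\ell(\beta) = \sum_{i=1}^{n}\bigl[y_i\log p_i + (1-y_i)\log(1-p_i)\bigr]
= \sum_{i=1}^{n}\bigl[y_i\,\eta_i - \log\bigl(1 + e^{\eta_i}\bigr)\bigr],
\qquad
p_i = \frac{e^{\eta_i}}{1+e^{\eta_i}},
\quad
1-p_i = \frac{1}{1+e^{\eta_i}}
```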
Gradient, Score, Hessian, Fisher Information, Covariance…
We will need both the gradient and the Hessian of the log-likelihood. For future use, we will call the gradient the Rao score function, denoted by “U”, and the Hessian “H”.
First, let’s find the form of U(β):
By noticing that dηi/dβ0 = 1 and dηi/dβ1 = xi, and remembering that p = exp(η)/(1+exp(η)), we obtain:
and
So finally:
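The final form of the score (reconstructed from the derivatives above) is:

```latex
U(\beta) = \begin{pmatrix}
\dfrac{\partial \ell}{\partial \beta_0} \\[6pt]
\dfrac{\partial \ell}{\partial \beta_1}
\end{pmatrix}
= \begin{pmatrix}
\sum_{i=1}^{n} (y_i - p_i) \\[4pt]
\sum_{i=1}^{n} x_i\,(y_i - p_i)
\end{pmatrix}
```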
Now, the Hessian:
The partial derivatives are as follows:
Therefore:
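Collecting the partial derivatives, the Hessian (reconstructed from the formula image; it does not depend on yi) is:

```latex
H(\beta) = -\begin{pmatrix}
\sum_i p_i(1-p_i) & \sum_i x_i\,p_i(1-p_i) \\[4pt]
\sum_i x_i\,p_i(1-p_i) & \sum_i x_i^2\,p_i(1-p_i)
\end{pmatrix}
```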
Let’s also determine the Fisher Information matrix:
where qi = 1-pi.
This can be further expanded by substituting the sums with the appropriate per-group (A and B) counts, remembering that n = nA + nB, A: X=0, B: X=1, pA = P(Y=1|X=A) and pB = P(Y=1|X=B), and:
So the final, useful form is:
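Since xi = 0 in group A and xi = 1 in group B, each sum collapses to per-group counts, giving (a reconstruction of the formula image, with qi = 1−pi):

```latex
I(\beta) = \begin{pmatrix}
n_A\,p_A q_A + n_B\,p_B q_B & n_B\,p_B q_B \\[4pt]
n_B\,p_B q_B & n_B\,p_B q_B
\end{pmatrix}
```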
This matrix will be used to find the covariance matrix:
Another way to obtain I(β):
You can also obtain this matrix in a different way. I described it in my previous article https://www.dhirubhai.net/pulse/2-sample-walds-z-statistic-proportions-w-unpooled-vs-ame-olszewski-obp4f/ Briefly:
where X is the design matrix with 2 columns (β0 of ones and β1 indicating when X=1), with nA and nB number of rows corresponding to group A and B, respectively.
Now, W is the diagonal matrix of weights, of the block-diagonal form:
which can be expressed simpler as:
where InA and InB are respective identity matrices.
The result of the matrix multiplication can be expressed as appropriate sums:
where 1 is the result of multiplying 1 × 1 (the β0 column of ones), and X refers to the other products of the β0 and β1 columns. Notice that this is exactly matrix #24.
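One possible rendering of this product in symbols (a reconstruction consistent with the element-wise sums above, with weights wi = pi(1−pi)):

```latex
X^\top W X = \begin{pmatrix}
\sum_i w_i & \sum_i w_i x_i \\[4pt]
\sum_i w_i x_i & \sum_i w_i x_i^2
\end{pmatrix},
\qquad w_i = p_i(1 - p_i)
```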
The Rao Score and Information Matrix under H0: β1=0
The Rao Score
Recall:
Again, nA and nB denote the respective numbers of rows corresponding to groups A and B, for group A: X=0, for B: X=1, and n = nA + nB. Also, under H0, pA = pB = p, where p is the pooled probability:
Here, the yi are the elements of the response vector, equal to 1 for successes in the respective group. Summing those 1s yields the total number of successes in that group, which can be expressed as:
And finally:
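Evaluated at the pooled p (which is the MLE under H0, so the first component vanishes), the score reads (a reconstruction of the formula image):

```latex
U\big|_{H_0} = \begin{pmatrix}
\sum_{i=1}^{n}(y_i - p) \\[4pt]
\sum_{i:\,x_i = 1}(y_i - p)
\end{pmatrix}
= \begin{pmatrix}
(x_A + x_B) - n\,p \\[4pt]
x_B - n_B\,p
\end{pmatrix}
= \begin{pmatrix}
0 \\[4pt]
n_B\,(p_B - p)
\end{pmatrix}
```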
Information Matrix
By recalling matrix #26 and remembering that under H0, pA=pB=p we obtain:
Let’s also calculate I^(-1):
After simplifying the denominator term:
we finally obtain:
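In symbols (reconstructing the formula images; the determinant simplifies to (pq)² nB(n − nB) = (pq)² nA nB, with q = 1−p):

```latex
I\big|_{H_0} = p\,q \begin{pmatrix} n & n_B \\ n_B & n_B \end{pmatrix},
\qquad
I^{-1}\big|_{H_0} = \frac{1}{p\,q\,n_A n_B}
\begin{pmatrix} n_B & -n_B \\ -n_B & n \end{pmatrix}
```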
Rao score test under H0: β1=0
The Rao score test (also called the Lagrange multiplier test) under H0 is defined as the following quadratic form:
Let’s do the multiplication.
First the A=I^-1*U part:
Partial results:
and
Therefore:
And finally, R=U'A
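Carrying the multiplication through (a reconstruction of the final formulas; using xB − nB p = nB(pB − p) and pB − p = nA(pB − pA)/n):

```latex
R = U^\top I^{-1} U
= \frac{n\,(x_B - n_B p)^2}{p\,q\,n_A n_B}
= \frac{n\,n_B^2\,(p_B - p)^2}{p\,q\,n_A n_B}
= \frac{n_A n_B\,(p_A - p_B)^2}{n\,p\,q}
= \frac{(p_A - p_B)^2}{p(1-p)\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)} = z^2
```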
This way, I have proven the equivalence of the 2-sample Wald’s z-statistic for comparing proportions with pooled variances and the Rao score test over the logistic regression with a single binary predictor distinguishing the compared samples.
Phew! I felt like I was back in college! Only this time I liked the challenge.
And, again, some comparisons in R:
> wald_z_test_pooled(x1 = 6, n1 = 20, x2 = 10, n2 = 20)
diff z chi2 se p.value p.value_1 LCI HCI
1 -0.2 -1.290994 1.666667 0.1549193 0.1967056 0.0983528 -0.5036363 0.1036363
> prop.test(c(6,10), c(20,20), correct = FALSE)
2-sample test for equality of proportions without continuity correction
data: c(6, 10) out of c(20, 20)
X-squared = 1.6667, df = 1, p-value = 0.1967
alternative hypothesis: two.sided
95 percent confidence interval:
-0.49724326 0.09724326
sample estimates:
prop 1 prop 2
0.3 0.5
> data <- data.frame(response = factor(c(rep("Success", 6), rep("Failure", 20-6),
+ rep("Success", 10), rep("Failure", 20-10))),
+ grp = factor(rep(c("B", "A"), each=20)))
> m <- glm(response ~ grp, data = data, family = binomial(link = "logit"))
> anova(m, test = "Rao")
Analysis of Deviance Table
Model: binomial, link: logit
Response: response
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Rao Pr(>Chi)
NULL 39 53.841
grp 1 1.6805 38 52.160 1.6667 0.1967
The implementation of the z statistic:
wald_z_test_pooled <- function(x1, n1, x2, n2, conf.level = 0.95) {
  p1 <- x1 / n1                                   # sample proportion in group 1
  p2 <- x2 / n2                                   # sample proportion in group 2
  p_pool <- (p1*n1 + p2*n2) / (n1 + n2)           # = (x1+x2)/(n1+n2), pooled proportion under H0
  se_p1 <- sqrt(p_pool * (1 - p_pool) / n1)       # pooled-variance SE, group 1
  se_p2 <- sqrt(p_pool * (1 - p_pool) / n2)       # pooled-variance SE, group 2
  se_diff <- sqrt(se_p1^2 + se_p2^2)              # SE of the difference in proportions
  z <- (p1 - p2) / se_diff
  p <- 2 * (1 - pnorm(abs(z)))                    # two-sided p-value
  hCI <- abs(qnorm((1 - conf.level)/2)) * se_diff # half-width of the confidence interval
  return(data.frame(diff = p1 - p2,
                    z = z,
                    chi2 = z^2,                   # z^2 ~ chi-square(df = 1)
                    se = se_diff,
                    p.value = p,
                    p.value_1 = p/2,              # one-sided p-value
                    LCI = (p1 - p2) - hCI,
                    HCI = (p1 - p2) + hCI,
                    row.names = NULL))
}
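For an independent numerical cross-check of the identity proven above, here is a minimal sketch in Python (rather than R, which is shown above); the function names are my own, and the Rao statistic is computed directly from the closed form R = n·U₂²/(p·q·nA·nB) derived in this article:

```python
def wald_z_pooled_sq(x_a, n_a, x_b, n_b):
    """Squared 2-sample Wald z-statistic with pooled variance."""
    p_a, p_b = x_a / n_a, x_b / n_b
    p = (x_a + x_b) / (n_a + n_b)            # pooled proportion under H0
    se2 = p * (1 - p) * (1 / n_a + 1 / n_b)  # pooled variance of the difference
    return (p_a - p_b) ** 2 / se2

def rao_score_stat(x_a, n_a, x_b, n_b):
    """Rao score statistic of logistic regression with one binary
    predictor, evaluated under H0: beta1 = 0 (closed form)."""
    p = (x_a + x_b) / (n_a + n_b)            # MLE under H0 (pooled)
    q = 1 - p
    n = n_a + n_b
    u2 = x_b - n_b * p                       # second score component; first is 0
    return n * u2 ** 2 / (p * q * n_a * n_b) # R = U' I^{-1} U

w = wald_z_pooled_sq(6, 20, 10, 20)
r = rao_score_stat(6, 20, 10, 20)
print(round(w, 4), round(r, 4))              # 1.6667 1.6667
```

Both values match the `chi2` column of `wald_z_test_pooled()` and the `Rao` column of `anova(m, test = "Rao")` shown earlier.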
If these formulas render poorly on your screen, find them on my GitHub: https://github.com/adrianolszewski/Logistic-regression-is-regression/blob/main/logistic_regression_Rao_Wald_z_test_proportions.md
In the next articles I will try to prove other relationships, so stay tuned!