Model-based hypothesis testing, part 2: the 2-sample Wald’s z-statistic for proportions with pooled variances IS the Rao test of logistic regression

In my two previous articles, “Is logistic regression a regression? It has been a regression since its birth - and is used this way every day” and “Logistic regression can replicate multiple parametric and non-parametric tests of proportions”, I explained why the common misconception that “logistic regression is not a regression” is fundamentally flawed on statistical grounds, and showed several examples of model-based testing, where logistic regression allows one not only to replicate a variety of the classic statistical tests of proportions (%), but also to extend them in many ways.

Showing the direct relationships for all the listed tests would be rather challenging and would make this article extremely long. But some of the relationships are really cool! Previously, I proved the equivalence of the 2-sample Wald’s z-statistic for comparing proportions with non-pooled variances and the average marginal effect of logistic regression with a single binary predictor distinguishing the compared samples.

Today I will prove the equivalence between the 2-sample Wald’s z-statistic for comparing proportions with pooled variances and the Rao score test of logistic regression with a single binary predictor.

This will be a bit more difficult (46 formulas) than previously, so hold tight! The result is just beautiful (at least to me).


The Wald’s z-statistic for the difference in two proportions with pooled variances is of the following form:
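$$z = \frac{p_A - p_B}{\sqrt{p(1-p)\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}$$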

where pA stands for the estimated probability (sample proportion, %) in the 1st group, pB is the estimated probability in the 2nd group, nA and nB denote the respective group sizes, and p is the pooled probability p = (xA + xB)/(nA + nB).

Traditionally, the Wald’s statistic is expressed in the squared form, becoming z^2 ~ χ^2(df=1).

Both forms yield the same p-value. For convenience, I will show that this χ^2(df=1) statistic is 1:1 equivalent to the Rao score test over the logistic regression with a single binary predictor playing the role of an indicator of the compared samples.

To simplify calculations, I will derive the formula for the pooled probability p and the overall form of the statistic separately.

The equation of logistic regression

Let’s start with the equation of logistic regression with a single binary predictor:
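$$\mathrm{logit}(p_i) = \log\frac{p_i}{1-p_i} = \beta_0 + X_{i1}\beta_1$$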

or equivalently, after applying the inverse logit, i.e. the sigmoid function (let’s also simplify X1 to X):
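$$p_i = P(Y_i = 1 \mid X_i) = \frac{e^{\eta_i}}{1+e^{\eta_i}}, \qquad \eta_i = \beta_0 + X_i\beta_1$$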

where Yi are independent Bernoulli random variables with probabilities pi. The second, simpler form with η=β0+Xβ1 will facilitate later calculations.

Let’s encode the two levels {A, B} using a single binary predictor X such that A: X=0 and B: X=1, simplify the notation, and express pA and pB in terms of the beta coefficients:
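$$p_A = P(Y=1 \mid X=0) = \frac{e^{\beta_0}}{1+e^{\beta_0}}, \qquad p_B = P(Y=1 \mid X=1) = \frac{e^{\beta_0+\beta_1}}{1+e^{\beta_0+\beta_1}}$$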

Let’s also skip the hat notation for the estimated p and simply use pA and pB until stated otherwise.

Pooled probability

Under H0, i.e. β1=0, pA = pB, and the best estimator of this common proportion is the pooled proportion pP. Let’s find its form:

We need to assume that the data consists of two independent binomial samples:
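$$x_A \sim \mathrm{Bin}(n_A, p_A), \qquad x_B \sim \mathrm{Bin}(n_B, p_B)$$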

So the likelihood function is:
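$$L(p_A, p_B) = \binom{n_A}{x_A} p_A^{x_A}(1-p_A)^{n_A-x_A} \cdot \binom{n_B}{x_B} p_B^{x_B}(1-p_B)^{n_B-x_B}$$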

where xA and xB are the observed number of successes in groups A and B, respectively, and nA and nB are the total sample sizes in each group.

Under the null hypothesis H0: pA = pB = p, the common (pooled) value p is of some form, yet unknown.

Knowing that:
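$$p^{x_A} \cdot p^{x_B} = p^{x_A+x_B}, \qquad (1-p)^{n_A-x_A} \cdot (1-p)^{n_B-x_B} = (1-p)^{(n_A+n_B)-(x_A+x_B)}$$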

we obtain a single binomial likelihood where the total number of successes is xA+xB and the total number of trials is nA+nB, i.e.
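$$L(p) = \binom{n_A}{x_A}\binom{n_B}{x_B}\, p^{x_A+x_B}\,(1-p)^{(n_A+n_B)-(x_A+x_B)}$$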

Note: the binomial coefficients are multiplied rather than pooled, as they originally come from two independent samples; the true H0 is an assumption, not a property of the data. Actually, the choice doesn’t matter, as this term will be zeroed when taking the derivative.

Let’s simplify notation and replace p_pooled with just p. Now, the log-likelihood, log(L(p))=l(p), is defined as:
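$$l(p) = (x_A+x_B)\log p + (n_A+n_B-x_A-x_B)\log(1-p) + \mathrm{const}$$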

(I wrote const to highlight that this term will disappear after taking the derivative.) Now, by taking dl(p)/dp and setting it to 0, we obtain (provided that p ∉ {0, 1}):
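$$\frac{dl(p)}{dp} = \frac{x_A+x_B}{p} - \frac{n_A+n_B-x_A-x_B}{1-p} = 0 \;\Rightarrow\; p = \frac{x_A+x_B}{n_A+n_B}$$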

Or, alternatively, since pi = xi/ni:
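$$p = \frac{n_A p_A + n_B p_B}{n_A + n_B}$$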

Log-likelihood of the logistic regression with a single binary predictor

The log-likelihood function is of the form:
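$$l(\beta) = \sum_{i=1}^{n}\Big[y_i\log p_i + (1-y_i)\log(1-p_i)\Big]$$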

where β is the vector of estimated coefficients, i.e. (β0, β1). Let’s express p and 1-p in terms of η = β0 + Xβ1:
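$$p_i = \frac{e^{\eta_i}}{1+e^{\eta_i}}, \qquad 1-p_i = \frac{1}{1+e^{\eta_i}}$$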

Then
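$$l(\beta) = \sum_{i=1}^{n}\Big[y_i\eta_i - \log\big(1+e^{\eta_i}\big)\Big]$$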

Gradient, Score, Hessian, Fisher Information, Covariance…

We will need both the gradient and the Hessian of the log-likelihood. For future use, we will call the gradient the Rao score function, denoted by “U”, and the Hessian “H”.

First, let’s find the form of U(β):
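$$U(\beta) = \nabla l(\beta) = \left(\frac{\partial l}{\partial \beta_0},\; \frac{\partial l}{\partial \beta_1}\right)^{T}$$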

By noticing that dηi/dβ0 = 1 and dηi/dβ1 = xi, and remembering that pi = exp(ηi)/(1+exp(ηi)), we obtain:
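$$\frac{\partial l}{\partial \beta_0} = \sum_{i=1}^{n}(y_i - p_i)$$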

and
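$$\frac{\partial l}{\partial \beta_1} = \sum_{i=1}^{n}x_i(y_i - p_i)$$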

So finally:
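$$U(\beta) = \begin{pmatrix} \sum_{i=1}^{n}(y_i - p_i) \\ \sum_{i=1}^{n}x_i(y_i - p_i) \end{pmatrix}$$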

Now, the Hessian:
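$$H(\beta) = \begin{pmatrix} \dfrac{\partial^2 l}{\partial \beta_0^2} & \dfrac{\partial^2 l}{\partial \beta_0\,\partial \beta_1} \\ \dfrac{\partial^2 l}{\partial \beta_1\,\partial \beta_0} & \dfrac{\partial^2 l}{\partial \beta_1^2} \end{pmatrix}$$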

The partial derivatives are as follows:
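$$\frac{\partial^2 l}{\partial \beta_0^2} = -\sum_{i=1}^{n}p_i(1-p_i), \qquad \frac{\partial^2 l}{\partial \beta_0\,\partial \beta_1} = -\sum_{i=1}^{n}x_i\,p_i(1-p_i), \qquad \frac{\partial^2 l}{\partial \beta_1^2} = -\sum_{i=1}^{n}x_i^2\,p_i(1-p_i)$$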

Therefore:
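$$H(\beta) = -\begin{pmatrix} \sum_i p_i(1-p_i) & \sum_i x_i\,p_i(1-p_i) \\ \sum_i x_i\,p_i(1-p_i) & \sum_i x_i^2\,p_i(1-p_i) \end{pmatrix}$$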

Let’s also determine the Fisher Information matrix:
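$$I(\beta) = -E\big[H(\beta)\big] = \begin{pmatrix} \sum_i p_i q_i & \sum_i x_i\,p_i q_i \\ \sum_i x_i\,p_i q_i & \sum_i x_i^2\,p_i q_i \end{pmatrix}$$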

where qi = 1-pi.

This can be further expanded by splitting the sums over the two groups (A and B), remembering that n = nA + nB; A: X=0, B: X=1; pA = P(Y=1|X=A) and pB = P(Y=1|X=B); and:
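$$\sum_{i=1}^{n} p_i q_i = n_A p_A q_A + n_B p_B q_B, \qquad \sum_{i=1}^{n} x_i\,p_i q_i = \sum_{i=1}^{n} x_i^2\,p_i q_i = n_B p_B q_B$$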

So the final, useful form is:
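$$I(\beta) = \begin{pmatrix} n_A p_A q_A + n_B p_B q_B & n_B p_B q_B \\ n_B p_B q_B & n_B p_B q_B \end{pmatrix}$$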

This matrix will be used to find the covariance matrix:
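$$\mathrm{Cov}(\hat\beta) = I^{-1}(\hat\beta)$$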


Another way to obtain I(β):

You can also obtain this matrix in a different way, which I described in my previous article https://www.dhirubhai.net/pulse/2-sample-walds-z-statistic-proportions-w-unpooled-vs-ame-olszewski-obp4f/ Briefly:
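$$I(\beta) = X^{T}WX$$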

where X is the design matrix with 2 columns (a column of ones for β0 and a column indicating X=1 for β1), with nA and nB rows corresponding to groups A and B, respectively.

Now, W is the diagonal matrix of weights, of the block-diagonal form:
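$$W = \mathrm{diag}\big(\underbrace{p_A q_A, \ldots, p_A q_A}_{n_A}, \; \underbrace{p_B q_B, \ldots, p_B q_B}_{n_B}\big)$$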

which can be expressed more simply as:
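$$W = \begin{pmatrix} p_A q_A\, I_{n_A} & 0 \\ 0 & p_B q_B\, I_{n_B} \end{pmatrix}$$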

where I_nA and I_nB are the identity matrices of the respective sizes.

The result of the matrix multiplication can be expressed as appropriate sums:
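$$X^{T}WX = \begin{pmatrix} \sum_i p_i q_i \cdot 1 & \sum_i p_i q_i\, x_i \\ \sum_i p_i q_i\, x_i & \sum_i p_i q_i\, x_i^2 \end{pmatrix} = \begin{pmatrix} n_A p_A q_A + n_B p_B q_B & n_B p_B q_B \\ n_B p_B q_B & n_B p_B q_B \end{pmatrix}$$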

where 1 is the result of multiplying the column of ones by itself (the β0 column), and the X terms refer to the other products of the β0 and β1 columns. Notice that this is exactly the matrix #24.


The Rao Score and Information Matrix under H0: β1=0

The Rao Score

Recall:
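$$U(\beta) = \begin{pmatrix} \sum_{i=1}^{n}(y_i - p_i) \\ \sum_{i=1}^{n}x_i(y_i - p_i) \end{pmatrix}$$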

Again, nA and nB denote the respective numbers of rows corresponding to groups A and B; for group A: X=0, for group B: X=1, and n = nA + nB. Also, under H0 (i.e. β1 = 0), pA = pB = p, where p is the pooled probability:
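$$p = \frac{x_A + x_B}{n_A + n_B}$$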

Here, yi are the elements of the response vector, equal to 1 for the successes in the respective group. Summing all those 1s yields the total number of successes in that group, which can be expressed as:
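$$\sum_{i \in A} y_i = x_A = n_A p_A, \qquad \sum_{i \in B} y_i = x_B = n_B p_B$$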

And finally:
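$$U(\hat\beta_0,\, 0) = \begin{pmatrix} (x_A + x_B) - np \\ x_B - n_B p \end{pmatrix} = \begin{pmatrix} 0 \\ \dfrac{n_A n_B}{n}\,(p_B - p_A) \end{pmatrix}$$

The first component vanishes because np = xA + xB, by the very definition of the pooled p.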

Information Matrix

By recalling the matrix #26 and remembering that under H0, pA = pB = p, we obtain:
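$$I = pq\begin{pmatrix} n & n_B \\ n_B & n_B \end{pmatrix}, \qquad q = 1 - p$$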

Let’s also calculate I^(-1):
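$$I^{-1} = \frac{pq}{\det I}\begin{pmatrix} n_B & -n_B \\ -n_B & n \end{pmatrix}, \qquad \det I = p^2 q^2\,(n\,n_B - n_B^2)$$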

After simplifying the denominator term:
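$$\det I = p^2 q^2\, n_B\,(n - n_B) = p^2 q^2\, n_A n_B$$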

we finally obtain:
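$$I^{-1} = \frac{1}{pq\, n_A n_B}\begin{pmatrix} n_B & -n_B \\ -n_B & n \end{pmatrix}$$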

Rao score test under H0: β1=0

The Rao score test (also called the Lagrange multiplier test) under H0 is defined as the following quadratic form:
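$$R = U^{T}(\hat\beta_0, 0)\; I^{-1}(\hat\beta_0, 0)\; U(\hat\beta_0, 0)$$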

Let’s do the multiplication.

First, the A = I^(-1)U part:
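$$A = I^{-1}U = \frac{1}{pq\, n_A n_B}\begin{pmatrix} n_B & -n_B \\ -n_B & n \end{pmatrix}\begin{pmatrix} 0 \\ x_B - n_B p \end{pmatrix}$$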

Partial results:
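$$A_1 = \frac{n_B \cdot 0 - n_B\,(x_B - n_B p)}{pq\, n_A n_B} = \frac{-(x_B - n_B p)}{pq\, n_A}$$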

and
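$$A_2 = \frac{-n_B \cdot 0 + n\,(x_B - n_B p)}{pq\, n_A n_B} = \frac{n\,(x_B - n_B p)}{pq\, n_A n_B}$$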

Therefore:
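$$A = \frac{x_B - n_B p}{pq\, n_A n_B}\begin{pmatrix} -n_B \\ n \end{pmatrix}$$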

And finally, R = U'A:
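$$R = U^{T}A = 0 \cdot A_1 + (x_B - n_B p)\,A_2 = \frac{n\,(x_B - n_B p)^2}{pq\, n_A n_B}$$

Substituting x_B - n_B p = (n_A n_B / n)(p_B - p_A), we recover the squared Wald statistic:

$$R = \frac{n_A n_B}{n} \cdot \frac{(p_B - p_A)^2}{pq} = \frac{(p_A - p_B)^2}{p(1-p)\left(\frac{1}{n_A} + \frac{1}{n_B}\right)} = z^2$$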

This way I have proven the equivalence of the 2-sample Wald’s z-statistic for comparing proportions with pooled variances and the Rao score test over the logistic regression with a single binary predictor distinguishing the compared samples.

Phew! I felt like I was back in college! Only this time I liked the challenge.

And, again, some comparisons in R:

> wald_z_test_pooled(x1 = 6, n1 = 20, x2 = 10, n2 = 20)
  diff         z     chi2        se   p.value p.value_1        LCI       HCI
1 -0.2 -1.290994 1.666667 0.1549193 0.1967056 0.0983528 -0.5036363 0.1036363

> prop.test(c(6,10), c(20,20), correct = FALSE)
	2-sample test for equality of proportions without continuity correction

data:  c(6, 10) out of c(20, 20)
X-squared = 1.6667, df = 1, p-value = 0.1967
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.49724326  0.09724326
sample estimates:
prop 1 prop 2 
   0.3    0.5 

> data <- data.frame(response = factor(c(rep("Success", 6), rep("Failure", 20-6),
+                                        rep("Success", 10), rep("Failure", 20-10))),
+                    grp    = factor(rep(c("B", "A"), each=20)))

> m <- glm(response ~ grp, data = data, family = binomial(link = "logit"))
> anova(m, test = "Rao")
Analysis of Deviance Table

Model: binomial, link: logit
Response: response
Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev    Rao Pr(>Chi)
NULL                    39     53.841                
grp   1   1.6805        38     52.160 1.6667   0.1967        


The implementation of the z statistic:

wald_z_test_pooled <- function(x1, n1, x2, n2, conf.level = 0.95) {
  # Sample proportions in both groups
  p1 <- x1 / n1
  p2 <- x2 / n2
  
  # Pooled proportion; equals (x1 + x2) / (n1 + n2)
  p_pool <- (p1 * n1 + p2 * n2) / (n1 + n2)
  
  # Standard errors of both proportions under H0 (pooled variance)
  se_p1 <- sqrt(p_pool * (1 - p_pool) / n1)
  se_p2 <- sqrt(p_pool * (1 - p_pool) / n2)
  
  # Standard error of the difference in proportions
  se_diff <- sqrt(se_p1^2 + se_p2^2)
  
  # z statistic, two-sided p-value, and the half-width of the Wald CI
  z   <- (p1 - p2) / se_diff
  p   <- 2 * (1 - pnorm(abs(z)))
  hCI <- abs(qnorm((1 - conf.level) / 2)) * se_diff
  
  return(data.frame(diff = p1 - p2,
                    z = z,
                    chi2 = z^2,
                    se = se_diff, p.value = p, p.value_1 = p / 2,
                    LCI = (p1 - p2) - hCI,
                    HCI = (p1 - p2) + hCI,
                    row.names = NULL))
}
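As a final sanity check, here is a short R snippet (my own illustration, using the notation from the derivation above, not part of the original article) that evaluates the derived closed form of the Rao statistic and the squared Wald statistic directly on the example data used earlier; both should equal 1.666667, matching the outputs above:

# Closed form of the Rao statistic: R = n*(xB - nB*p)^2 / (p*q*nA*nB),
# compared with z^2 for xA = 6, nA = 20, xB = 10, nB = 20.
xA <- 6;  nA <- 20
xB <- 10; nB <- 20
n  <- nA + nB
p  <- (xA + xB) / n   # pooled probability
q  <- 1 - p
rao <- n * (xB - nB * p)^2 / (p * q * nA * nB)
z2  <- (xA/nA - xB/nB)^2 / (p * q * (1/nA + 1/nB))
c(Rao = rao, z.squared = z2)
#      Rao z.squared 
# 1.666667  1.666667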

If these formulas render poorly on your screen, you can find them on my GitHub: https://github.com/adrianolszewski/Logistic-regression-is-regression/blob/main/logistic_regression_Rao_Wald_z_test_proportions.md

In the next articles I will try to prove other relationships, so stay tuned!
