Replication crisis
ANCOVA applied to the 2000 original values but wrongly ignoring the crucial data structure of students within halls

The story so far

In a previous blog, Halls of Fame, I explained how the original data for a dietary experiment involving 20 halls with 100 students per hall were lost. Summary figures at the start of the academic year (baseline) and at the end (outcome) were available. Using these figures of means only, veteran curmudgeon Guernsey McPearson was able to produce an analysis as follows:

[Image not available: analysis of the 20 hall means]


Then, miraculously, the original individual data per student were discovered. What happened next? I shall now reveal all.

Drowning in data

The Newtrition research team wasted no time in getting to work on the newly discovered treasure-trove of original data. A meeting with Guernsey McPearson (GMcP) was requested by a representative (Rep) of the team and went something like this.

Rep Great news, Guernsey. You know that "trend towards non-significance" you were talking about? That is now a whoppingly significant result and not the puny P=0.023 you found. In fact, when we used the original data, not only was P<0.001, it was completely off the scale.

GMcP I can imagine. How did you handle the "hall" effect?

Rep Well, we did puzzle about that. We put "hall" in as a factor but it failed to compute and gave us some strange messages, so we took it out again. Et voilà.

GMcP Ah yes. You had hall as a fixed effect. You should, of course, have had it as a random effect. Handling the initial weight as a covariate would, however, have been tricky. Basically you should just have stuck with the summary measures approach. Handled correctly, the individual student data would have given you the same answer.

Rep What? How can that be?

GMcP Give me the original data and I shall show you.

When more is just the same

GMcP was as good as his word. He returned with the following analysis* of the original 2000 values and not just the 20 means per hall.

[Image not available: analysis of the original 2000 values]

GMcP You see, as I predicted, the result is just the same.

Rep But how can this be? You only analysed the 20 summary statistics before. Now you have analysed the 2000 values and you get the same answer? Looking at the table the students don't seem to be doing anything useful when it comes to judging the diet effect.

GMcP The data are hierarchical. There are two variances, between-hall and within-hall, and two covariances, again between and within. However, you varied the diet between halls...

Rep That was a practical necessity!

GMcP I don't dispute it. I think that you were right to do so. But Nature is unsympathetic and neither a respecter of motives nor of practical difficulties. Since diet varies between halls, the between-hall variance was relevant to judging the effect of diet and the between-hall covariance also. You have to watch out for the dangers of pseudoreplication.

Rep But does that mean it was pointless measuring 100 students per hall? Wouldn't 50 or 10 or even one have done just as well?

GMcP Oh no, not at all. The more students you measured, the greater the precision with which you measured things within each hall but this increase in precision was already reflected empirically in the means you calculated. I used these means in my previous analysis and in using them to estimate the variance I automatically, and without having to model it explicitly, took account not only of the between-hall variation but of the contribution that within-hall variation made to overall uncertainty. Going to the original data, however, I had to take care to model the various effects appropriately.
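GMcP's point about the hall means can be made concrete with a standard random-effects sketch. The symbols are assumptions introduced here for illustration and do not come from the original analysis: \sigma_B^2 is the between-hall variance, \sigma_W^2 the within-hall variance, n the number of students per hall, and k the number of halls per diet, assuming two diets and ignoring the covariate for simplicity.

```latex
\operatorname{Var}\left(\bar{Y}_{h}\right) = \sigma_B^2 + \frac{\sigma_W^2}{n},
\qquad
\operatorname{Var}\left(\bar{Y}_{\text{diet 1}} - \bar{Y}_{\text{diet 2}}\right)
  = \frac{2}{k}\left(\sigma_B^2 + \frac{\sigma_W^2}{n}\right).
```

Increasing n shrinks only the second term in the bracket, so measuring more students helps but cannot remove the between-hall component. The empirical variance of the hall means estimates the whole bracket at once, which is why the summary analysis needed no explicit model for the two components.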

Rep Well this is all rather disappointing. Do you have any pearls of wisdom to impart?

GMcP Yes, two. The first is this. If you know something important about your data but the software code you are using doesn't reflect this, something is almost certainly wrong. Experiments in which diets vary between halls are obviously very different to those in which they vary within. This requires that the data are analysed differently.

Rep And the second?

GMcP That's about correlation...

Rep Yes, I know. "Correlation is not causation." You statisticians are always banging on about that.

GMcP Yes. We are always being told that that is all we statisticians have to say about causation but the issue is more subtle than that. Correlation can be relevant to judging causation and here it had two effects that could easily be overlooked. First, the correlation between halls does not have to be the same as that within but the former is relevant to judging the effect of treatment whereas the latter is what a naive analysis may pick up. Second, random variation between halls induces a correlation: students in the same hall cannot be treated as being independent and this affects the calculation of the variance.
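The second point, that students in the same hall are not independent, can be illustrated with a small simulation. This is only a sketch under stated assumptions and not the analysis of the post: the function names, the two-diet layout alternating by hall, the standard deviations and the absence of any true diet effect or baseline covariate are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate(n_halls=20, n_students=100, sd_between=3.0, sd_within=5.0, seed=1):
    """Simulate final weights with no true diet effect (hypothetical values).

    Diet is assigned per hall (alternating 0/1), so every student in a hall
    shares both the diet and a random hall effect.
    """
    rng = random.Random(seed)
    data = []  # each entry is (hall, diet, weight)
    for hall in range(n_halls):
        diet = hall % 2
        hall_effect = rng.gauss(0.0, sd_between)
        for _ in range(n_students):
            weight = 70.0 + hall_effect + rng.gauss(0.0, sd_within)
            data.append((hall, diet, weight))
    return data


def se_naive(data):
    """Standard error of the diet difference, wrongly treating students as independent."""
    groups = [[y for _, d, y in data if d == g] for g in (0, 1)]
    return sum(statistics.variance(g) / len(g) for g in groups) ** 0.5


def se_hall_means(data):
    """Standard error of the diet difference using one mean per hall."""
    by_hall = {}
    for hall, diet, y in data:
        by_hall.setdefault((hall, diet), []).append(y)
    groups = [
        [statistics.fmean(v) for (_, d), v in by_hall.items() if d == g]
        for g in (0, 1)
    ]
    return sum(statistics.variance(g) / len(g) for g in groups) ** 0.5


data = simulate()
print("naive SE:     ", round(se_naive(data), 3))
print("hall-means SE:", round(se_hall_means(data), 3))
```

Under these assumptions the naive standard error should come out much smaller than the one based on hall means, because the naive calculation counts 2000 students as if they were 2000 independent replicates of the diet when only the 20 halls are.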

Joining the dots

Clearly this story is a farrago of utter nonsense, so what is the point? It has a connection to Lord's paradox and I invite the reader to join up the dots for themselves. A previous blog of mine treats this. Some references (1-6) are given below. (Note that, apart from the excellent analysis by Holland and Rubin and that by Senn, I do not agree 100% with these analyses.)

However, the lessons are far from being theoretical. For a genuine and famous experiment where similar issues arise, see Student's discussion of the Lanarkshire Milk Experiment (7).

Pseudoreplication (8) is relevant here and also to the analysis of Lord's paradox, although this has not always been appreciated.

* All analyses were performed with Genstat.

Appendix: code for Genstat analyses

Comments are in quotes "". The rest is code with procedure names underlined. All four analyses are equivalent.

"ANCOVA of summaries"

BLOCKSTRUCTURE Hall_S "Hall, 20 values"

TREATMENTSTRUCTURE Between_S "Diet, 20 values"

COVARIATE X_mean "Mean initial weight per hall, 20 values"

ANOVA[FPROBABILITY=Yes;PRINT=aovt, info, cova,effects] Yb_mean "Mean final weight per hall, 20 values"


"Analysis of original values"

BLOCKSTRUCTURE Hall/Student "100 students in each of 20 halls"

TREATMENTSTRUCTURE Between "diet given to each student, 2000 values"

COVARIATE X "initial weight of each student, 2000 values"

ANOVA[FPROBABILITY=Yes;PRINT=aovt, info, cova,effects] Yb "final weight of each student, 2000 values"

"Regression model using summaries"

MODEL Yb_mean

TERMS X_mean+Between_S

FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes] X_mean+Between_S

"Equivalent mixed model"

"XMpH is the mean initial weight per hall but ascribed to each student and has 2000 values. XDiff is the difference between the student's initial weigh"

VCOMPONENTS [FIXED=XMpH,XDiff,Between; FACTORIAL=1] RANDOM=Hall; INITIAL=1; CONSTRAINTS=none

REML [PRINT=model,components,waldTests,effects; FMETHOD=automatic; MVINCLUDE=*; METHOD=AI; MAXCYCLE=30] Yb; SAVE=_remlsave
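The two covariates in the mixed model split each student's initial weight into a between-hall part and a within-hall part. A minimal Python sketch of that decomposition follows. It assumes that XDiff is the student's initial weight minus the mean for that student's hall (the Genstat comment describing XDiff is truncated in the source, so this is an assumption). The function name and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def split_covariate(halls, x):
    """Return (xmph, xdiff) for per-student hall labels and initial weights.

    xmph  : the hall's mean initial weight, repeated for each student in it
    xdiff : the student's initial weight minus that hall mean (assumed definition)
    """
    by_hall = defaultdict(list)
    for hall, xi in zip(halls, x):
        by_hall[hall].append(xi)
    hall_means = {hall: fmean(values) for hall, values in by_hall.items()}
    xmph = [hall_means[hall] for hall in halls]
    xdiff = [xi - m for xi, m in zip(x, xmph)]
    return xmph, xdiff


# Toy data: two halls with two students each (hypothetical values).
halls = ["A", "A", "B", "B"]
x = [60.0, 64.0, 70.0, 74.0]
xmph, xdiff = split_covariate(halls, x)
print(xmph)
print(xdiff)
```

Because the deviations sum to zero within every hall, a slope fitted to the hall means draws on between-hall information only and a slope fitted to the deviations draws on within-hall information only, which is what allows the two covariances mentioned in the dialogue to differ.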

I am grateful to members of the Genstat discussion list for help with formulating the mixed model. A relevant paper is by Mike Kenward and James Roger (9).

References

Lord's paradox

1. Holland PW, Rubin DB. On Lord's Paradox. In: Wainer H, Messick S, eds. Principals of Modern Psychological Measurement. Lawrence Erlbaum Associates; 1983:3-25.

2. Lord FM. A paradox in the interpretation of group comparisons. Psychological Bulletin. 1967;66:304-305.

3. Pearl J, Mackenzie D. The Book of Why. Basic Books; 2018.

4. Senn SJ. Change from baseline and analysis of covariance revisited. Statistics in Medicine. 2006;25(24):4334-4344.

5. Van Breukelen GJ. ANCOVA versus change from baseline had more power in randomized studies and more bias in nonrandomized studies. Journal of Clinical Epidemiology. 2006;59(9):920-925.

6. Wainer H, Brown LM. Two statistical paradoxes in the interpretation of group differences: Illustrated with medical school admission and licensing data. American Statistician. 2004;58(2):117-123.

Lanarkshire milk experiment

7. Student. The Lanarkshire milk experiment. Biometrika. 1931:398-406.

Pseudoreplication

8. Hurlbert SH. Pseudoreplication and the design of ecological field experiments. Ecological Monographs. 1984;54(2):187-211.

Mixed models

9. Kenward MG, Roger JH. The use of baseline covariates in crossover studies. Biostatistics. Jan 2010;11(1):1-17. doi:10.1093/biostatistics/kxp046

Stephen Senn

Statistical Consultant

2 years ago

I have now added a link to the data in case anybody else wishes to analyse them.
