
Cause for concern

Stephen Senn

Statistical Consultant

Published: 3 October 2021

A diet of worms

I have previously written on Lord's paradox and how the Rothamsted approach to analysing designed experiments can help to understand it. See, for example, Rothamsted Statistics meets Lord's Paradox. I am going to revisit this, partly to explain why I think that much discussion of the paradox is wrong and partly to provide a simulated example, so that those who disagree can illustrate their proposed solution numerically: calculation is conducive to clarity. Before I do so, I will deal with one red herring. Lord's paradox concerns observational data and not an experiment. Therefore, it has been argued, considering experiments is not relevant when discussing it. However, whatever difficulties apply to an experimental study will apply a fortiori to an observational study. Thus, if something won't work for an experimental study, it won't work for an analogous observational study. The converse does not necessarily hold. See Red Herrings and the Art of Cause Fishing. Thus, if the object is to point out a problem rather than a solution, it is perfectly acceptable to consider an experimental analogue.

Lord's Paradox

Lord's paradox involves two statisticians considering the effect of diet on weight gain of students at a college. Lord's original paradox has two rather confusing features. First, a control diet is not mentioned and, secondly, the effect of diet as modified by sex of the students is discussed. The missing control makes it difficult to discuss causality, whereas the addition of sex as an effect-modifier makes it more complicated than is necessary for the paradox to appear. Therefore, I am going to follow the approach in The Book of Why (1) and consider the 'cleaner' version of the paradox as modified by Wainer and Brown (2).

The Wainer and Brown set up is described in The Book of Why as follows

...the quantity of interest is the effect of diet (not gender) on weight gain...In their version students eat in one of two dining halls with different diets. (P 216)

Let us label the two diets A and B. I am going to make one further modification. The outcome variable is defined as weight gain. However, 1) weight gain is a derived variable (it is calculated from measured initial and final weights as the latter minus the former), 2) changing diets cannot affect initial weight, 3) thus, the causal element in weight gain is provided by final weight and therefore 4) the important thing is to study the effect of diet on final weight. This is what I propose to do.

Two tribes go to war

Two statisticians now propose to analyse the data. One, whom I shall call John, proposes to calculate the change score (final weight - initial weight), that is to say the weight gain, and then compare the scores using the two-sample t-test. The other, whom I shall call Jane, proposes to carry out an analysis of covariance in which the initial weight is fitted as a covariate in a model that also includes diet. To avoid another red herring being raised, I should point out that if initial weight is fitted as a covariate, it makes no difference whether the final weight or the change score is used as the outcome variable. A very nice paper by Nan Laird makes this clear (3).
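The point about the choice of outcome can be checked numerically. The following is a minimal sketch, assuming made-up weights and an ordinary least-squares fit via NumPy; the function name `diet_effect` and all numerical values are illustrative and are not taken from the simulation described later.

```python
import numpy as np

def diet_effect(outcome, initial, diet):
    """OLS fit of outcome on diet and initial weight.

    Returns (diet coefficient, slope on initial weight).
    """
    X = np.column_stack([np.ones_like(initial), diet, initial])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef[1], coef[2]

rng = np.random.default_rng(1)
n = 20                                               # hypothetical students per diet
diet = np.repeat([0.0, 1.0], n)                      # 0 = diet A, 1 = diet B
initial = rng.normal(70.0, 8.0, 2 * n) + 3.0 * diet  # made-up weights in kg
final = 20.0 + 0.7 * initial + rng.normal(0.0, 4.0, 2 * n)

effect_final, slope_final = diet_effect(final, initial, diet)
effect_change, slope_change = diet_effect(final - initial, initial, diet)

print(effect_final, effect_change)   # the two diet estimates coincide
print(slope_final, slope_change)     # the slopes differ by exactly 1
```

Only the coefficient of the covariate changes (by one) when the change score replaces final weight; the estimated diet effect is untouched.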

One way of describing the difference between the two approaches is that each statistician proposes to correct the final weight by subtracting a multiple of the initial weight and analyse the resulting score. John proposes to set the multiple concerned to 1 and Jane proposes to let the data tell her what value to use. She will use the within-diet-group regression of final on initial weight to determine the adjustment. The resulting point estimates can equivalently be represented by the following formula

Estimate = (final mean difference between groups) - beta x (initial mean difference between groups)

where beta = 1 for John and beta is the regression coefficient for Jane.

As is obvious from the formula, unless at least one of the following two conditions applies,

The initial difference in weight between diet groups is zero
The estimated regression coefficient beta is one

Jane and John will not get the same point estimate.
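The formula can be turned into a few lines of code. This is a sketch under stated assumptions: the data are invented for illustration, and the helper names `adjusted_estimate` and `within_group_slope` are hypothetical. John plugs in beta = 1, whereas Jane plugs in the pooled within-diet-group slope.

```python
import numpy as np

def adjusted_estimate(initial, final, diet, beta):
    """(final mean difference) - beta * (initial mean difference), B minus A."""
    d_final = final[diet == 1].mean() - final[diet == 0].mean()
    d_initial = initial[diet == 1].mean() - initial[diet == 0].mean()
    return d_final - beta * d_initial

def within_group_slope(initial, final, diet):
    """Pooled within-diet-group regression slope of final on initial weight."""
    sxy = sxx = 0.0
    for g in (0, 1):
        x = initial[diet == g] - initial[diet == g].mean()
        y = final[diet == g] - final[diet == g].mean()
        sxy += float(x @ y)
        sxx += float(x @ x)
    return sxy / sxx

rng = np.random.default_rng(2)
n = 20                                               # hypothetical group size
diet = np.repeat([0, 1], n)                          # 0 = diet A, 1 = diet B
initial = rng.normal(70.0, 8.0, 2 * n) + 5.0 * diet  # made-up weights in kg
final = 21.0 + 0.7 * initial + rng.normal(0.0, 4.0, 2 * n)

beta_jane = within_group_slope(initial, final, diet)
john = adjusted_estimate(initial, final, diet, beta=1.0)
jane = adjusted_estimate(initial, final, diet, beta=beta_jane)
print(f"John: {john:.2f}  Jane: {jane:.2f}  beta: {beta_jane:.2f}")
```

The gap between the two estimates is (beta - 1) times the initial mean difference, which is why it vanishes only under one of the two conditions listed.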

In the original form of the paradox, there is an important difference between the two diet groups initially but neither group changes on average over time. It thus follows that the mean difference between the two groups finally is the same as the mean difference initially. Thus, John estimates the effect to be zero. However, as is commonly the case when the same measure (for example weight) is involved as predictor and outcome, the regression of final on initial values is less than one and so Jane concludes there is a difference.
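The arithmetic can be written out using the formula for the estimate given earlier. The symbols here are introduced only for illustration: X and Y denote initial and final weight, D is the initial mean difference between the diet groups, and the hatted tau is the estimate.

```latex
\begin{align*}
\bar{Y}_B - \bar{Y}_A &= \bar{X}_B - \bar{X}_A = D \neq 0, \\
\hat{\tau}_{\text{John}} &= D - 1 \times D = 0, \\
\hat{\tau}_{\text{Jane}} &= D - \hat{\beta} D = (1 - \hat{\beta})\, D \neq 0
  \quad \text{when } \hat{\beta} < 1 .
\end{align*}
```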

However, these particular conditions are not necessary for a problem to occur, even though they lead to the striking result that the apparently more sophisticated approach of analysis of covariance estimates an effect of diet when 'common sense' suggests there isn't one (mean weights have not changed over time for either diet). In general, Jane's analysis will not yield the same result as John's, so who is right?

In my opinion, by far the best general discussion of this was by Holland and Rubin(4) nearly 40 years ago. They consider various different assumptions and scenarios and show how these lead to different conclusions.

What I propose to do is apply John Nelder's experimental calculus and see to what conclusions this leads. To cut a long story short that conclusion will be that neither approach is right unless untestable assumptions apply. In order to do this I shall simulate an example but before this is done it is necessary to discuss variation in experiments.

Blocks and blockheads

A key understanding that was achieved by Fisher and others who subsequently worked at the agricultural research station at Rothamsted was that the basic experimental material they worked with, usually plots in fields, could vary in a complex way, even when given the same treatment. In fact this could be empirically demonstrated and was so from time to time, using so-called uniformity trials. (An interesting review of such trials is provided by McCullagh and Clifford (5).)

For field experiments, where simple randomisation across a field was used, this did not matter. The fact that the correlation between neighbouring plots tended to be higher than that between plots further apart did not bias the variance estimates, provided that the spatial scattering of plots given the same treatment was similar to that of plots given different treatments. Variation in the former was used to judge variation in the latter and, if there was no treatment effect, the variances could be expected to be the same.

More complex experiments, however, required care. Certain types of treatment in a field (for example fertilisers and varieties of crops) could be varied at a different level to others. How should the variances be calculated? Sometimes the field could be organised into 'blocks' within which, in the absence of treatments, yields could be assumed to be more similar. From block to block the variation in yield would be greater. How you estimated the standard error then depended on how you varied the treatments. Did you vary them between or within blocks?

To give a dietary example suppose that at the same time that diets A and B are being compared using two student halls, one for each diet, the students within the halls are also given at random two possible dietary plans, C and D to follow. Suppose to make it concrete, there are 20 students per hall and within each hall 10 are given plan C and 10 are given plan D. The situation now is that there are 20 students receiving diet A and 20 receiving diet B but also 20 on plan C and 20 on plan D. However, the diets are varied between halls and the plans within. Can the standard errors be estimated in the same way?

The Rothamsted approach says not. I shall simulate some data and explain why not.

Simulate to stimulate

The basic block structure we have to deal with is that we have students within halls. This is written Hall/Student in Nelder's experimental calculus(6,7) and in the Genstat code based on Nelder's work and further work by Roger Payne and others(8,9) I can write:

BLOCKSTRUCTURE Hall/Student

I am going to simulate initial and final weights that respect this block structure. However, initial and final weights are correlated so a correlation needs to be used in the simulation. Actually, two correlations need to be used: between and within halls. Both are relevant and they do not have to be identical.

To see how one can have different correlations at different levels, consider an observational study in which a large number of individuals keep diet diaries. They record quantities of staple foods consumed: bread, potatoes, pasta, rice. If we look, say, at the correlation day by day of bread and potatoes for given individuals, we find that these are negative: they get their calories one way or another. However if we average the consumption over time of bread and potatoes, we might find that the correlation over persons is quite different. It might for example be positive here. Individuals differ according to how many calories they consume and high consumers consume more bread but also more potatoes than those who eat less.
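The diet-diary example can be mimicked with a small simulation. This is only a sketch with invented quantities: a shared "appetite" term makes person-level averages move together, while a daily "swap" term trades bread against potatoes within a person. None of these names or numbers come from real data.

```python
import numpy as np

rng = np.random.default_rng(3)
n_people, n_days = 200, 60          # hypothetical diary study size

# Between persons: big eaters eat more of both staples (shared appetite term).
appetite = rng.normal(0.0, 40.0, size=(n_people, 1))
# Within persons: on a given day, more bread means fewer potatoes.
swap = rng.normal(0.0, 30.0, size=(n_people, n_days))
noise = rng.normal(0.0, 10.0, size=(2, n_people, n_days))

bread = 150.0 + appetite + swap + noise[0]      # grams per day, made up
potatoes = 200.0 + appetite - swap + noise[1]

# Within-person correlation: remove each person's mean, then pool the days.
bread_dev = bread - bread.mean(axis=1, keepdims=True)
potato_dev = potatoes - potatoes.mean(axis=1, keepdims=True)
r_within = np.corrcoef(bread_dev.ravel(), potato_dev.ravel())[0, 1]

# Between-person correlation: correlate the person-level averages.
r_between = np.corrcoef(bread.mean(axis=1), potatoes.mean(axis=1))[0, 1]

print(f"within persons: {r_within:.2f}, between persons: {r_between:.2f}")
```

The same two variables show a negative correlation at the day level and a positive one at the person level, which is the sense in which correlations can differ between levels of a hierarchy.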

When I gave this example on Twitter, someone remarked that I was talking about partial correlations. I find this label unhelpful at best. A partial correlation, for example, would be the correlation between rice and pasta controlling for bread. Here I am talking about the same variables at different levels in a hierarchy, not different variables at the same level.

These are the parameters I used for my simulation. I assumed that the distributions were stable over time so that expectations and variance for initial and final weights were identical. (I wrote the program to let me have an effect, delta, of diet B, compared to A, but I shall present the null case where this is zero.)

Table 1. Parameter values used for the simulation.

I then simulated in two stages: first simulating expected values for the two halls and then simulating the differences of the students from their hall mean. For the latter I used distributions with mean zero. These values were then added to the hall mean. Randomisation was applied.
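A two-stage simulation of this kind might look as follows. This is a sketch, not the program used for the post: the parameter values are placeholders rather than those of Table 1, the bivariate normal distribution is an assumption, and the function names and the way randomisation is implemented are hypothetical.

```python
import numpy as np

def bivariate_normal(rng, sd, rho, size):
    """Draw (initial, final) pairs with mean 0, common SD and correlation rho."""
    cov = sd ** 2 * np.array([[1.0, rho], [rho, 1.0]])
    return rng.multivariate_normal([0.0, 0.0], cov, size=size)

def simulate(rng, n_halls=2, n_students=20, mu=75.0,
             sd_between=3.0, rho_between=0.5,
             sd_within=6.0, rho_within=0.8, delta=0.0):
    """Two-stage simulation; every default is a placeholder, not Table 1."""
    # Stage 1: expected (initial, final) weight for each hall.
    hall_means = mu + bivariate_normal(rng, sd_between, rho_between, n_halls)
    # Stage 2: student deviations from the hall mean, with expectation zero.
    dev = bivariate_normal(rng, sd_within, rho_within, (n_halls, n_students))
    weights = hall_means[:, None, :] + dev          # halls x students x 2
    # Randomisation: permute which hall receives which diet (0 = A, 1 = B).
    diet_of_hall = rng.permutation(np.arange(n_halls) % 2)
    weights[..., 1] += delta * diet_of_hall[:, None]  # diet acts on final only
    hall = np.repeat(np.arange(n_halls), n_students)
    diet = np.repeat(diet_of_hall, n_students)
    initial = weights[..., 0].ravel()
    final = weights[..., 1].ravel()
    return hall, diet, initial, final

hall, diet, initial, final = simulate(np.random.default_rng(4))
print(hall[:3], diet[:3], initial[:3].round(1), final[:3].round(1))
```

Keeping the between-hall and within-hall correlations as separate arguments is what allows the two levels to differ.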

A diet of data

The data are given in the appendix. However it is perhaps more useful to visualise them. Figure 1 plots initial against final weight by diet.

Figure 1. Plot of data simulated to illustrate Lord's paradox. The means for the initial and final weights for the two halls are indicated by a labelled asterisk.

The plot seems to suggest that there is an effect of diet on weight. It seems to be the case that for individuals with similar initial weights (which cannot have been affected by the diets subsequently given) the final weights are higher if they had diet B. However, there is one problem with this particular conclusion: diet is confounded with hall. Thus, what seems to be an effect of diet B compared to diet A could equally well be a difference between hall 1 and hall 2.

Now consider a result from the alternative experiment that was suggested, varying the advice given to students in the same hall and giving them either plan C or plan D. A possible result is illustrated in Figure 2.

Figure 2. Plot of data simulated to illustrate a possible comparison of dietary advice varied within halls. Again, the means for the initial and final weights for the two halls are indicated by a labelled asterisk.

Now it seems that there is little evidence of an effect of dietary advice on final weight. That may seem disappointing but since the two plans were varied within halls, differences between halls cannot be an explanation.

Why be simple if you can complicate things?

Both of these experiments would be pretty elementary by the standards of experiments being run in agriculture nearly a century ago. As I have already explained agronomists were used to varying treatments at different levels. What happens if we vary diets between halls and advice within? The situation is illustrated in Figure 3. What does John Nelder's calculus say about analysing this complex experiment?

Figure 3. Plot of simulated data arising from running a complex experiment consisting of varying diets between halls and advice within. The meaning of symbols is as in Figures 1 and 2.

Since we have decided to fit initial weight as a covariate, I first need to add a covariate statement to my previous blockstructure statement:

COVARIATE Initial

I then need to add a treatment structure statement as follows

TREATMENTSTRUCTURE Between+Within

(Here Between stands for the diet varied between halls and Within for the advice varied within.)

Finally, I request the analysis of Final weight

ANOVA[PROBABILITY=Yes]Final

The results are given in Table 2.

Table 2. Analysis of the simulated example using Genstat.

Genstat warns me right away that the experiment is degenerate as regards the analysis of diet varied between halls. It is impossible to tell which of Halls or Diet is the explanation, and it refuses to give me a P-value. How does the algorithm do this? It simply looks to see at what level of the block structure the treatment is varied. The answer is halls. There are only two halls and that number is inadequate to permit the relevant random variation to be analysed. In other words, the Rothamsted approach tells us that neither John nor Jane is right. Jane's analysis seems to provide a solution, but only if the assumption is made that there is no variation or covariation between halls above and beyond what is provided by students within. Note that it is no defence of the causal calculus here to say that it is only interested in the solution as the sample size grows to infinity. It is essential to understand and identify what has to go to infinity. Here it is not students but halls that have to grow in number and, furthermore, the relevant regression coefficient is not that of final weight on initial weight within halls but the corresponding regression between halls. However, the only regression that Jane can estimate is the one within halls. Thus, without the special assumption, not only her standard error but also her estimate will be wrong.
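What goes wrong with a student-level analysis of covariance when halls vary can be illustrated by simulation. The sketch below is an assumption-laden illustration rather than a reproduction of the Genstat analysis: the variances and correlations are placeholders, the true diet effect is zero, and `T_CRIT` is an approximate critical value typed in by hand. It counts how often Jane's analysis would declare a diet effect at the 5% level.

```python
import numpy as np

T_CRIT = 2.026  # approximate two-sided 5% critical value of t on 37 df

def ancova_t(initial, final, diet):
    """t-statistic for diet in an OLS fit of final on diet and initial."""
    X = np.column_stack([np.ones_like(initial), diet, initial])
    coef, *_ = np.linalg.lstsq(X, final, rcond=None)
    resid = final - X @ coef
    df = len(final) - X.shape[1]
    sigma2 = resid @ resid / df
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1] / se

def rejection_rate(sd_between, n_sim=2000, n=20, seed=5):
    """Share of null simulations in which the student-level ANCOVA rejects."""
    rng = np.random.default_rng(seed)
    diet = np.repeat([0.0, 1.0], n)          # hall 1 gets A, hall 2 gets B
    corr_b = np.array([[1.0, 0.5], [0.5, 1.0]])            # placeholder
    cov_w = 6.0 ** 2 * np.array([[1.0, 0.8], [0.8, 1.0]])  # placeholder
    rejections = 0
    for _ in range(n_sim):
        # Hall-level (initial, final) means, then student-level deviations.
        hall = 75.0 + sd_between * rng.multivariate_normal([0.0, 0.0], corr_b, size=2)
        student = rng.multivariate_normal([0.0, 0.0], cov_w, size=2 * n)
        w = np.repeat(hall, n, axis=0) + student   # true diet effect is zero
        rejections += int(abs(ancova_t(w[:, 0], w[:, 1], diet)) > T_CRIT)
    return rejections / n_sim

print("no hall variation:", rejection_rate(sd_between=0.0))
print("hall SD of 3 kg:  ", rejection_rate(sd_between=3.0))
```

With no variation between halls the rejection rate should sit near the nominal level; once halls vary, the within-hall residual variance says nothing about the hall-to-hall component and the analysis rejects far too often, however many students are added per hall.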

Dietary advice, on the other hand, provides no problem. This is varied within the halls. There are ample degrees of freedom to estimate the variance and there is no problem in estimating the regression of final on initial values.
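The contrast between the two levels can be seen from a rough count of degrees of freedom. The sketch below is simple bookkeeping and not Genstat output; the helper `residual_df` is hypothetical, and it assumes a two-level treatment and a single covariate each cost one degree of freedom in the stratum where they are fitted.

```python
def residual_df(stratum_df, treatment_df=1, covariate_df=1):
    """Residual df left in a stratum after fitting treatment and covariate."""
    return max(stratum_df - treatment_df - covariate_df, 0)

n_halls, n_students = 2, 20
hall_stratum = n_halls - 1                        # 1 df between halls
student_stratum = n_halls * (n_students - 1)      # 38 df within halls

diet_resid = residual_df(hall_stratum)            # diet varied between halls
advice_resid = residual_df(student_stratum)       # advice varied within halls
print(diet_resid, advice_resid)
```

Nothing is left between halls to estimate variation for the diet comparison, whereas the within-hall comparison of advice has plenty.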

Thus, variation, design and replication are key to what is going on. In a subsequent post I shall consider what the analysis would look like if we had enough degrees of freedom to estimate variation between halls. For the moment I consider some lessons.

Lessons

I draw the following lessons.

The first lesson is that the Rothamsted approach, which reached its apotheosis in John Nelder's papers of 1965, provides a powerful way of understanding complex experiments.

The second is that since observational studies may share the complexity of variation at different levels that experiments have, the Rothamsted approach may have relevance to interpreting observational studies.

The third lesson is that, impressive as the progress in our understanding of causality has been, thanks to the work of Judea Pearl (1,10,11) and others, it will remain incomplete unless it can be developed to handle random effects also. (It may be that this is partially addressed by the work of Greenland and Mansournia (12) and I have it on my to-do list to study this.)

Thanks

My thanks to Ewout Steyerberg for eagle-eyed proof reading of V1.

References

1. Pearl J, Mackenzie D. The Book of Why. Basic Books; 2018.

2. Wainer H, Brown LM. Two statistical paradoxes in the interpretation of group differences: Illustrated with medical school admission and licensing data. American Statistician. May 2004;58(2):117-123.

3. Laird NM. Further comparative analyses of pre-test post-test research designs. The American Statistician. 1983;37:329-330.

4. Holland PW, Rubin DB. On Lord's Paradox. In: Wainer H, Messick S, eds. Principals of Modern Psychological Measurement. Lawrence Erlbaum Associates; 1983:3-25.

5. McCullagh P, Clifford D. Evidence for conformal invariance of crop yields. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences. 2006;462(2071):2119-2143.

6. Nelder JA. The analysis of randomised experiments with orthogonal block structure I. Block structure and the null analysis of variance. Proceedings of the Royal Society of London Series A. 1965;283:147-162.

7. Nelder JA. The analysis of randomised experiments with orthogonal block structure II. Treatment structure and the general analysis of variance. Proceedings of the Royal Society of London Series A. 1965;283:163-178.

8. Payne R, Tobias R. General balance, combination of information and the analysis of covariance. Scandinavian Journal of Statistics. 1992:3-23.

9. Payne R, Wilkinson G. A general algorithm for analysis of variance. Applied Statistics. 1977:251-260.

10. Pearl J. Causality: Models, Reasoning and Inference. Cambridge University Press; 2000.

11. Pearl J, Glymour M, Jewell NP. Causal Inference in Statistics: A Primer. Wiley; 2016.

12. Greenland S, Mansournia MA. Limitations of individual causal models, causal graphs, and ignorability assumptions, as illustrated by random confounding and design unfaithfulness. European Journal of Epidemiology. 2015;30(10):1101-1110.

Appendix

The data

Simulated data for the Lord's paradox discussion. Columns with a heading followed by an exclamation mark are factors. Weights are in kg.

The data can be downloaded from here.

Simulated data

https://www.senns.uk/Lords_Paradox_Simulated.xls

Jonathan Bartlett

Professor in Medical Statistics at London School of Hygiene & Tropical Medicine

3 years ago

Thanks Stephen. Below Figure 1, you say "diet is confounded with hall". But if diet is randomly assigned at the hall level, I would not say diet is confounded with hall. If there is between hall variation, the difference in initial weights between halls would be called 'random confounding' by some (not me), but if one considers repeatedly running this trial, with these two particular halls, on average the mean initial weight of the hall assigned to A is the same as for B, because of randomisation. Or if one instead imagines that the two halls have been sampled from a population of halls, and diet assigned at random, again I would say there is no confounding. You say that Pearl's and other's approach cannot handle random (e.g. hall) effects. I am certainly not the person to give a definitive response to this. But I don't see why this isn't possible within a DAG: you have a node for hall effects, separate nodes for initial weights for each individual (in a particular hall). The hall node feeds into each individual's initial weight node, and each individual's final weight node. Each individual's final weight depends on their initial weight. You then have a single diet node, that feeds into each individual's final weight node.

