A Standard Error

Sum Mistake

[Header figure: the notplot discussed below — means shown as solid bars with standard error whiskers for two treatments, formoterol and salbutamol, at 480 minutes]

The standard error of a mean is famously known to be the standard deviation divided by the square root of the sample size, a formula that is valid except, of course, when it isn't, which is nearly always. When the student of statistics first encounters the formula, it may be justified as a special case of a theorem for the variance of a linear combination of random variables with arbitrary variances and covariances: the special case that applies being that in which the variances are all the same, the covariances are all zero and the weights of the linear combination are all identically equal to 1/n, with n being the sample size.
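
In symbols, with σ² denoting the common variance, the theorem and the special case just described are as follows:

```latex
% Variance of a linear combination L = \sum_i w_i X_i:
\operatorname{Var}(L) = \sum_{i} w_i^{2}\operatorname{Var}(X_i)
  + 2\sum_{i<j} w_i w_j \operatorname{Cov}(X_i, X_j).
% Special case: \operatorname{Var}(X_i) = \sigma^2,
% \operatorname{Cov}(X_i, X_j) = 0 and w_i = 1/n, giving
\operatorname{Var}(\bar{X}) = n \cdot \frac{1}{n^{2}}\,\sigma^{2}
  = \frac{\sigma^{2}}{n},
\qquad
\mathrm{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}}.
```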

A more direct argument might proceed like this:

  1. Consider the sum, S, of n independent, identically distributed random variates, each with variance V.
  2. The variance of the sum will be nV, since, the variates being independent, their covariances must be zero.
  3. The mean is simply S/n.
  4. Its variance must be nV divided by the square of n and so equal to V/n.
  5. The standard error is the square root of this.
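
This derivation is easy to check by simulation. Here is a minimal sketch in Python; the normal distribution and the particular values of n and V are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 25            # sample size (an arbitrary choice for illustration)
V = 4.0           # variance of each independent variate
n_sims = 100_000  # number of simulated samples

# Draw n_sims independent samples of size n from a distribution with variance V
samples = rng.normal(loc=0.0, scale=np.sqrt(V), size=(n_sims, n))
means = samples.mean(axis=1)

print("Empirical SD of the sample mean:", means.std(ddof=1))
print("Theoretical sqrt(V/n):          ", np.sqrt(V / n))
```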

If the student then asks how one might know that the necessary conditions to justify this result apply, they may be provided with one of two answers. 1) Mathematicians may assume anything they like and explore the consequences. 2) Imagine an infinite population of members of a distribution which may be assumed to have a variance, and take a simple random sample of n values from it.

Of course, if the student is not actually studying statistics per se but statistics as a necessary black art as part of a research career, the formula for the standard error of the mean can be pulled like a rabbit out of a hat with the advice that it will not in any case be necessary to remember it, since the graphing software will include it as part of the plots used to adorn research papers.

The Simple Sample Assumption

[Figure: the notplot from the header reproduced — means as solid bars with standard error whiskers]

An example of the sort of abomination that is produced is given in the header to this piece (and reproduced above, since the header seems to have a mind of its own when it comes to display). It shows the results of a trial in asthma 6 hours (480 minutes) after treatment for two beta-agonists, formoterol and salbutamol. The means are represented by solid bars for no good reason I can think of, but plausibly because of the bad one that if bars are good enough for frequencies, why not for means? The standard errors are represented by 'whiskers'.

This sort of plot is sometimes referred to as a dynamite detonator diagram, a phrase I first heard Brian Bond use sometime in the previous millennium. I shall refer to it as a notplot to underline the fact that, despite its being given as a standard option by most major packages, you should never use it. An exception among the packages is Genstat, which happens to be the package I use, so producing the plots here has involved me in rather a lot of tedious programming. I hope that you all appreciate the dedication I show in writing these blogs!

[Figure: a variation on the notplot with 95% confidence intervals added above and below the bars representing the means]

The diagram immediately above is a variation on this theme, whereby 95% confidence intervals have been added above and below the bars representing the means as opposed to a simple standard error above. This plot also has no good purpose.

The standard errors, and hence by extension the confidence intervals as probability limits for an estimate of the population mean, would only be valid if simple random sampling had taken place. It hasn't and it couldn't. The figures are from a clinical trial, and random sampling is not involved in clinical trials; the fact that it is not, and never could be, is one of the reasons why we set such store by concurrent control, and also one of the reasons why such plots should not be produced. Anybody who doubts this should have a serious look at sampling theory and discuss their delusions with statisticians who work in survey methodology.

Ironically, although the formula for the standard error of the mean is not applicable in clinical trials, the formula for the standard error of the difference between means is, provided that simple randomisation has been used or, if not, provided that the design is reflected in its calculation.

There are two possible arguments[1]. The first is based on additivity of the treatment effects. The fact that the patients are not representative does not matter, provided that they are assigned to treatment and control: any 'bias' in selecting patients will disappear by subtraction. Of course, this is a strong assumption. The second fixes the patients, who are, of course, a perfectly unbiased and exhaustive sample of themselves. It then tries to address the question: what effect did treatment have on these patients? This is not a simple question, since only half (for a one-to-one randomisation in a two-armed trial) received the experimental treatment and half the control. It turns out that the standard error of the difference between the means from two random samples is a good approximation to the relevant randomisation distribution standard error.
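
That final claim can be checked by simulation. The following is a minimal sketch in Python, assuming the strict null hypothesis that treatment leaves each patient's value unchanged and using an invented outcome vector; it compares the standard deviation of the randomisation distribution of the difference in means with the conventional two-sample formula:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed set of patient outcomes (invented for illustration); under the
# strict null hypothesis each patient's value is unaffected by treatment.
y = rng.normal(50.0, 10.0, size=20)
n = len(y) // 2  # one-to-one randomisation: 10 patients per arm

# Randomisation distribution of the difference between arm means
diffs = []
for _ in range(50_000):
    perm = rng.permutation(len(y))
    diffs.append(y[perm[:n]].mean() - y[perm[n:]].mean())
diffs = np.array(diffs)

# Conventional two-sample standard error: sqrt(s^2 * (1/n + 1/n))
se_formula = np.sqrt(y.var(ddof=1) * (2 / n))

print("SD of randomisation distribution:", diffs.std())
print("Two-sample SE formula:           ", se_formula)
```

The two printed figures should agree closely (up to Monte Carlo error), which is the sense in which the two-sample standard error approximates the randomisation distribution standard error.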

Vive la Difference!

The excuse is sometimes made that by quoting the individual 95% confidence intervals we can judge, by seeing whether they overlap, whether the difference between groups is 'significant'. There are three objections to this argument. The first is the technical one that a significant difference at the 5% level is obtained (approximately) if the 84% limits do not overlap. (The exact level depends on the degrees of freedom and the ratio of the variances.)
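
Where the 84% figure comes from can be seen with a short calculation; this is a minimal sketch using the normal approximation and assuming equal standard errors in the two groups:

```python
from scipy.stats import norm

# With equal standard errors s, the 5% two-sided test on the difference
# requires |m1 - m2| > 1.96 * s * sqrt(2), while non-overlap of the
# individual intervals m ± z*s requires |m1 - m2| > 2 * z * s.
# Equating the thresholds gives the z, and hence the coverage level,
# at which non-overlap corresponds to significance at the 5% level.
z = norm.ppf(0.975) * 2 ** 0.5 / 2
coverage = 2 * norm.cdf(z) - 1
print(f"z = {z:.3f}, coverage = {coverage:.1%}")  # about 83%, i.e. the ~84%
```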

The second is that there may in any case be elements of the variance within groups, used to calculate the standard error of the mean, that are eliminated by the experimental design once differences are considered. This is in fact the case with the example I have plotted. It is from a cross-over trial treated as Example 3.1 in my book Cross-over Trials in Clinical Research[2]. The variance of the difference can't be calculated from the group standard errors since these do not eliminate the patient effect.
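
A minimal sketch, with numbers invented so that the between-patient variation dominates, shows why: the standard error built from the two group variances greatly overstates the uncertainty of the within-patient comparison.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 30  # patients, each measured under both treatments (cross-over)

# Simulated cross-over data (invented for illustration): a large
# between-patient effect plus a small within-patient error.
patient = rng.normal(0.0, 10.0, size=n)           # patient effect
a = patient + rng.normal(0.0, 2.0, size=n)        # treatment A
b = patient + 1.0 + rng.normal(0.0, 2.0, size=n)  # treatment B (effect = 1)

# Naive SE of the difference built from the two group standard errors:
se_naive = np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)

# SE from within-patient differences, which eliminates the patient effect:
d = b - a
se_paired = d.std(ddof=1) / np.sqrt(n)

print("SE from group standard errors:", se_naive)   # dominated by patients
print("SE from paired differences:   ", se_paired)  # much smaller
```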

The third is that if you are interested in differences, it is differences you should present. More than thirty years ago, Petra Auclair and I published a paper to this effect in Statistics in Medicine[3]. The figure below is the sort of thing we proposed.

[Figure: the difference between treatments plotted over time with pointwise 95% confidence limits; the 480-minute time point is highlighted]

This uses about as much ink as the notplot but presents the results over time for the same trial. The time point of 480 minutes is highlighted, since this was the time point chosen for the previous plot. Of course, this plot over time can be criticised in that the standard errors, and hence the confidence limits, have been calculated independently at each time point. Some may prefer a pooled variance. However, such a criticism would apply a fortiori to the notplot, which would have neither a valid nor a relevant standard error, whether pooled or not.
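
For concreteness, calculating the limits independently at each time point amounts to something like the following minimal sketch; the time grid, sample size and within-patient differences are all invented for illustration:

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(3)
times = [30, 60, 120, 240, 360, 480]  # minutes (illustrative grid)
n = 13  # patients contributing a within-patient difference at each time

for time in times:
    d = rng.normal(0.3, 0.5, size=n)  # invented within-patient differences
    se = d.std(ddof=1) / np.sqrt(n)   # SE calculated at this time point only
    half_width = t.ppf(0.975, df=n - 1) * se
    print(f"t = {time:3d} min: diff = {d.mean():.2f}, "
          f"95% CI half-width = {half_width:.2f}")
```

A pooled alternative would replace the per-time-point variance with one averaged across time points, gaining degrees of freedom at the price of an assumption of homogeneity.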

According to Google Scholar my paper with Petra has been cited just 29 times, an average of less than one citation per year. :-(

Advice

When asked to produce a notplot, don't.

Acknowledgement

My thanks to Stefano Vezzoli for having spotted a very confusing typo, which I have now corrected.

References

1. Senn SJ. Added Values: Controversies concerning randomization and additivity in clinical trials. Statistics in Medicine. 2004;23(24):3729-3753.

2. Senn SJ. Cross-over Trials in Clinical Research. Second ed. Wiley; 2002.

3. Senn SJ, Auclair P. The graphical representation of clinical trials with particular reference to measurements over time [published erratum appears in Statistics in Medicine 1991 Mar;10(3):487]. Statistics in Medicine. 1990;9(11):1287-1302.



