Shaped Pairs
Empirical and Theoretical Distribution of the F-Ratio for an Example with 12 Subjects in Two Groups


'Often the number of random arrangements is far too great for them to be examined exhaustively...' R. A. Fisher, The Design of Experiments (1), p. 57

The Story so Far

In a previous post I explained how one could look at sums of squares within and between groups in terms of sums of squares of all paired differences. In this post I shall take a particular example to illustrate how Fisher's device of randomising units to treatment enables the parametric theory for variance ratios to be exploited.

Some Data

I shall take as an example data that are given in Stuart Pocock's classic text on clinical trials (2), which celebrates the 40th anniversary of its publication this year. The data are from Table 13.7 on p. 203 and concern a trial carried out by Karpatkin et al in 1981 (3) to examine the effect on their babies of giving steroids to pregnant women suffering from thrombocytopenia. I shall only use the data from the 12 mothers in the intervention group. The data are as follows:

[Image: Data from Karpatkin et al 1981]

Pocock provides the untransformed data but draws attention to their being skewed and consequently illustrates the Wilcoxon test. I have log-transformed them to make them better behaved and will use the transformed data in what follows. I shall discuss the issue of transformations at the end.

I am going to take these twelve results to illustrate what a null distribution might look like. Null, because I shall use a group of subjects given the same treatment. I shall do this by dividing them into two groups of six subjects each. In fact, I shall consider every possible division of the twelve into two such groups. There are 12!/(6!6!) = 924 possible divisions. I have used the SETALLOCATIONS procedure in the software package Genstat® to find each of these 924 allocations and, having done so, I have calculated 1) the mean square between the two groups so created, 2) the corresponding mean square within the two groups and 3) the ratio of the first to the second. This is the variance ratio or F statistic.
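To make the enumeration concrete, here is a minimal sketch in Python, with itertools standing in for Genstat's SETALLOCATIONS. The twelve values are randomly generated stand-ins so that the snippet runs; the real analysis uses the log-transformed platelet counts from Table 13.7 of Pocock (2), which are not reproduced here.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(1)
# Stand-in values: the real analysis uses the 12 log-transformed
# platelet counts from Table 13.7 of Pocock (2), not reproduced here.
y = rng.normal(loc=4.0, scale=1.0, size=12)

n, k = 12, 6
grand_mean = y.mean()
sst = ((y - grand_mean) ** 2).sum()  # total sum of squares, fixed across divisions

results = []
for idx in combinations(range(n), k):  # all 12!/(6!6!) = 924 divisions
    g1 = y[list(idx)]
    g2 = np.delete(y, list(idx))
    # between-groups sum of squares for two groups of six
    ssb = k * ((g1.mean() - grand_mean) ** 2 + (g2.mean() - grand_mean) ** 2)
    ssw = sst - ssb                      # within follows by subtraction
    msb, msw = ssb / 1.0, ssw / (n - 2)  # 1 and 10 degrees of freedom
    results.append((msb, msw, msb / msw))  # the variance ratio (F)

print(len(results))  # 924
```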

Figure 1. Results for all 924 divisions of the 12 steroid results into two groups of six.

Figure 1 plots the mean square within (MSW) against the mean square between (MSB). The points will be seen to lie on a straight line, but this simply reflects the point made in the previous post that a cornerstone of Fisher's analysis of variance is that the sum of squares within (SSW) plus the sum of squares between (SSB) equals the total sum of squares (SST). Since SST does not change from one of the 924 divisions to another, and since SST = SSW + SSB, it follows that as soon as SSB is determined so is SSW, and the points must lie on a straight line.
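To spell out the algebra behind the straight line (a small step implicit in the argument above): with two groups of six there is 1 degree of freedom between groups and 10 within, so

```latex
\mathrm{MSB} = \frac{\mathrm{SSB}}{1}, \qquad
\mathrm{MSW} = \frac{\mathrm{SSW}}{10}
             = \frac{\mathrm{SST} - \mathrm{SSB}}{10}
             = \frac{\mathrm{SST} - \mathrm{MSB}}{10}.
```

Since SST is fixed over the 924 divisions, MSW is a linear function of MSB with slope -1/10 and intercept SST/10, which is exactly the line seen in Figure 1.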

Note that each of the 924 divisions defines a point and that each point defines a variance ratio, the means by which the 'significance' of a result is judged. The figure also shows the critical boundary that any variance ratio must cross (lie below and to the right of) to be judged significant. Of the 924 points, 44, that is 100 × 44/924 = 4.8%, are judged significant. This is remarkably close to the nominal 5% that the F-distribution suggests.
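The 4.8% figure can be checked against the parametric critical value, here sketched with scipy, continuing from the Python snippet above (the count of 44 refers to the real data, so a stand-in sample will give a different count):

```python
from scipy import stats

# 5% critical value of F with 1 and 10 degrees of freedom (about 4.96)
f_crit = stats.f.ppf(0.95, dfn=1, dfd=10)
n_sig = sum(f > f_crit for _, _, f in results)  # 'results' from the sketch above
print(f_crit, n_sig, n_sig / len(results))
```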

Distributions and Decisions

The F-distribution that we use nowadays is due to Snedecor and is a transformation of (and so equivalent to) the z-distribution that Fisher (4) proposed in Statistical Methods for Research Workers and had derived a little earlier (5). The relationship is z = log(F)/2 and F = exp(2z). Both are based on the assumption that the data can be treated as if they were sampled from a Normal distribution. Here the assumption works quite well. Figure 2, used as the header for this blog, gives the distribution estimated by applying a 'smoother' to the 924 variance ratios (the blue line) as well as the theoretical distribution (the dashed red line) given by the F-distribution.
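A plot in the spirit of Figure 2 can be sketched as follows (this is not the Genstat code used for the blog; the smoother here is a Gaussian kernel density estimate with scipy's default bandwidth):

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

f_values = np.array([f for _, _, f in results])  # from the enumeration above
grid = np.linspace(0.01, f_values.max(), 400)

# empirical 'smoother' over the 924 variance ratios vs the F(1, 10) density
plt.plot(grid, stats.gaussian_kde(f_values)(grid), "b-",
         label="empirical (924 variance ratios)")
plt.plot(grid, stats.f.pdf(grid, 1, 10), "r--",
         label="theoretical F(1, 10)")
plt.xlabel("variance ratio (F)")
plt.ylabel("density")
plt.legend()
plt.show()
```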

Figure 2. Empirical and Theoretical Variance Ratio Values (F-statistics)


An alternative way of judging the fit is to produce a QQ plot. I have done this using the DPROBABILITY procedure in Genstat® and the result is shown in Figure 3. The 924 variance ratios are plotted and are expected to lie along the dashed blue line. The red curves define a probability envelope within which the points should lie if the F-distribution applies. The fit is remarkably good.
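In Python a comparable plot can be had from scipy's probplot, standing in for DPROBABILITY (note that scipy draws only the reference line, not the probability envelope):

```python
import matplotlib.pyplot as plt
from scipy import stats

fig, ax = plt.subplots()
# QQ plot of the 924 variance ratios against the F(1, 10) distribution
stats.probplot(f_values, sparams=(1, 10), dist="f", plot=ax)
plt.show()
```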

Figure 3. QQ plot for the variance ratios.

Discursive Discussion

In the previous blog I explained how the sums of squares used in analysis of variance could be constructed from differences between all possible pairs, noting which pairs occurred within groups and which between. It seems reasonable that a necessary condition for a system of inference based on sums of squares to work well should be that each possible pair has the same chance of being a within pair and that each possible pair has the same chance of being a between pair. Randomisation is a way of ensuring this.
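For the within-group part at least, the pairwise construction rests on a standard identity: for a single group of size m, the sum of squared deviations from the group mean equals the sum of squared differences over all unordered pairs, divided by m. A minimal check (the warning at the end of this post concerns how such pieces assemble into the full ANOVA, not this single-group identity):

```python
from itertools import combinations

import numpy as np

g = np.array([3.1, 4.2, 5.0, 4.7, 3.9, 4.4])  # any group of six values
ss_dev = ((g - g.mean()) ** 2).sum()
ss_pairs = sum((a - b) ** 2 for a, b in combinations(g, 2)) / len(g)
print(np.isclose(ss_dev, ss_pairs))  # True: the two computations agree
```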

This raises a number of issues, however. The first is that whereas there are 924 possible divisions of the sample of 12 into two groups of six, there are only 12 x (12-1) = 132 possible pairs. The former is seven times the latter. Furthermore if we do not care which member of a pair is 'first' and which is 'second', we have only 66 pairs and so a ratio of 14 to 1. Could it be that a reduced randomisation could produce the desired result? I am not even an amateur when it comes to combinatorics and I don't know. I am sure that there are others who do. (I suspect that there is some simple argument I am overlooking.) I think in practice it doesn't matter. Randomisation is simple and effective.

The second issue is that the parametric edifice that Fisher erected is not quite the same as the randomisation one. The former imagines random sampling from a hypothetical infinite Normal population (not to be confused with some real population); the latter imagines randomisation applied to a fixed sample. The two frameworks can give remarkably similar results (as is the case here). However, this is not always so and, even here, using the original values rather than the log-transformed ones, the agreement is not as good.

The third issue is that for a marginal distribution to be relevant we must not be faced with a recognisable subset. I have argued elsewhere (6) that randomisation allows one to use the distribution in probability that the effect of unseen influences may have. However, if these influences become seen, it is their actual distribution that matters, not what it might have been over all randomisations. The standard way to deal with this is by analysis of covariance (7).

The fourth issue, however, is this. The fact that you cannot use randomisation as an excuse for ignoring prognostic covariates is not an argument for not randomising. Block what you can, randomise what you can't, but condition on what you observe and consider relevant. Nevertheless, marginal distributions are calibrating for conditional ones. If you get the former wrong, the error may propagate to the latter. Making sure that your marginal inferences are valid is not the be-all and end-all but it is a useful first step.

Warning

In looking at this blog and the previous one, I realise that there is a problem with the decomposition of the sums of squares into pairs as I described it, as regards its correspondence to ANOVA. I am working on this and will correct it (I hope!) in due course.

References

1. FISHER, R. A. 1990. The Design of Experiments. In: BENNETT, J. H. (ed.) Statistical Methods, Experimental Design and Scientific Inference. 8th ed. Oxford: Oxford University Press.

2. POCOCK, S. J. 1983. Clinical Trials: A Practical Approach. Chichester: Wiley.

3. KARPATKIN, M., PORGES, R. F. & KARPATKIN, S. 1981. Platelet counts in infants of women with autoimmune thrombocytopenia: effects of steroid administration to the mother. New England Journal of Medicine, 305, 936-939.

4. FISHER, R. A. 1925. Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.

5. FISHER, R. A. 1924. On a Distribution Yielding the Error Functions of Several Well Known Statistics. International Congress of Mathematics, Toronto. Reprinted as CP36 in: BENNETT, J. H. (ed.) Collected Papers of R. A. Fisher, Vol. 1. The University of Adelaide, 493-502.

6. SENN, S. J. 2013. Seven myths of randomisation in clinical trials. Statistics in Medicine, 32, 1439-50.

7. SIEGFRIED, S., SENN, S. & HOTHORN, T. 2022. On the relevance of prognostic information for clinical trials: A theoretical quantification. Biometrical Journal.

Comments

Stephen Senn

Statistical Consultant


I contacted Rosemary Bailey, who, I was not surprised to discover, had thought about this topic already. Here is her reply (1/2): "Here is a comment on your first point. If you look back at the early discussions between R. A. Fisher and Frank Yates you will find that they both knew that, in order for randomization to be valid in the sense that, when averaged over all possible outcomes of the randomization, the expected mean squares for treatments and for error are equal when there are no treatment differences, the randomization needs to have the property that the probability that any given ordered pair of plots gets any particular ordered pair of treatments takes one value if the treatments are the same and the other value if they are different. Moreover, these two probabilities are known, and depend only on the relation between the two plots within the experimental structure."

Matt Tenan, PhD ATC FACSM

Real-World Evidence Scientist


Thanks for this Stephen Senn. I've literally got some randomization inference code running as I type. Always good to hear your perspective.
