Randomization needs numbers
The histogram shows how the sample mean (solid red line) and bootstrapped confidence limits (dotted red lines) fail to contain the population mean (solid blue line). This can happen when you randomize and trust randomization to save your day.

Truth be told, the original version of this post looked different. But then I messed up changing text and code snippets, LinkedIn autosave did its magic, and I lost 80% of my original article. To some, this is Murphy's law. To me, this is just me messing up. No randomness, no chance. Just good old me.

This post is not about what randomness is or is not. It is just a show and tell on the limitations of randomization, and why randomization is actually a big numbers game. Over the course of my PhD I (co-)authored about seven publications on the quality of Randomized Controlled Trials in the field of cancer. After that, I taught experimental design for 5+ years to 75+ PhDs in the animal science industry. Whereas randomization is the panacea in the medical field, the animal sciences have long since augmented randomization with the art of blocking (called stratified randomization in the medical world), because trials there are often small due to farm limitations. Randomization, after all, is a big numbers game.
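Blocking is easy to sketch in R. Below is a minimal, hypothetical example (the farms and trial size are invented for illustration): sixteen animals on four farms, assigned once by simple randomization and once by randomizing within each farm.

# A minimal sketch of blocking. Simple randomization can leave a farm with
# a lopsided split; blocked randomization guarantees 2/2 within every farm.
set.seed(1)
farms <- rep(paste0("farm", 1:4), each = 4)
simple <- sample(rep(c("control", "treatment"), 8))  # ignores the farms
blocked <- unlist(lapply(split(seq_along(farms), farms),
                         function(i) sample(rep(c("control", "treatment"), 2))))
table(farms, simple)   # per-farm splits can come out 4/0 or 3/1
table(farms, blocked)  # always 2/2 within every farm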

Today, I want to show you via simulation in R what happens if you cannot get the numbers needed to make randomization work, and why it is perhaps best not to expect that randomization will solve all your problems. Randomization, after all, is put in place to spread known and unknown confounders equally across the control and treatment groups, and nobody knows what they do not know.
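Here is a small simulation sketch of that point (mine, with an invented confounder and invented trial sizes): randomize subjects 1:1 to two arms and measure how far apart the arms end up on a confounder nobody recorded.

# How imbalanced does an unknown confounder get under simple 1:1
# randomization at different trial sizes?
set.seed(42)
imbalance <- function(n) {
  confounder <- rnorm(n)              # never measured by the analyst
  arm <- sample(rep(c(0, 1), n / 2))  # simple 1:1 randomization
  abs(mean(confounder[arm == 1]) - mean(confounder[arm == 0]))
}
mean(replicate(1000, imbalance(20)))    # roughly 0.35 sd units apart
mean(replicate(1000, imbalance(2000)))  # roughly 0.035 sd units apart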

Or, to quote William Briggs in his book "Uncertainty":

"Randomness is not a cause. Neither is chance. It is?always?a mistake to say things like “explainable by chance”, “random change”, “the differences are random”, “unlikely to be due to chance”, “due to chance”, “sampling error”, and so forth. Mutations in biology are said to be “random”; quantum events are called “random”; variables are “random”. An entire theory in statistics is built around the erroneous idea that chance is a cause. This theory has resulted in much grief, as we shall see.


First off, I will show that a simple sample inherently deviates from the 'true mean', or the 'population mean'. Hence, all the work conducted in those expensive RCTs has limited external validity, and repetition is needed. This is why meta-analyses are so important.

require(tidyverse)
set.seed(11223)

# Simulate a 'population' of 1000 observations with mean 100 and sd 10,
# and plot its density.
x <- rnorm(1000, 100, 10)
plot(density(x), lwd = 3)

# Draw one 5% and three 10% fractions and overlay their densities.
df <- data.frame(x)
x2 <- df %>% sample_frac(0.05)
lines(density(x2$x), col = "red", lty = 2)
x3 <- df %>% sample_frac(0.10)
lines(density(x3$x), col = "blue", lty = 2)
x4 <- df %>% sample_frac(0.10)
lines(density(x4$x), col = "green", lty = 2)
x5 <- df %>% sample_frac(0.10)
lines(density(x5$x), col = "orange", lty = 2)
[Figure: density of the simulated population (black) with the sampled fractions overlaid in red, blue, green, and orange]

It is not hard to see that each sample, extracted as a 5% or 10% fraction of the original observations, only mimics the population at best and can seriously deviate from it. This is not bad. In fact, it should be expected.
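A quick numeric check (my addition) makes the deviation visible without the plot:

# Mean and sd of the population and of each sampled fraction; the samples
# drift around the population values of 100 and 10.
sapply(list(pop = x, s5 = x2$x, s10a = x3$x, s10b = x4$x, s10c = x5$x),
       function(v) c(mean = mean(v), sd = sd(v)))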

Most of you know the sample-from-distribution trick, but I am not sure how many know the population-from-sample trick, also called bootstrapping. What bootstrapping does is resample the sample, with replacement, to approximate the population from which the sample hypothetically comes. Let me show you.

# Resample the 100-observation sample (x3) 5000 times, storing the mean
# of each bootstrap resample.
my.bootstrap <- numeric(5000)
for (i in 1:length(my.bootstrap)) {
  my.bootstrap[i] <- mean(sample(x3$x, replace = TRUE))
}
quantile(my.bootstrap, 0.025)
quantile(my.bootstrap, 0.975)

# Population histogram with the population mean (blue), bootstrapped
# mean (solid red), and bootstrapped 95% limits (dotted red).
hist(x, freq = FALSE, ylim = c(0, 0.06), breaks = 48, col = "white")
abline(v = mean(x), lty = 1, lwd = 2, col = "blue")
abline(v = mean(my.bootstrap), lty = 1, lwd = 2, col = "red")
abline(v = quantile(my.bootstrap, 0.975), lty = 2, lwd = 1, col = "red")
abline(v = quantile(my.bootstrap, 0.025), lty = 2, lwd = 1, col = "red")
[Figure: histogram of the population with the bootstrapped mean (solid red), bootstrapped confidence limits (dotted red), and population mean (blue)]

It is not perfect, but the bootstrapped confidence limits (dotted red lines) contain the theoretical population mean (100), even though the simulated population mean is not exactly 100. Hence, one could say that a small sample, say 100 observations large, is enough to approximate the unknown population via bootstrapping.
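If you would rather test that claim than take my word for it, a coverage check is a natural follow-up (this sketch is mine, not part of the original analysis): repeat the sample-then-bootstrap exercise many times and count how often the interval captures the simulated population mean.

# Coverage check: 200 hypothetical '100-observation trials', each
# bootstrapped 1000 times; count how often the 95% percentile interval
# contains the simulated population mean.
set.seed(314)
covers <- replicate(200, {
  s  <- sample(df$x, 100)
  bm <- replicate(1000, mean(sample(s, replace = TRUE)))
  quantile(bm, 0.025) <= mean(x) & mean(x) <= quantile(bm, 0.975)
})
mean(covers)  # should land somewhere near the nominal 0.95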

However, I already mentioned that drawing a sample (and thus randomization) is a big numbers game, and the example above drew a 100-observation sample and resampled it 5000 times. What will happen if I make all this sampling a lot smaller? Say, a 5% fraction of the population (50 observations) and 10 resamples? This means I will redo the trial just ten times. Not many get to redo a trial even once.

# The same exercise with the 50-observation sample (x2) and only 10
# bootstrap resamples.
my.bootstrap <- numeric(10)
for (i in 1:length(my.bootstrap)) {
  my.bootstrap[i] <- mean(sample(x2$x, replace = TRUE))
}
quantile(my.bootstrap, 0.025)
quantile(my.bootstrap, 0.975)

hist(x, freq = FALSE, ylim = c(0, 0.06), breaks = 48, col = "white")
abline(v = mean(x), lty = 1, lwd = 2, col = "blue")
abline(v = mean(my.bootstrap), lty = 1, lwd = 2, col = "red")
abline(v = quantile(my.bootstrap, 0.975), lty = 2, lwd = 1, col = "red")
abline(v = quantile(my.bootstrap, 0.025), lty = 2, lwd = 1, col = "red")
[Figure: the same histogram for the 50-observation sample; the bootstrapped limits now miss the population mean]

This time, the bootstrapped confidence interval does not even contain the population mean, and the bootstrapped mean is quite off. This was to be expected, though, as sampling is a big numbers game. For instance, to get the blue line to hit exactly 100, we would probably need to create a 10k or 100k observation database.
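A one-liner illustrates that big-numbers claim (my sketch, reusing the same Normal(100, 10) setup):

# The simulated population mean settles on the theoretical 100 only as the
# population grows; the drift shrinks roughly like 10 / sqrt(n).
set.seed(11223)
sapply(c(1e2, 1e3, 1e4, 1e5), function(n) mean(rnorm(n, 100, 10)))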

Now, let's move from a one-sample example to a two-sample example: a difference between groups.

set.seed(11223)

# Two independent groups of 1000 observations: means 56 and 50, sd 5.
sample1 <- data.frame(sample = 'first sample',  values = rnorm(1000, 56, 5))
sample2 <- data.frame(sample = 'second sample', values = rnorm(1000, 50, 5))
samples <- rbind(sample1, sample2)

g1 <- ggplot(samples,
             aes(x = values, col = sample, fill = sample, group = sample)) +
  geom_density(alpha = 0.2) + theme_bw() + theme(legend.position = "none")

# Pair the (unrelated) values and compute their difference.
samples2 <- as.data.frame(cbind(sample1$values, sample2$values,
                                sample1$values - sample2$values))

g2 <- ggplot(samples2, aes(x = V3)) +
  geom_density() +
  geom_vline(xintercept = mean(samples2$V3), col = "red", lty = 2) +
  theme_bw()

require(viridis)
g3 <- ggplot(samples2, aes(x = V1, y = V2, col = V3)) +
  geom_point(size = 5, alpha = 0.8) +
  scale_color_gradient2(low = "red", mid = "darkblue", high = "red",
                        midpoint = mean(samples2$V3), guide = "none") +
  theme(legend.position = "none") +
  theme_bw()

require(grid)
require(gridExtra)
grid.arrange(g1, g2, g3, nrow = 3)

What I am asking R to do is create two equally sized groups, both coming from a Normal distribution, but having no connection whatsoever. Comparing the groups will show a mean difference of around six, but this is not because of any mechanism whatsoever. The graphs below show the density of both groups, the density of the difference between the groups, and a scatterplot of the paired values. The cloud shape of the differences reveals that there is no underlying mechanism, except 'randomness'.
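For completeness, a plain t-test (my addition) reports that difference of roughly six, and it reflects nothing more than the two simulated means of 56 and 50:

# Welch two-sample t-test on the two unrelated groups; the ~6 difference
# is built into the simulation, not caused by any treatment.
t.test(sample1$values, sample2$values)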

I am sure William Briggs would argue that randomness only exists in simulations, and I dare not disagree.

[Figure: densities of both groups, density of their difference, and the scatterplot of paired values]

I will now do the same for this two-sample example as I did for the one-sample example: bootstrapping. Let me start small.

require(simpleboot)
require(boot)

# Studentized bootstrap of the difference in means: only R = 10 outer
# resamples, each studentized with M = 50 inner resamples.
bootstrapped <- two.boot(samples2$V1,
                         samples2$V2,
                         mean,
                         R = 10,
                         student = TRUE,
                         M = 50)
hist(bootstrapped)
abline(v = bootstrapped$t0[1], lty = 1, lwd = 2, col = "blue")        # observed difference
abline(v = mean(bootstrapped$t[, 1]), lty = 1, lwd = 2, col = "red")  # bootstrapped mean
abline(v = boot::boot.ci(bootstrapped, type = "stud")$student[4:5],
       lty = 2, lwd = 1, col = "red")                                 # studentized 95% CI
[Figure: histogram of the bootstrapped mean difference, R = 10]

Let me finish bigger.

# The same bootstrap, now with R = 1000 outer and M = 1000 inner resamples.
bootstrapped <- two.boot(samples2$V1,
                         samples2$V2,
                         mean,
                         R = 1000,
                         student = TRUE,
                         M = 1000)
hist(bootstrapped)
abline(v = bootstrapped$t0[1], lty = 1, lwd = 2, col = "blue")
abline(v = mean(bootstrapped$t[, 1]), lty = 1, lwd = 2, col = "red")
abline(v = boot::boot.ci(bootstrapped, type = "stud")$student[4:5],
       lty = 2, lwd = 1, col = "red")
[Figure: histogram of the bootstrapped mean difference, R = 1000]

Don't get too optimistic just because the bootstrapped mean is very close to the population mean. The bootstrapped confidence intervals are quite wide, which means that any single trial you run could deviate that far from the population mean. Nobody will be able to replicate a trial 1000 times, or find 1000 publications of trials that are extremely similar. The more data you get, the more variance you often attain.
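To see how much the interval width is a numbers game, here is a small sketch of mine that reuses the same two.boot() machinery on ever smaller subsets of the data:

# Studentized CI width as a function of the underlying sample size.
set.seed(99)
widths <- sapply(c(50, 200, 1000), function(n) {
  idx <- sample(nrow(samples2), n)
  b <- two.boot(samples2$V1[idx], samples2$V2[idx], mean,
                R = 1000, student = TRUE, M = 200)
  diff(boot::boot.ci(b, type = "stud")$student[4:5])
})
widths  # expect visibly wider intervals for the smaller samples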

Now, let's make it even more interesting and move towards regression. I will first fit a simple linear model, then one with a stronger main effect, and finally extend the setup with multiple interaction effects. The goal here is to see if bootstrapping picks up the population effects from the sample when the model does, or does not, contain all necessary effects.

require(car)
set.seed(11223)

# A simple linear relationship: y = x + noise, with n = 1000.
n <- 1000
x <- rnorm(n)
y <- x + rnorm(n)
population.data <- data.frame(x, y)

m1 <- lm(y ~ x, population.data)
betahat.boot <- Boot(m1, R = 199)  # case-resampling bootstrap of the coefficients
summary(betahat.boot)              # default summary
confint(betahat.boot)
hist(betahat.boot)
[Figure: bootstrap distributions of the intercept and slope of m1]
# A stronger effect (2.5) but a smaller sample (n = 100).
n <- 100
x <- rnorm(n)
y <- 2.5 * x + rnorm(n)
population.data <- data.frame(x, y)
m1 <- lm(y ~ x, population.data)
betahat.boot <- Boot(m1, R = 199)  # 199 bootstrap samples--too small to be useful
hist(betahat.boot)
[Figure: bootstrap distributions for the n = 100 model]
require(jtools)
require(interactions)

# Five independent predictors, five main effects, three two-way
# interactions, and one three-way interaction.
n <- 1000
a <- rnorm(n)
b <- rnorm(n)
c <- rnorm(n)
d <- rnorm(n)
e <- rnorm(n)
y <- 2.5 * a +
  1.3 * b +
  0.7 * c +
  2.1 * d +
  1.9 * e +
  -0.5 * a * b +
  -0.9 * a * c +
  1.1 * c * d +
  0.8 * a * d * c +
  rnorm(n)
population.data <- data.frame(a, b, c, d, e, y)

# m0: all main effects, all two-way interactions, and one three-way term.
m0 <- lm(y ~ a + b + c + d + e +
           a:b + a:c + a:d + a:e +
           b:c + b:d + b:e +
           c:d + c:e + d:e +
           a:b:c, population.data)
# m1: only three of the main effects.
m1 <- lm(y ~ a + b + c, population.data)

i0 <- interact_plot(m0, pred = a, modx = c)
i1 <- interact_plot(m1, pred = a, modx = c)
grid.arrange(i0, i1, ncol = 2)
plot_summs(m0, m1, scale = TRUE, plot.distributions = TRUE)
[Figure: interaction plots of m0 and m1]

The above graph compares the interaction plot of a model that contains main effects and two-way interactions (m0) with a model that contains only some of the main effects (m1). This resembles a situation in which the analyst does not know that additional effects play a role. The example may be a bit extreme, but people do not know what they do not know, and this is where simulation becomes handy. The graphs clearly show the second model (m1) failing to capture the existing interaction.

Comparing the coefficients from the m0 and m1 models shows what happens to the estimates of some of the main effects as well.
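Pulling out the three shared main effects makes that comparison concrete (my addition; the population values were a = 2.5, b = 1.3, c = 0.7):

# Side-by-side coefficients of the full (m0) and sparse (m1) model. In
# this sample, the sparse model's 'c' drifts upward, absorbing part of
# the omitted interactions.
round(cbind(full = coef(m0)[c("a", "b", "c")],
            sparse = coef(m1)[c("a", "b", "c")]), 2)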

[Figure: coefficient plot comparing m0 and m1]

Let's see if bootstrapping can help us out. First, let's take another look at the coefficients from the sparse model. The real issue with these results is that you have NO IDEA that you are missing at least two main effects (d and e), and a lot of interactions.

plot_summs(m1, omit.coefs = NULL,
           scale = TRUE,
           plot.distributions = TRUE,
           model.names = c("m1"))
require(sjPlot)
require(sjmisc)
plot_model(m1, show.values = TRUE, value.offset = .3) + theme_bw()
[Figures: coefficient plots of the sparse model m1]

So, if we bootstrap the data, will we then obtain estimates much closer to the population effects? Based on what I see below, I am not worried as the bootstrapped confidence intervals seem to contain the coefficients of the population.

betahat.m1.boot <- car::Boot(m1, R = 1000)
hist(betahat.m1.boot)
summary(betahat.m1.boot)

Number of bootstrap replications R = 1000
            original    bootBias  bootSE bootMed
(Intercept)  0.10076 -0.00258513 0.11154 0.10175
a            2.52591  0.00011154 0.12425 2.52161  # 2.52 - 2.5 = 0.02
b            1.23634 -0.00140071 0.10736 1.23401  # 1.24 - 1.3 = 0.07
c            0.86689  0.00414945 0.11381 0.87044  # 0.87 - 0.7 = 0.17 --> due to missing interaction
[Figure: bootstrap distributions of the m1 coefficients]

But what we really forgot to do is take a tiny single sample from the original big database and see how it behaves. Of course, the bootstrapped results above mimic the population, because I built 1000 resamples from the full set of original observations. But what if I can only sample 50? Once?

# Take a single 5% sample (50 observations) and refit the full model.
x2 <- population.data %>% sample_frac(0.05)
m2 <- lm(y ~ a + b + c + d + e +
           a:b + a:c + a:d + a:e +
           b:c + b:d + b:e +
           c:d + c:e + d:e +
           a:b:c, x2)
plot_summs(m0, m2, scale = TRUE, model.names = c("m0", "m2"))
[Figure: coefficient plot comparing m0 and m2]

And what if I did not include all the important variables I should have included?

m3 <- lm(y ~ a + b + c, x2)
plot_summs(m0, m2, m3, scale = TRUE, model.names = c("m0", "m2", "m3"))


[Figure: coefficient plot comparing m0, m2, and m3]

Can bootstrapping help me now? As you can see, it cannot. Just like randomization, bootstrapping needs a sufficient base to get started.

betahat.m2.boot <- car::Boot(m2, R = 1000)
hist(betahat.m2.boot)
summary(betahat.m2.boot)

Number of bootstrap replications R = 1000
              original   bootBias  bootSE   bootMed
(Intercept)  0.2194352 -0.0140810 0.26017 0.2227485
a            2.4550501 -0.0191784 0.27917 2.4178270
b            1.3595962  0.0400222 0.27590 1.3863846
c            1.0580434  0.0542109 0.33912 1.1051985

betahat.m3.boot <- car::Boot(m3, R = 1000)
hist(betahat.m3.boot)
summary(betahat.m3.boot)

Number of bootstrap replications R = 1000
            original   bootBias  bootSE bootMed
(Intercept)  0.90742 -0.0097986 0.52050 0.88945
a            2.72682  0.0229260 0.44768 2.74583
b            1.45078  0.0166067 0.45787 1.47188
c            1.30921 -0.0068593 0.51522 1.31287
[Figure: bootstrap distributions of the m2 and m3 coefficients]

The histograms coming from m3 show curves nice enough to yield normally distributed bootstrapped confidence intervals, but the bias is too large. However, you will never know, because you did not include all relevant covariates in your model.
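One last sketch of mine puts the m3 bootstrap next to the population values to make that bias explicit:

# Bootstrap intervals of m3 against the true coefficients (2.5, 1.3, 0.7).
# The intervals are wide, the estimates are off, and nothing in the output
# flags the omitted terms.
ci <- confint(betahat.m3.boot)
round(cbind(truth = c(2.5, 1.3, 0.7), ci[c("a", "b", "c"), ]), 2)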

Or to finish with Briggs:

Pick up a pencil and let it go mid air. What happened? It fell, because why? Because of gravity, we say, a cause with which we are all familiar. But the earth's gravity isn't the only force operating on the pencil; just the predominant one. We don't consider the pencil falling to be "random" because we know the nature or essence of the cause and deduce the consequences. We need to speak more of what makes a causal versus probabilistic model, but a man standing in the middle of a field flipping a coin is thinking more probabilistically than the man dropping a pencil. Probabilities become substitutes for knowledge of causes, they do not become causes themselves."



UPDATE

Below the article, comments started to point out that what I was showing resembled sampling deviation, and that sampling from a covariance matrix does not change its inherent structure. That is true, and perhaps I should have changed the title of this article to: "Randomization does not safeguard against incorrect model specification".

The following code comes from Pieter Koppelaar, and I augmented it to make my case again; his comments are extremely useful.

What I will show via the following code snippets and graphs is that a covariance matrix does maintain its structure under sampling, but that not including all the relevant influences in the model limits coefficient estimation, regardless of the sampling used. And the less data you have, the worse it gets.

library(tidyverse)
library(corrplot)
require(jtools)
require(interactions)
require(grid)
require(gridExtra)
set.seed(1)

variables <- 3
population <- 1e7
samplesize <- 1000
samplingsize <- 1000
z_std <- function(observed) (observed - mean(observed)) / sd(observed)

# Unobservable (latent) variables U are matrix-multiplied with B twice
# to obtain observable variables O. We consider this our population.
U <- matrix(data = rnorm(variables * population), ncol = variables)
B <- matrix(data = rnorm(variables^2), ncol = variables)
O <- U %*% B %*% B

Udf <- as.data.frame(U)
Udf %>% sample_frac(0.05) %>%
  ggplot(.) +
  geom_density(aes(x = V1)) +
  geom_density(aes(x = V2)) +
  geom_density(aes(x = V3)) +
  theme_bw()
Udf %>% sample_frac(0.05) %>% pairs(.)  # no relationship whatsoever
[Figure: densities and pairs plot of the latent variables U]
Odf <- as.data.frame(O)
Odf %>% sample_frac(0.05) %>%
  ggplot(.) +
  geom_density(aes(x = V1)) +
  geom_density(aes(x = V2)) +
  geom_density(aes(x = V3)) +
  theme_bw()
Odf %>% sample_frac(0.05) %>% pairs(.)
[Figure: densities and pairs plot of the observed variables O]
U <- as_tibble(U) %>% mutate_all(z_std)
O <- as_tibble(O) %>% mutate_all(z_std)

# Draw 1000 samples of 1000 observations each and store the column means.
samplingU <- U %>% filter(FALSE)
samplingO <- O %>% filter(FALSE)
for (i in 1:samplingsize) {
  sample <- sample(1:population, samplesize)
  samplingU[i, ] <- U %>% slice(sample) %>% summarize_all(mean)
  samplingO[i, ] <- O %>% slice(sample) %>% summarize_all(mean)
}
ggplot(samplingO) +
  geom_density(aes(x = V1)) +
  geom_density(aes(x = V2)) +
  geom_density(aes(x = V3)) +
  theme_bw()

cor(samplingO)
           V1         V2         V3
V1  1.0000000 -0.5526532 -0.5681778
V2 -0.5526532  1.0000000  0.5949642
V3 -0.5681778  0.5949642  1.0000000

cor(O)
           V1         V2         V3
V1  1.0000000 -0.5625197 -0.5802865
V2 -0.5625197  1.0000000  0.6126717
V3 -0.5802865  0.6126717  1.0000000

cor(Odf)
           V1         V2         V3
V1  1.0000000 -0.5625197 -0.5802865
V2 -0.5625197  1.0000000  0.6126717
V3 -0.5802865  0.6126717  1.0000000

Odf %>%
  sample_frac(0.00005) %>%
  cor(.)
           V1         V2         V3
V1  1.0000000 -0.5280985 -0.5912621
V2 -0.5280985  1.0000000  0.5840342
V3 -0.5912621  0.5840342  1.0000000
[Figure: densities of the sampled means of O]
# Build an outcome from three main effects and three two-way interactions.
Odf$Y <- with(Odf, 2 * V1 - 3 * V2 + 0.8 * V3 +
                1.9 * V1 * V2 +
                -7 * V2 * V3 +
                3.4 * V1 * V3)
Odf %>%
  sample_frac(0.0005) %>%
  ggplot(., aes(x = V1, y = Y, col = V3, size = V2)) +
  geom_point(alpha = 0.5) + theme_bw()
[Figure: Y against V1, colored by V3 and sized by V2]
# m0: the correctly specified model; m1: V2 and its interactions omitted.
m0 <- lm(Y ~ V1 + V2 + V3 +
           V1:V2 + V2:V3 + V1:V3,
         data = Odf)
m1 <- lm(Y ~ V1 + V3 +
           V1:V3,
         data = Odf)
i0 <- interactions::interact_plot(m0, pred = V1, modx = V3)
i1 <- interactions::interact_plot(m1, pred = V1, modx = V3)
grid.arrange(i0, i1, ncol = 2)
plot_summs(m0, m1, scale = TRUE, model.names = c("m0", "m1"))
[Figures: interaction plots and coefficient plot for m0 and m1]
# Refit both models on single 0.05% samples (5000 observations each).
m0_s <- Odf %>%
  sample_frac(0.0005) %>%
  lm(Y ~ V1 + V2 + V3 +
       V1:V2 + V2:V3 + V1:V3,
     data = .)
m1_s <- Odf %>%
  sample_frac(0.0005) %>%
  lm(Y ~ V1 + V3 +
       V1:V3,
     data = .)
i0_s <- interactions::interact_plot(m0_s, pred = V1, modx = V3)
i1_s <- interactions::interact_plot(m1_s, pred = V1, modx = V3)
grid.arrange(i0, i0_s,
             i1, i1_s,
             ncol = 2, nrow = 2)
[Figures: interaction plots from the full data (left) and the 0.05% samples (right)]



Pieter Koppelaar
Architect Data & Analytics

"Hi Marc, could you perhaps add an explanation of why bootstrapping is a relevant topic here? At the moment its relevance does not come across anywhere. The same goes for regression: what is its relevance to clinical trials? These usually concern a test-and-control-group design in which one is interested in comparing proportions or means. What goes wrong there, in your view? If we stay with the example of the vaccine studies: how can it be that reasonably consistent, large differences in proportions (symptomatic, severely ill) are found between two double-blinded groups? And if randomization does not have the effect one expects because of possible unobservable differences between groups, why do we never find these effects reversed? Perhaps you could write up the problems you see on the basis of such a simple design, without resampling and without linear regression on non-linear effects."
