# Failed Randomization In A Randomized Trial?

November 4, 2013
By

[This article was first published on Statistical Reflections of a Medical Doctor » R, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

We will continue the saga of the three-arm clinical trial that is giving the editors of the prestigious journal The Spleen a run for their money. While the polls are gathering digital dust, let’s see if we can steer this discussion onto a more quantitative path. To do so, we will ask (and answer) the question from a frequentist point of view: under this approach we raise a red flag if the event under examination would be rare if a hypothesis about the state of the world (the null hypothesis $H_0$) were true.

In this case the null hypothesis is that the investigators at Grand Fenwick Memorial did run a randomized controlled trial under a simple randomization scheme, in which each patient had an equal chance of being given one of the three interventions: GML, SL or MBL. To calculate the rarity of the observed pattern, we need to define an appropriate event and then figure out its rarity (“long-term frequency”) over many repetitions of the randomization scheme used in the trial.
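The allocation scheme assumed under $H_0$ can be sketched in a few lines of R. This is a hypothetical illustration, not code from the trial: it assumes 240 patients, each assigned independently and with probability 1/3 to one of the three arms, and the names `arms`, `allocation` and `counts` are made up for the example.

```r
## Hypothetical sketch of simple randomization under H0:
## each of 240 patients is assigned independently, with
## probability 1/3, to one of the three interventions
set.seed(1)  ## arbitrary seed, for reproducibility
arms <- c("GML", "SL", "MBL")
allocation <- sample(arms, size = 240, replace = TRUE)
## arm sizes of this one hypothetical trial
counts <- table(factor(allocation, levels = arms))
counts
```

The arm sizes produced by this scheme follow a multinomial distribution with 240 trials and equal probabilities, which is why repeated trials can be generated directly with `rmultinom` instead of assigning patients one at a time.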

Considering the number of patients in the three arms of the trial, 105/70/65, vs. the expectation of 80/80/80, it would appear that the most influential factor in determining the “rarity” of the observed pattern is the difference in size between the largest and the smallest arm (105 - 65 = 40 patients). On the other hand, a difference of 5 between the second largest and the smallest arm would not appear worthy of consideration, at least as a first approximation. To determine the long-term frequency of this event in a trial with 240 patients, we will use R to carry out a large number of these hypothetical allocations and count those in which the difference in size between the largest and smallest arms is 40 or more:

```r
event <- c(105, 70, 65)  ## observed pattern
## flags a trial in which the difference in size between the
## largest and the smallest arm is at least l1 patients
frequentist2 <- function(x, l1 = 40) {
  x <- sort(x, decreasing = TRUE)
  (x[1] - x[3]) >= l1
}
set.seed(4567) ## for reproducibility
## 500,000 hypothetical trials, each allocating sum(event) = 240
## patients to three arms with equal probability
g <- t(rmultinom(500000, sum(event), c(1, 1, 1)))
## flags the repetitions of the study in which a rare
## event was observed and calculates its frequency (in %)
res3 <- apply(g, 1, frequentist2); mean(res3) * 100
```

This number comes out to be about 0.5%. In other words, roughly 1 out of 200 randomized trials that assign patients with equal probability to three arms would be expected to generate an imbalance of this magnitude or larger.

But is this the answer we are trying to obtain? The question the editors of The Spleen actually face is how likely it is that patients were not randomly assigned to the three interventions. This is only indirectly related to the rarity of observing a large size difference between the arms of a trial that did not cheat. By not directly considering the hypothesis of foul play (unequal allocation probabilities across the three arms), both the investigators and their accusers will find themselves in an endless quarrel about whether such a rare finding should be read as chance or as an improbable event indicative of fraud.
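For completeness, the tail frequency of about 0.5% does not have to be estimated by simulation: with 240 patients there are only 29,161 possible splits into three arms, so the same long-term frequency under $H_0$ can be computed exactly by summing multinomial probabilities. The sketch below is an addition for cross-checking, not part of the original analysis, and the names `grid`, `spread` and `exact` are illustrative.

```r
## Exact frequency of "largest arm minus smallest arm >= l1"
## under simple randomization with equal probabilities (H0)
n  <- 240
l1 <- 40
## enumerate every possible split (a, b, c) of n patients into 3 arms
grid <- expand.grid(a = 0:n, b = 0:n)
grid <- grid[grid$a + grid$b <= n, ]
grid$c <- n - grid$a - grid$b
## multinomial probability of each split, computed on the log scale
logp <- lfactorial(n) - lfactorial(grid$a) - lfactorial(grid$b) -
  lfactorial(grid$c) + n * log(1/3)
p <- exp(logp)
## difference between the largest and the smallest arm of each split
spread <- pmax(grid$a, grid$b, grid$c) - pmin(grid$a, grid$b, grid$c)
exact <- sum(p[spread >= l1])
exact * 100  ## in %
```

The result should agree with the simulated frequency up to Monte Carlo error; like the simulation, it still measures only the rarity of the imbalance under $H_0$, not the plausibility of foul play.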
