The Most Dangerous Equation, or Why Small is Not Beautiful!

[This article was first published on R-Bloggers – Learning Machines, and kindly contributed to R-bloggers].


Over one billion dollars have been spent in the US to split up big schools into smaller ones because small schools regularly show up in rankings as top performers.

In this post, I will show you why that money was wasted because of a widespread (but not so well known) statistical artifact, so read on!

Why do small schools perform better? Many are quick to point out that an intimate, family-like atmosphere helps facilitate learning and smaller units can cater more to the special needs of their pupils. But is this really the case?

Let us conduct a little experiment where we distribute randomly performing pupils across one thousand schools, half of them big and the other half small. To be clear: we distribute those pupils by chance alone, so every school has the same shot at getting top, average, and low performers!

As a proxy for performance let us just take the intelligence quotient (IQ) which is defined to be normally distributed with a mean of 100 and a standard deviation of 15.

Let us now have a look at the fifty best performing schools:

n <- 1000 # number of schools
sm <- 50  # pupils per small school
bg <- 1000 # pupils per big school

set.seed(12345)
small <- matrix(rnorm(n/2 * sm, mean = 100, sd = 15), nrow = n/2, ncol = sm)
big <- matrix(rnorm(n/2 * bg, mean = 100, sd = 15), nrow = n/2, ncol = bg)

df_small <- data.frame(size = "small", performance = rowMeans(small))
df_big <- data.frame(size = "big", performance = rowMeans(big))

df <- rbind(df_small, df_big)

df_order <- df[order(df$performance, decreasing = TRUE), ]
df_order |> head(50)
##      size performance
## 296 small    106.3838
## 464 small    105.9089
## 128 small    105.5734
## 146 small    105.2287
## 36  small    104.9394
## 479 small    104.7963
## 406 small    104.7407
## 126 small    104.7316
## 386 small    104.6500
## 15  small    104.6092
## 183 small    104.5761
## 492 small    104.5029
## 106 small    104.5003
## 330 small    104.3905
## 456 small    104.2457
## 84  small    104.2373
## 474 small    104.1203
## 89  small    103.9073
## 315 small    103.7264
## 108 small    103.5928
## 19  small    103.5878
## 268 small    103.5433
## 435 small    103.4099
## 481 small    103.2921
## 70  small    103.2447
## 110 small    103.2089
## 96  small    103.1939
## 497 small    103.1700
## 103 small    103.1609
## 262 small    103.1533
## 33  small    103.1376
## 293 small    103.1008
## 252 small    103.0873
## 240 small    103.0657
## 170 small    103.0553
## 220 small    103.0485
## 185 small    103.0196
## 195 small    103.0042
## 98  small    102.9625
## 294 small    102.9349
## 51  small    102.9339
## 317 small    102.9308
## 403 small    102.9258
## 202 small    102.9255
## 463 small    102.9072
## 321 small    102.8631
## 124 small    102.8468
## 380 small    102.8341
## 273 small    102.8147
## 217 small    102.8005

In fact, all of them are small! How is that possible?

You are in for another surprise when you also look at the fifty worst performers:

df_order |> tail(50)
##      size performance
## 221 small    97.43354
## 271 small    97.43122
## 420 small    97.42636
## 77  small    97.40313
## 141 small    97.38554
## 192 small    97.38214
## 45  small    97.36123
## 331 small    97.35636
## 133 small    97.15977
## 400 small    97.12790
## 350 small    97.09748
## 161 small    96.97677
## 395 small    96.97156
## 17  small    96.95732
## 50  small    96.94013
## 353 small    96.77378
## 376 small    96.63066
## 140 small    96.60068
## 80  small    96.59475
## 115 small    96.56996
## 362 small    96.52821
## 16  small    96.35869
## 344 small    96.29919
## 290 small    96.23309
## 41  small    96.21083
## 246 small    96.13112
## 345 small    96.06559
## 104 small    96.03273
## 425 small    96.02648
## 32  small    96.01366
## 7   small    95.99434
## 364 small    95.99399
## 397 small    95.95282
## 374 small    95.92229
## 18  small    95.80126
## 184 small    95.73809
## 127 small    95.65754
## 270 small    95.60595
## 356 small    95.39492
## 433 small    95.33532
## 475 small    95.30683
## 66  small    95.30076
## 445 small    94.94566
## 61  small    94.88395
## 500 small    94.85970
## 442 small    94.85128
## 6   small    94.56465
## 347 small    94.43273
## 261 small    94.23249
## 171 small    93.82268

Again, all of them are small! The following plot reveals the overall pattern:

(df_order$performance - 100) |>
  barplot(main = "Sorted relative performance of schools",
          col = ifelse(df_order$size == "small", "blue", "red"),
          border = NA)
abline(h = 0)
legend("bottomleft", legend = c("Small schools", "Big schools"),
       fill = c("blue", "red"), bty = "n")

The statistical effect at work here is what my colleague Professor Howard Wainer of the University of Pennsylvania dubbed “The Most Dangerous Equation” in an article in the renowned American Scientist.

To understand this let us first look at some statistical measures:

summary(df_small$performance)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   93.82   98.59  100.06  100.01  101.42  106.38

summary(df_big$performance)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   98.58   99.71  100.02   99.99  100.32  101.19

We can see that, while both have the same mean performance, the variation in performance is bigger for small schools than for big ones, where the extreme cases are averaged out. De Moivre’s equation gives the exact relationship between this variation and the sample size, i.e. the number of pupils in this case:

    \[\sigma_{\bar x}=\sigma/\sqrt{n}\]

where \(\sigma_{\bar x}\) is the standard error of the mean, \(\sigma\) is the standard deviation of the sample, and \(n\) is the size of the sample.

Let us calculate those numbers for our little example and compare them to the actual standard deviations of the school performance:

sd(df_small$performance)
## [1] 2.144287

15 / sqrt(sm)
## [1] 2.12132

sd(df_big$performance)        
## [1] 0.4677121

15 / sqrt(bg)
## [1] 0.4743416

We can see that those numbers fit very well.
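The same relationship holds across a whole range of school sizes, not just the two from our example. The sizes below are purely hypothetical, chosen for illustration:

```r
# Sketch: for several hypothetical school sizes n, simulate many schools and
# compare the empirical standard deviation of the school means (the standard
# error) to the theoretical value sigma / sqrt(n) from de Moivre's equation.
set.seed(12345)
sizes <- c(25, 50, 100, 400, 1000) # hypothetical pupils per school
reps <- 500                        # simulated schools per size

emp_se <- sapply(sizes, function(n) {
  sd(replicate(reps, mean(rnorm(n, mean = 100, sd = 15))))
})

data.frame(size = sizes,
           empirical = round(emp_se, 3),
           theoretical = round(15 / sqrt(sizes), 3))
```

The empirical and theoretical columns track each other closely, and both shrink with \(\sqrt{n}\).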

Basically, de Moivre’s equation tells us that the smaller the sample size (= the number of pupils per school), the bigger the variation (= standard error of the mean), which results in small schools occupying both extremes of the performance rankings. If you only look at the top performers, you will falsely conclude that you have to split up big schools!
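We can make this concrete with a one-line summary of the simulated data (the setup is repeated here so the snippet runs on its own):

```r
# Re-run the simulation from above and compute the share of small schools
# among the 50 best and the 50 worst performers.
set.seed(12345)
n <- 1000; sm <- 50; bg <- 1000
small <- matrix(rnorm(n/2 * sm, mean = 100, sd = 15), nrow = n/2)
big   <- matrix(rnorm(n/2 * bg, mean = 100, sd = 15), nrow = n/2)
df <- rbind(data.frame(size = "small", performance = rowMeans(small)),
            data.frame(size = "big",   performance = rowMeans(big)))
df_order <- df[order(df$performance, decreasing = TRUE), ]

mean(head(df_order, 50)$size == "small") # share of small schools in the top 50
mean(tail(df_order, 50)$size == "small") # share in the bottom 50
```

Both shares come out as 1 here, matching the listings above: the extremes are populated exclusively by small schools.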

In fact, later studies (which took this effect into account) even found that larger schools perform better after all, because they can offer a wider range of classes with teachers who can focus on fewer subjects. The “de Moivre” effect was hiding this and turned it on its head!

Other examples of this widespread effect include:

  • Disease rates in small and big cities, e.g. for cancer or COVID-19 infections. Small cities simultaneously lead and lag statistics compared to large cities. Sometimes even one single case can make all the difference.
  • Traffic safety rates: same here, small cities are at each end of the spectrum.
  • Quality of hospitals: small hospitals regularly end up at both the top and the bottom of the rankings.
  • Mutual fund performance: small funds often show large performance swings, which puts them at both extremes of the rankings.
  • …and the list goes on and on.
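The first bullet point can be illustrated with a small simulation (all numbers below are made up for illustration): give small towns and big cities the exact same true disease rate and watch the observed rates spread apart.

```r
# Hypothetical example: a disease with a true rate of 5 cases per 10,000
# people everywhere. Small towns (10,000 inhabitants) show wildly varying
# observed rates, while big cities (1,000,000 inhabitants) barely deviate.
set.seed(12345)
true_rate <- 5 / 10000
small_pop <- 1e4; big_pop <- 1e6

small_rates <- rbinom(500, small_pop, true_rate) / small_pop * 1e4
big_rates   <- rbinom(500, big_pop,   true_rate) / big_pop   * 1e4

range(small_rates) # spreads far around 5: both "healthiest" and "sickest" towns
range(big_rates)   # stays close to 5
```

A single extra case moves a small town's rate by a full point per 10,000, which is exactly why one case can make all the difference there.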

Of course, commentators are always fast to find reasons for each extreme, e.g. in the case of cancer rates: either “clean living of the rural lifestyle — no air pollution, no water pollution, access to fresh food without additives”, and so on vs. “poverty of the rural lifestyle — no access to good medical care, a high-fat diet, and too much alcohol and tobacco”, where in fact both results are mainly due to the statistical artifact explained by de Moivre’s equation!

This equation is so dangerous precisely because so few people know it, yet its effects are so far-reaching.

Another example of what can be called “Only looking at the winners-fallacy” can be found here: How to be Successful! The Role of Risk-taking: A Simulation Study. For another interesting fallacy due to sampling see: Collider Bias, or Are Hot Babes Dim and Eggheads Ugly?.

Do you have other examples where the “de Moivre”-effect creeps up? Please share your thoughts in the comments.
