# CLT Standard Normal Generator

April 2, 2010

(This article was first published on stotastic » R, and kindly contributed to R-bloggers)

I’ve found this standard normal random number generator in a number of places, including one of Paul Wilmott’s books. The idea is that we can use the Central Limit Theorem (CLT) to easily generate values that are approximately distributed according to a standard normal distribution by summing 12 uniform random variables and subtracting 6. In Excel, the implementation looks like this:
 =RAND()+RAND()+RAND()+RAND()+RAND()+RAND()+RAND()+RAND()+RAND()+RAND()+RAND()+RAND()-6 
By doing a simple cut-and-paste, we can stick this formula in an Excel cell and go on our merry way, assuming we have generated values from a standard normal distribution. But what is really going on here, and how well does this generator work?

# The Idea

The idea behind this standard normal generator is simple and is based on the Central Limit Theorem. In a nutshell, suppose we define the following random variables:

$U_i \; \text{iid} \; \sim U(0,1)$

$S = \displaystyle\sum_{i=1}^{m} U_i$

Then we can approximate the distribution of $S$ using the CLT.

$S \; \text{approximately} \; \sim N\left(m \cdot \mathbb{E}[U_i], \; m \cdot \mathrm{Var}[U_i]\right)$

Since the mean and variance of $U_i$ are $\frac{1}{2}$ and $\frac{1}{12}$ respectively, this becomes:

$S \; \text{approximately} \; \sim N\left(\frac{m}{2}, \frac{m}{12}\right)$
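For readers who want to check the two moments of a $U(0,1)$ variable used here, they follow from direct integration (a short worked derivation, not part of the original post):

$\mathbb{E}[U_i] = \int_0^1 u \, du = \frac{1}{2}$

$\mathbb{E}[U_i^2] = \int_0^1 u^2 \, du = \frac{1}{3}$

$\mathrm{Var}[U_i] = \mathbb{E}[U_i^2] - \left(\mathbb{E}[U_i]\right)^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}$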

Finally, if we ‘standardize’ $S$ by subtracting the mean and dividing by its standard deviation we get a standard normal random variable.

$Z = \frac{S - \frac{m}{2}}{\sqrt{\frac{m}{12}}}$

$\text{where} \; Z \; \text{approximately} \; \sim N\left(0, 1\right)$
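As a quick sanity check of the standardization step, here is a minimal vectorized sketch in R. The function name `clt_standardize` and the choice of $m = 5$ are only illustrative assumptions, not something from the original sources. It builds an $n \times m$ matrix of uniforms, sums each row to get $S$, and then applies the formula for $Z$.

```r
## Minimal sketch; clt_standardize is an illustrative name (an assumption).
## Draws n standardized sums of m uniforms in one vectorized step.
clt_standardize <- function(n, m) {
  u <- matrix(runif(n * m), nrow = n, ncol = m)  # one row per output value
  s <- rowSums(u)                                # S = U_1 + ... + U_m
  (s - m / 2) / sqrt(m / 12)                     # subtract the mean, divide by the sd
}

set.seed(1)
z <- clt_standardize(100000, 5)
print(c(mean = mean(z), sd = sd(z)))  # should be close to 0 and 1
```

Whatever value of $m$ is used, the sample mean and standard deviation should land near 0 and 1. What changes with $m$ is how closely the shape of the distribution, especially the tails, matches a normal.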

So we have essentially taken the sum of uniform random variables and used them to approximate a standard normal random variable by applying the CLT. The important thing to keep in mind is that the more uniforms we use to do this, the better the approximation. You may be asking yourself why this looks nothing like the simple Excel formula I showed earlier. Well, something special happens when we use 12 uniforms; things start to simplify!

$Z = \frac{S - \frac{12}{2}}{\sqrt{\frac{12}{12}}} = S - 6$

Voila! We have an easy to implement standard normal random number generator. We should still be a little concerned about the CLT approximation and we should probably ask ourselves if using only 12 uniform random variables is ‘good enough’.
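One way to put a rough number on ‘good enough’ is sketched below. Two facts are worth keeping in mind: $S - 6$ can never fall outside $[-6, 6]$, and the excess kurtosis of a sum of $m$ uniforms is $-\frac{6}{5m}$, which is $-0.1$ for $m = 12$ (a normal distribution has 0). The sketch assumes a sample size of 500,000 and a cutoff of 3 standard deviations purely for illustration. It compares the share of draws beyond that cutoff with the exact normal value.

```r
## Illustrative check of the 12-uniform generator (sample size and cutoff
## are assumptions chosen for this sketch).
set.seed(42)
n <- 500000
z <- rowSums(matrix(runif(12 * n), ncol = 12)) - 6

## Sample excess kurtosis; theory gives -6 / (5 * 12) = -0.1 for 12 uniforms
centered <- z - mean(z)
excess_kurtosis <- mean(centered^4) / mean(centered^2)^2 - 3

## Share of draws beyond 3 standard deviations, next to the exact normal value
tail_clt    <- mean(abs(z) > 3)
tail_normal <- 2 * pnorm(-3)

print(c(excess_kurtosis = excess_kurtosis,
        tail_clt = tail_clt,
        tail_normal = tail_normal))
print(range(z))  # always inside [-6, 6]
```

A negative excess kurtosis and a tail share below the normal value both point the same way: extreme values are under-represented relative to a true standard normal.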

# Testing

Now to the fun part! I’ve written the following function which implements the above method in R.

    ## Function that uses the CLT to generate standard normals from uniforms
    ## n is the number of standard normal random numbers to generate
    ## m is the number of uniforms summed for each generated value
    CLT_normal <- function(n, m) {
      z <- rep(0, n)
      for (i in 1:n) {
        u <- runif(m, 0, 1)
        s <- sum(u)
        ## standardize: subtract the mean m/2, divide by the sd sqrt(m/12)
        z[i] <- (s - m/2) / sqrt(m/12)
      }
      return(z)
    }

Using the generated values, we can perform a visual inspection using QQ normal plots for various values of m. I also generated results using m as 30 since 30 is often used as a rule-of-thumb for applying the CLT.

    ## Test the normal generator using various values of m
    par(mfrow = c(2, 2))
    for (m in c(1, 6, 12, 30)) {
      x <- CLT_normal(100000, m)
      qqnorm(x, main = paste("QQ normal m=", m))
      qqline(x, col = "red")
    }

Based on this output, the generated values have lighter tails than a normal distribution, but using 12 uniforms seems to be OK if one is performing a ‘quick and dirty’ analysis in Excel. Using 30 uniforms performs better, but things start to slow down considerably, and it would probably be better to write a function using the Box-Muller method if better accuracy in the tails is needed.
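For reference, a minimal sketch of the Box-Muller method in R might look like the following. The function name `box_muller` is an assumption made for illustration, and in practice R’s built-in `rnorm()` would be the natural choice. Each pair of independent uniforms $(U_1, U_2)$ is turned into two independent standard normals via $\sqrt{-2 \ln U_1} \cos(2 \pi U_2)$ and $\sqrt{-2 \ln U_1} \sin(2 \pi U_2)$.

```r
## Box-Muller sketch; box_muller is an illustrative name (an assumption).
box_muller <- function(n) {
  k  <- ceiling(n / 2)          # each pair of uniforms yields two normals
  u1 <- runif(k)                # R's runif() excludes 0, so log(u1) is finite
  u2 <- runif(k)
  r     <- sqrt(-2 * log(u1))   # radius
  theta <- 2 * pi * u2          # angle
  z <- c(r * cos(theta), r * sin(theta))
  z[seq_len(n)]                 # drop the extra value when n is odd
}

set.seed(123)
z <- box_muller(100001)
print(c(mean = mean(z), sd = sd(z)))  # should be close to 0 and 1
```

Unlike the sum-of-uniforms approach, the transform is exact rather than an approximation, so the tails are not cut off at a fixed bound, and it needs only two uniforms for every two normals.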
