**Realizations in Biostatistics**, and kindly contributed to R-bloggers)

RKN over at Epidemiology blog had a small problem he solved using simulation:

I have been interested in the following question: if there are, let’s say, 5 genes involved in the liability of fx risk, and each gene has two alleles with one of them conferring a greater risk, what is the chance that an individual will have 0, 1, 2, 3, 4, or 5 risk alleles?

Obviously, the answer depends on the allelic frequency. I am too lazy to work out by algebraic probability, so I have conducted a simple simulation (using SAS) as follows:

There are 5 genes with allelic frequencies being 1%, 5%, 10%, 15% and 20%

Assuming that they are independent (zero linkage disequilibrium)

He then programmed a short simulation in SAS using 100,000 replicates. However, remembering the advice at this site, I wondered if SAS’s random number generator was up to the task. The period of this generator (a linear congruential generator) is 2^{32}-1, and the quality decreases after the production of the square root of this number of pseudorandom numbers. (Around 2^{16}, or 65536.) At 500,000 numbers, the quality of the random numbers should start to decrease.

I didn’t run the battery of tests on the numbers, but I did replicate the experiment using R, which uses the Mersenne twister as its default generator. I got the following as a result:

0 1 2 3 4

0.57481 0.34678 0.07177 0.00633 0.00031

This isn’t too far off of what RKN got. So maybe the sledgehammer wasn’t necessary here.

**leave a comment**for the author, please follow the link and comment on their blog:

**Realizations in Biostatistics**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...