RANDU: The case of the bad RNG

[This article was first published on R – Why?, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The German Federal Office for Information Security (BSI) has established
criteria for quality random number generator (rng):

  • A sequence of random numbers has a high probability of containing no identical consecutive elements.
  • A sequence of numbers which is indistinguishable from true random’ numbers (tested using statistical tests.
  • It should be impossible to calculate, or guess, from any given sub-sequence, any previous or future values in the sequence.
  • It should be impossible, for all practical purposes, for an attacker to calculate, or guess the values used in the random number algorithm.

Points 3 and 4 are crucial for many applications. Everytime you make a
phone call, contact to a wireless point, pay using your credit card random
numbers are used.

Designing a good random number generator is hard and as a general rule you should never try to. R comes with many good quality random generators. The default generator is the Mersenne-Twister. This rng has a huge period of 2^{19937}-1 (how many random numbers are generated before we have a repeat).

Linear congruential generators

A linear congruential generator (lcg) is a relatively simple rng (popular in the 60’s and 70’s). It has a simple form of

r_{i}=(ar_{i-1}+b) \textrm{ mod }m, \quad i=1, 2, \ldots, m

where $latexr_0$ is the initial number, known as the seed, and (a,b,m) are the multiplier, additive constant and modulo respectively. The parameters are all integers.

The modulo operation means that at most m different numbers can be generated
before the sequence must repeat – namely the integers 0,1,2, \ldots, m-1. The
actual number of generated numbers is h \leq m, called the period of
the generator.

The key to random number generators is in setting the parameters.

RANDU

RANDU was a lcg with parameters m=2^{31}, a=65539 and b=0. Unfortunately this is a spectacularly bad choice of
parameters. On noting that a=65,539=2^{16}+3, then

r_{i+1} = a r_i = 65539 \times r_i = (2^{16}+3)r_i \;.

So

r_{i+2} = a\;r_{i+1} = (2^{16}+3) \times r_{i+1} = (2^{16}+3)^2 r_i \;.

On expanding the square, we get

r_{i+2} = (2^{32}+6\times 2^{16} + 9)r_i = [6 (2^{16}+3)-9]r_i = 6 r_{i+1} - 9 r_i \;.

Note: all these calculations should be to the mod 2^{31}. So there is a large
correlation between the three points!

If compare randu to a standard rng (code in a gist)

Rplot1

It’s obvious that randu doesn’t produce good random numbers. Plotting  x_i, x_{i-1} and x_{i-2} in 3d

Rplot2

Generating the graphics

The code is all in a gist and can be run via

devtools::source_gist("https://gist.github.com/csgillespie/0ba4bbd8da0d1264b124")

You can then get the 3d plot via

scatterplot3d::scatterplot3d(randu[,1], randu[,2], randu[,3], angle=154) ## Interactive version threejs::scatterplot3js(randu[,1], randu[,2], randu[,3])


To leave a comment for the author, please follow the link and comment on their blog: R – Why?.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)