Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

“…an essential part of understanding how many ties these RNGs produce is to understand how many ties one expects in 32-bit integer arithmetic.”

A sort of a birthday-problem paper for random generators by Markus Hofert on arXiv as to why they produce ties. As shown for instance in the R code (inspired by the paper):

sum(duplicated(runif(1e6)))


returning values around 100, which is indeed unexpected until one thinks a wee bit about it… With no change if moving to an alternative to the Mersenne twister generator. Indeed, assuming the R random generators produce integers with 2³² values, the expected number of ties is actually 116 for 10⁶ simulations. Moving to 2⁶⁴, the probability of a tie is negligible, around 10⁻⁸. A side remark of further inerest in the paper is that, due to a different effective gap between 0 and the smallest positive normal number, of order 10⁻²⁵⁴ and between 1 and the smallest normal number greater than 1, of order 10⁻¹⁶, “the grid of representable double numbers is not equidistant”. Justifying the need for special functions such as expm1 and log1p, corresponding to more accurate derivations of exp(x)-1 and log(1+x).