more e’s [and R’s]

[This article was first published on R – Xi'an's Og, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

estimeAlex Thiéry suggested debiasing the biased estimate of e by Rhee and Glynn truncated series method, so I tried the method to see how much of an improvement (if any!) this would bring. I first attempted to naïvely implement the raw formula of Rhee and Glynn

\hat{\mathfrak{e}} = \sum_{n=1}^N \{\hat{e}_{n+1}-\hat{e}_n\}\big/\mathbb{P}(N\ge n)

with a (large) Poisson distribution on the stopping rule N, but this took ages. I then realised that the index n did not have to be absolute, i.e. to start at n=1 and proceed snailwise one integer at a time: the formula remains equally valid after a change of time, i.e. n=can start at an arbitrary value and proceeds by steps of arbitrary size, which obviously speeds things up!

while (i<=imax){ 
 for (j in 1:m){ 
  if (space>0){ space=1/space}
     else{ space=3}

For instance, the running time of the above code with n=10⁶ uniforms in use is

> system.time(mean(rheest(runif(n))))
utilisateur     système      écoulé 
      6.035       0.000       6.031 

If I leave aside the issue of calibrating the above in terms of the Poisson parameter λ and of the step size [more below!], how does this compare with Forsythe’s method? Very (very) well if following last week pedestrian implementation:

> system.time(mean(forsythe(runif(n))))
utilisateur     système      écoulé 
    146.115       0.000     146.044 

But then I quickly (not!) realised that proceeding sequentially through the sequence of uniform simulations was not necessary to produce independent realisations of Forsythe’s estimate: this R code takes batches of uniforms that are far enough to see a sign change in the differences and takes the length of the first decreasing sequence for each batch. While this certainly is suboptimal in exploiting the n Uniform variates, it avoids scrawling through the sequence one value at a time and is clearly way faster:


> system.time(mean(fastythe()))
utilisateur     système      écoulé 
      0.904       0.000       0.903 

while producing much more accurate estimates than the debiased version if not the original one, as shown by the boxplot at the top (where RG stands for Rhee and Glynn, F for Forsythe and W for Wuber who proposed the original [biased] Monte Carlo estimate of e⁻¹)! Just to make things clear if necessary, all comparisons are run with the same batch of 10⁶ uniform simulations for each mean value entering the boxplots.

One thing that bothers me about the debiased solution is that it is highly sensitive to calibration. For instance, when switching λ and the step size between 10 and 100, I got the following variation over 100 replicates:

rheeglynn1where the last box corresponds to the RG box on the first graph. This is most annoying in that using the whole sequence of uniforms at once and the resulting (biased) estimator leads to the most precise and fastest estimator of the whole series (W label in the first graph)!

> system.time(1/mean((n-1)*diff(sort(use))>1))
utilisateur     système      écoulé 
      0.165       0.000       0.163 

Filed under: Kids, pictures, R, Statistics Tagged: cross validated, debiasing, George Forsythe, Monte Carlo Statistical Methods, R, Russian roulette, simulation, unbiased estimation

To leave a comment for the author, please follow the link and comment on their blog: R – Xi'an's Og. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)