# Happy PI day

[This article was first published on

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I have never done a post for PI day. This year I want to do so.**Wiekvoet**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

So, we all know the simple estimation of PI based on random numbers. The code used here is chosen for speed in R.

pi2d <- function(N=1000) {

4*sum(rowSums(matrix(runif(N*2)^2,ncol=2))<1)/N

}

What irritates me, is the low efficiency of this estimate. What do you get for 10 000 simulations? Probably, but not even certain, the first two digits.

summary(sapply(1:1000,function(x) pi2d(10000)))

Min. 1st Qu. Median Mean 3rd Qu. Max.

3.080 3.130 3.141 3.141 3.153 3.189

In the past years I have been thinking how to get that more efficient, but that is not obvious. For instance, it is possible to use the three dimensional equivalent, a ball:

pi3d <- function(N=1000) {

6*sum(rowSums(matrix(runif(N*3)^2,ncol=3))<1)/N

}

}

summary(sapply(1:1000,function(x) pi3d(10000)))

Min. 1st Qu. Median Mean 3rd Qu. Max.

Min. 1st Qu. Median Mean 3rd Qu. Max.

3.052 3.121 3.140 3.142 3.161 3.243

At some point I thought this is due to the limited information in such a calculation, it is binomial and one simulation gives one bit of information. And it could be more simple. If the first random number is known, say

*y*, then all second random numbers over sqrt(1-

*y*

^{2}) give distance larger than 1, while the remainder gives distance less than 1. Thus should pi be equal to the mean of random numbers transformed like sqrt(1-

*y*

^{2})?

pin <- function(N=1000) {

4*sum(sqrt(1-runif(N)^2))/N

}

summary(sapply(1:1000,function(x) pin(10000)))

Min. 1st Qu. Median Mean 3rd Qu. Max.

3.113 3.135 3.142 3.141 3.147 3.171

These numbers are closer, but there are additional calculations. Hence the number of simulations should be adapted to reflect the work done. Luckily we have microbenchmark() to calibrate this. After a bit of experimenting, these are the number of simulations giving roughly equivalent computation times.

microbenchmark(pi2d(10000),pi3d(6666),pin(22000))

Unit: milliseconds

expr min lq mean median uq max neval

pi2d(10000) 2.419106 2.436333 2.630764 2.458325 2.500477 5.253860 100

pi3d(6666) 2.361928 2.382820 2.557150 2.418006 2.467855 4.970898 100

pin(22000) 2.448429 2.468954 2.555823 2.485815 2.517703 5.023678 100

summary(sapply(1:100,function(x) pi2d(10000)))

Min. 1st Qu. Median Mean 3rd Qu. Max.

3.111 3.132 3.141 3.142 3.152 3.175

summary(sapply(1:100,function(x) pi3d(6666)))

Min. 1st Qu. Median Mean 3rd Qu. Max.

3.046 3.116 3.142 3.139 3.165 3.230

summary(sapply(1:100,function(x) pin(22000)))

Min. 1st Qu. Median Mean 3rd Qu. Max.

3.130 3.137 3.141 3.142 3.146 3.161

It could obviously be thought that the random numbers are not needed. An integration can be done. But that is much less fun.Min. 1st Qu. Median Mean 3rd Qu. Max.

3.111 3.132 3.141 3.142 3.152 3.175

summary(sapply(1:100,function(x) pi3d(6666)))

Min. 1st Qu. Median Mean 3rd Qu. Max.

3.046 3.116 3.142 3.139 3.165 3.230

summary(sapply(1:100,function(x) pin(22000)))

Min. 1st Qu. Median Mean 3rd Qu. Max.

3.130 3.137 3.141 3.142 3.146 3.161

integrate(function(x) 4*sqrt(1-x^2),0,1)

3.141593 with absolute error < 0.00016

To

**leave a comment**for the author, please follow the link and comment on their blog:**Wiekvoet**.R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.