
The birthday problem (i.e. looking at the distribution of the birthdates in a group of n persons, assuming [wrongly] a uniform distribution of the calendar dates of those birthdates) is always a source of puzzlement [for me]! For instance, here is a recent post on Cross Validated:

I have 360 friends on Facebook, and, as expected, the distribution of their birthdays is not uniform at all. I have one day that has 9 friends with the same birthday. So, given that some days are more likely for a birthday, I’m assuming the number of 23 is an upper bound.
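For reference, the 23 in the quote is the classic threshold: the smallest group size for which at least two people share a birthdate with probability above one half, under the uniform assumption. A minimal sketch in base R (the helper name p_shared is mine, not from the Cross Validated post):

```r
# Probability that at least two of n people share a birthdate,
# assuming birthdates are uniform over `days` calendar dates.
p_shared <- function(n, days = 365) {
  1 - prod((days - seq_len(n) + 1) / days)
}

p_shared(22)  # just below 1/2
p_shared(23)  # just above 1/2

# qbirthday() from the stats package returns the smallest group size
# reaching a given probability of a coincidence.
qbirthday(prob = 0.5, classes = 365, coincident = 2)
```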

The figure 9 sounded unlikely, so I ran the following computation:

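# Simulate the largest number of people sharing one birthdate among 360
extreme=rep(0,360)
for (t in 1:10^5){
  # positions of the first occurrence of each distinct date in the sorted sample;
  # appending 361 makes diff() also count the group of the last date
  first=(1:360)[!duplicated(sort(sample(1:365,360,rep=TRUE)))]
  i=max(diff(c(first,361)))
  extreme[i]=extreme[i]+1
}
extreme=extreme/10^5  # relative frequencies
barplot(extreme,xlim=c(0,30),names=1:360)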

whose output is shown in the graph above. (Actually, I must confess I first forgot the sort in the code, which led me to believe that 9 was one of the most likely values and to post this on Cross Validated! The error was eventually picked up by one administrator. I should know better than to trust my own R code!) According to this simulation, observing 9 or more people sharing the same birthdate has an approximate probability of 0.00032… Indeed, fairly unlikely!
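As a rough sanity check on that order of magnitude, one can bound the probability analytically. Under the uniform assumption, the number of the 360 people born on any fixed date is Binomial(360, 1/365), and by Boole's inequality the probability that some date is shared by 9 or more people is at most 365 times the corresponding binomial tail. A sketch, with variable names of my own choosing:

```r
n_people <- 360
n_days <- 365

# P(a given date is the birthdate of 9 or more of the 360 people)
p_one_day <- pbinom(8, size = n_people, prob = 1 / n_days, lower.tail = FALSE)

# Union (Boole) bound on P(at least one date is shared by 9 or more people)
upper_bound <- n_days * p_one_day
upper_bound
```

This is only an upper bound, not the exact probability, but a simulated frequency far above it would signal a bug in the simulation.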

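For comparison, here are timings of my code above (wrapped in a function test) against a shorter version based on tabulate:

> system.time(test(10^5)) #my code above
   user  system elapsed
 26.230   0.028  26.411
> system.time(table(replicate(10^5, max(tabulate(sample(1:365,360,rep=TRUE))))))
   user  system elapsed
  5.708   0.044   5.762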
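The tabulate version counts how many times each date occurs directly, so it needs neither the sort nor the duplicated bookkeeping, which is presumably where most of the speed-up comes from. A small sketch checking, on a single sample, that three ways of computing the largest group agree (the seed and the variable names are arbitrary choices of mine):

```r
set.seed(42)  # arbitrary seed, for reproducibility only
x <- sample(1:365, 360, replace = TRUE)

# 1. Run lengths from the positions of first occurrences in the sorted sample;
#    appending length(x) + 1 so that the last group is counted too.
first <- which(!duplicated(sort(x)))
max_runs <- max(diff(c(first, length(x) + 1)))

# 2. Direct counts per date.
max_tab <- max(tabulate(x, nbins = 365))

# 3. Run-length encoding of the sorted sample.
max_rle <- max(rle(sort(x))$lengths)

c(runs = max_runs, tabulate = max_tab, rle = max_rle)
```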