The Birthday Simulation

[This article was first published on Statistical Research » R, and kindly contributed to R-bloggers.]

There is nothing novel or unique about the birthday problem itself.  This post simply extends it, using simulation to estimate the probability that three or more people in a group share the same birthday.

For two people it's fairly straightforward: with a group of 23 people the probability that at least two share the same birthday is about 0.5.  For groups approaching 50 the probability that two people share the same birthday is extremely high, about 0.97.
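
The two-person figures come from the usual complement argument. As a sketch, assuming 365 equally likely birthdays (no leap day) and independent people, the probability that at least two of n people share a birthday is one minus the probability that all n birthdays are distinct:

```latex
P(n) = 1 - \prod_{k=0}^{n-1} \frac{365 - k}{365},
\qquad P(22) \approx 0.476, \quad P(23) \approx 0.507, \quad P(50) \approx 0.970
```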

[Figure: Prob2Birthday. Probability that two people have the same birthday, by number of people in the group.]

When requiring that three (or more) people share the same birthday, the probability drops off quickly compared with the two-person case.  A fairly large group is needed before three people are likely to share a birthday.
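
As a rough cross-check on the three-person case, R's stats package includes `pbirthday()` and `qbirthday()`, which take a `coincident` argument for the number of people who must share a day. The sketch below is not part of the original post; the variable names are illustrative, and as far as I know the documentation describes the results for more than two coincidences as approximations, so treat them as a sanity check rather than exact values.

```r
# pbirthday() and qbirthday() ship with R's stats package (loaded by default).
# Assumes 365 equally likely birthdays, as in the rest of the post.

# Probability that two or more people share a birthday in a group of 23
p2 <- pbirthday(23, classes = 365, coincident = 2)

# Probability that three or more people share a birthday in a group of 75
# (approximate for coincident > 2)
p3 <- pbirthday(75, classes = 365, coincident = 3)

# Smallest group size giving at least a 50% chance of a shared birthday
n2 <- qbirthday(prob = 0.5, classes = 365, coincident = 2)  # pairs
n3 <- qbirthday(prob = 0.5, classes = 365, coincident = 3)  # triples

print(c(p2 = p2, p3 = p3, n2 = n2, n3 = n3))
```

Comparing `n2` with `n3` shows how much larger the group has to be before a three-way match becomes more likely than not.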

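[Figure: Birthday Plot. Probability of having multiple people (2, 3, or 4) with the same birthday, by number of people in the group.]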

Here is some R code to determine these probabilities.


n.rep = 5000
theta.val = 75
doy = seq(from=1, to=365, by=1)
sim.mat = matrix(NA, nrow=theta.val, ncol=4)

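## Exact probability that at least two of n people share a birthday
getProb = function(n){
  q = 1 - seq(0,n-1)/365
  1 - prod(q)
}

theta.list = seq(from=2, to=theta.val, by=1)
p.graph = sapply(theta.list, getProb)
## Smallest group size with probability above 0.5 (the group size, not the vector index)
fifty.fifty = theta.list[which(p.graph >.5)[1]]
plot(theta.list, p.graph, main="Probability Two People Have the Same Birthday", ylab='Probability', xlab="Number of People in Group")
lines(theta.list, p.graph)
abline(h=.5, v=fifty.fifty)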

 

## For matching multiple people

## Runs a little slow.  If I had more time I would find a more efficient way to write this.
for(i in 2:theta.val){
  ## Each column of bday is one simulated group of i birthdays
  bday = replicate(n.rep, sample(doy, size=i, replace=TRUE) )
  ## Largest number of people sharing a single day in each simulated group
  bday.max = apply(bday, 2, function(x) max(table(x)))

  sim.mat[i,1] = i
  ## Proportion of simulated groups with at least 2, 3, or 4 people sharing a birthday
  sim.mat[i,2] = mean(bday.max >= 2)
  sim.mat[i,3] = mean(bday.max >= 3)
  sim.mat[i,4] = mean(bday.max >= 4)
}

## Row 1 of sim.mat is never filled (the loop starts at 2), so drop it
graph.sim = t( sim.mat[-1,2:4] )
colnames(graph.sim) = sim.mat[-1,1]

barplot(graph.sim[1,], ylim=c(0,1), col="red",
main="Probability of Having Multiple People with the Same Birthday",
xlab="Number of People in Group",
ylab="Probability")
barplot(graph.sim[2,], ylim=c(0,1), col="blue", add=TRUE)
barplot(graph.sim[3,], ylim=c(0,1), col="black", add=TRUE)
abline(h=.5)
legend("topleft", c("2","3","4"), fill=c("red","blue","black"), title="People sharing a birthday")
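
On the note above that the simulation runs a little slow: one possible speed-up, offered only as a sketch, is to count birthdays with `tabulate()` instead of `table()`, which skips building a named table for every simulated group. The helper `sim_shared()` and the objects `group.sizes` and `est` are hypothetical names introduced here, not part of the original code.

```r
# Hypothetical helper (not from the original post): estimate the probability
# that at least k people in a group of n share a birthday, for each k.
# Assumes `days` equally likely birthdays.
sim_shared <- function(n, k = 2:4, n.rep = 5000, days = 365) {
  # For each simulated group, the largest number of people sharing one day
  max.match <- replicate(
    n.rep,
    max(tabulate(sample.int(days, n, replace = TRUE), nbins = days))
  )
  # Proportion of simulated groups whose largest match reaches each k
  sapply(k, function(kk) mean(max.match >= kk))
}

set.seed(1)  # fixed seed so the sketch is reproducible
group.sizes <- 2:75
# One row per group size, one column per number of people sharing a birthday
est <- t(sapply(group.sizes, sim_shared))
colnames(est) <- c("2", "3", "4")
rownames(est) <- group.sizes
```

Each row of `est` plays the same role as columns 2 to 4 of `sim.mat`, so `t(est)` could be passed to the same `barplot()` calls. Whether this is meaningfully faster will depend on the machine and on `n.rep`.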
