# The Motivation for the Poisson Distribution

November 4, 2013

(This article was first published on Econometrics by Simulation, and kindly contributed to R-bloggers)

```r
# The Poisson distribution has the interesting property that it
# models outcomes from events that are independent and equally
# likely to occur.  The distribution takes only one parameter, mu,
# which is equal to both the mean (expected number of events)
# and the variance.

# This distribution, as with all distributions, is somewhat
# fascinating because it represents an approximation of a
# real-world phenomenon.

# Imagine you are trying to model the mail delivery on Wednesdays.
# On average you receive 9 pieces of mail. If the mail delivery
# system is well modeled by a Poisson distribution, then
# the standard deviation of mail delivery should be 3 (the square
# root of the variance, 9). That means on most Wednesdays you should
# receive between 3 and 15 pieces of mail, i.e. within two standard
# deviations of the mean.

# What underlying physical phenomenon must exist for this to be
# possible?

# To aid this discussion we will think of the Poisson
# distribution as a limiting distribution of the sum of
# outcomes from a number of independent binary draws:
DrawsApprox <- function(mu, N) sum(rbinom(N,1,mu/N))

# The idea is that if we specify an expected number of outcomes mu
# and a number of draws (N >= mu), then we can approximate a
# single draw from a Poisson by summing across outcomes.
DrawsApprox(9,9)
# In this case each draw succeeds with probability 1, so the sum
# is always 9 and the variance is 0.
# Under this case there are 9 letters which are always
# sent out every Wednesday.

# More interestingly:
DrawsApprox(9,18)
# In this case there are 18 letters that may be sent out.
# Each one is sent with probability 50%.

# We want to know what the mean and variance are.
# Let us design a simple function to achieve this: it calls the
# function named by `fun` `draw` times and summarizes the results.
evar <- function(fun, draw=100, outc=NULL, ...) {
  for(i in 1:draw) outc <- c(outc, get(fun)(...))
  list(outc=outc, mean=mean(outc), var=var(outc))
}

evar("DrawsApprox", draw=10000, N=18, mu=9)
# I get a mean very close to 9, as we should hope,
# but interestingly the variance is less than five.
# This is less than that of the Poisson, which is 9.

# Let's see what happens if we double the number of
# potential letters going out, which will halve the
# probability of any particular letter.
evar("DrawsApprox", draw=10000, N=36, mu=9)
# Now the variance is about 6.7

evar("DrawsApprox", draw=10000, N=72, mu=9)
# Now 7.7

evar("DrawsApprox", draw=10000, N=144, mu=9)
# 8.6

evar("DrawsApprox", draw=10000, N=288, mu=9)
# 8.65

# We can see that as the number of letters gets very large,
# the mean and variance of the number of letters approach
# the same number, 9.  I will never be able to choose a
# large enough number of letters so that the variance exactly
# equals the mean.

# However, the didactic point of how the distribution is
# structured and when it may be appropriate to use it should be
# clear.  Poisson is a good fit when the likelihood of each
# individual outcome is equal, yet the number of possible
# outcomes is large (in principle I could receive 100 pieces
# of mail in a single day, though it would be very unlikely).

bigdraw <- evar("DrawsApprox", draw=10000, N=1000, mu=9)
summary(bigdraw$outc)
```
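The simulated variances can be compared with a closed form. A sum of N independent binary draws, each with probability mu/N, is a Binomial(N, mu/N) variable, so its variance is N * p * (1 - p) = mu * (1 - mu/N). That gives exactly 4.5 for N = 18 and 6.75 for N = 36, and it stays below mu for every finite N. The sketch below is not part of the original script; the helper names `binom_var` and `pmf_gap` and the cutoff `kmax = 40` are illustrative assumptions. It tabulates the theoretical variance and measures how far the binomial probabilities are from the Poisson ones as N grows.

```r
# Theoretical variance of a sum of N independent Bernoulli(mu/N) draws,
# i.e. of a Binomial(N, mu/N) variable: N * p * (1 - p) = mu * (1 - mu/N).
# (binom_var is a hypothetical helper, not from the original post.)
binom_var <- function(mu, N) mu * (1 - mu / N)

N <- c(9, 18, 36, 72, 144, 288, 1000)
theory <- data.frame(N = N, variance = binom_var(9, N))
print(theory)

# Largest absolute gap between the Binomial(N, mu/N) and Poisson(mu)
# probability mass functions over the counts 0..kmax.
# (pmf_gap and the kmax = 40 cutoff are assumptions for illustration.)
pmf_gap <- function(mu, N, kmax = 40) {
  k <- 0:kmax
  max(abs(dbinom(k, N, mu / N) - dpois(k, mu)))
}

gaps <- sapply(c(18, 72, 288, 1000), function(n) pmf_gap(9, n))
print(gaps)
```

The simulated values from `evar` should scatter around the `variance` column, with the scatter shrinking as `draw` increases, and the entries of `gaps` should shrink toward zero as N grows.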