The Motivation for the Poisson Distribution

November 4, 2013
By

(This article was first published on Econometrics by Simulation, and kindly contributed to R-bloggers)

# The Poisson distribution models counts of events that occur
# independently of one another and are equally likely to occur.
# The distribution takes only one parameter, mu, which is equal
# to both the mean (the expected number of events) and the
# variance.
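
# A quick numerical check of that property, using base R's dpois()
# with mu = 9 (the value used throughout this post). Summing over
# 0 to 200 events covers effectively all of the probability, and
# the mean and the variance of the distribution both come out as 9.
mu <- 9
k <- 0:200                 # counts to sum over (tail beyond 200 is negligible)
p <- dpois(k, mu)          # Poisson probabilities P(X = k)
c(mean = sum(k * p), var = sum((k - mu)^2 * p))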
 
# Like all distributions, the Poisson is somewhat fascinating
# because it is an approximation of a real-world phenomenon.
 
# Imagine you are trying to model mail delivery on Wednesdays.
 
# On average you receive 9 pieces of mail. If the mail delivery
# system is well modeled by a Poisson distribution, then the
# variance is also 9, so the standard deviation of mail delivery
# should be sqrt(9) = 3. This means that on most days you should
# receive between 3 and 15 pieces of mail (the mean plus or
# minus two standard deviations).
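
# These numbers can be checked with base R's ppois(), assuming the
# Poisson(9) model described above: the standard deviation is
# sqrt(9) = 3, and the probability of receiving between 3 and 15
# pieces of mail inclusive is P(X <= 15) - P(X <= 2).
mu <- 9
sqrt(mu)                       # standard deviation: 3
ppois(15, mu) - ppois(2, mu)   # P(3 <= X <= 15), about 0.97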
 
# What underlying physical phenomenon must exist for this to be
# possible?
 
# To aid this discussion, we will think of the Poisson
# distribution as the limiting distribution of the sum of
# outcomes from a large number of independent binary draws:
 
# Sum of N independent binary draws, each a success with probability mu/N
DrawsApprox <- function(mu, N) sum(rbinom(N, 1, mu/N))
 
# The idea is that if we specify an expected number of outcomes
# mu and a number of draws N (with N >= mu, so that mu/N is a
# valid probability), then we can approximate a single draw from
# a Poisson distribution by summing across the outcomes.
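
# Because the sum of N independent draws with probability mu/N is a
# single Binomial(N, mu/N) draw, the quality of the approximation can
# also be checked exactly, without simulation. This sketch uses base
# R's dbinom() and dpois() to find the largest gap between the
# Binomial(N, 9/N) and Poisson(9) probabilities over 0 to 40 letters
# (k and maxGap are illustrative names, not part of the original).
k <- 0:40
maxGap <- sapply(c(18, 36, 72, 144, 288), function(N)
  max(abs(dbinom(k, N, 9 / N) - dpois(k, 9))))
maxGap   # the largest gap shrinks as N grows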
 
DrawsApprox(9,9)
# In this case, of course, the sum is always 9 and the variance
# is 0: there are exactly 9 letters, and every one of them is
# sent out every Wednesday.
 
# More interestingly:
DrawsApprox(9,18)
# In this case there are 18 letters that may be sent out,
# and each one is sent with probability 50%.
 
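# We want to know what the mean and variance are, so let us
# write a simple function to estimate them from repeated draws.
evar <- function(fun, draw = 100, ...) {
  f <- match.fun(fun)          # accept a function or its name
  outc <- numeric(draw)        # preallocate rather than grow a vector
  for (i in seq_len(draw)) outc[i] <- f(...)
  list(outc = outc, mean = mean(outc), var = var(outc))
}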
 
evar("DrawsApprox", draw=10000, N=18, mu=9)
# I get a mean very close to 9, as we should hope, but
# interestingly a variance below five. This is well below the
# Poisson variance of 9; in fact, the sum of 18 draws that each
# succeed with probability 0.5 has variance 18 * 0.5 * 0.5 = 4.5.
 
# Let's see what happens if we double the number of
# potential letters going out, which halves the
# probability that any particular letter is sent.
evar("DrawsApprox", draw=10000, N=36, mu=9)
# Now the variance is about 6.7
 
evar("DrawsApprox", draw=10000, N=72, mu=9)
# Now 7.7
 
evar("DrawsApprox", draw=10000, N=144, mu=9)
# 8.6
 
evar("DrawsApprox", draw=10000, N=288, mu=9)
# 8.65
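
# The simulated variances above can be compared with theory. The
# sum of N independent draws with probability mu/N is a
# Binomial(N, mu/N) variable with variance N * (mu/N) * (1 - mu/N),
# i.e. mu * (1 - mu/N). A minimal sketch computing this for the
# values of N used above, with mu = 9:
N <- c(18, 36, 72, 144, 288)
theoryVar <- 9 * (1 - 9 / N)
theoryVar   # 4.5, 6.75, 7.875, 8.4375, 8.71875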
 
# We can see that as the number of letters gets very large,
# the mean and variance of the number of letters both approach
# 9. Still, no finite number of letters will make the variance
# exactly equal the mean: with N letters the variance is
# mu * (1 - mu/N), which is always slightly less than mu.
 
# However, the didactic point about how the distribution is
# structured, and when it may be appropriate to use, should be
# clear. The Poisson is a good fit when each individual outcome
# is equally likely (and individually unlikely), yet the number
# of possible outcomes is large (in principle I could receive 100
# pieces of mail in a single day, though it would be very unlikely).
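
# To put a number on "very unlikely", here is a small check using
# base R's dpois() and ppois(), assuming the Poisson(9) model from
# above: the chance of 100 pieces of mail is positive but
# astronomically small.
dpois(100, lambda = 9)                      # exactly 100: roughly 3.5e-67
ppois(99, lambda = 9, lower.tail = FALSE)   # 100 or more: also tiny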
 
bigdraw <- evar("DrawsApprox", draw=10000, N=1000, mu=9)
summary(bigdraw$outc)
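
# An alternative, vectorised sketch of the same experiment (not the
# original author's code): rbinom(n, size = N, prob = mu/N) returns n
# sums of N binary draws in one call, which can be set beside n direct
# Poisson draws from rpois(). The seed value is arbitrary, and
# binomSums and poisDraws are illustrative names.
set.seed(2013)
binomSums <- rbinom(10000, size = 1000, prob = 9 / 1000)
poisDraws <- rpois(10000, lambda = 9)
rbind(binomial = c(mean = mean(binomSums), var = var(binomSums)),
      poisson  = c(mean = mean(poisDraws), var = var(poisDraws)))
# With N = 1000 the theoretical binomial variance is 9 * (1 - 9/1000),
# about 8.92, so both rows should show means and variances near 9.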
 