How to Remember the Poisson Distribution

July 3, 2014
(This article was first published on The Pith of Performance, and kindly contributed to R-bloggers)

The Poisson cumulative distribution function (CDF) \begin{equation} F(α,n) = \sum_{k=0}^n \dfrac{α^k}{k!} \; e^{-α} \label{eqn:pcdf} \end{equation} is the probability of at most $n$ events occurring when the average number of events is α, i.e., $\Pr(X \le n)$. Since \eqref{eqn:pcdf} is a probability, it cannot have a value greater than 1. In R, the CDF is given by the function ppois(). For example, with α = 4 the first 16 values, for $n = 0, 1, \ldots, 15$, are

> ppois(0:15,4)
[1] 0.01831564 0.09157819 0.23810331 0.43347012 0.62883694 0.78513039 0.88932602 0.94886638
[9] 0.97863657 0.99186776 0.99716023 0.99908477 0.99972628 0.99992367 0.99998007 0.99999511
As the number of events increases from 0 to 15 the CDF approaches 1. See Figure.
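
As a quick sanity check, the sum in \eqref{eqn:pcdf} can be evaluated directly and compared with ppois(). This is only a sketch: `manual_cdf` is a helper name introduced here for illustration and is not part of the original post or of base R.

```r
# Evaluate the Poisson CDF straight from its definition and compare with ppois().
# manual_cdf is a hypothetical helper, introduced only for this check.
manual_cdf <- function(alpha, n) {
  k <- 0:n
  sum(alpha^k / factorial(k)) * exp(-alpha)
}

alpha <- 4
n_values <- 0:15
by_hand <- sapply(n_values, function(n) manual_cdf(alpha, n))

# Prints TRUE if the hand-rolled sum agrees with R's built-in CDF
print(isTRUE(all.equal(by_hand, ppois(n_values, alpha))))
```

all.equal() is used rather than == because the two calculations can differ in the last few floating-point digits.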

The probability of exactly $n$ events occurring is given by the probability density function (PDF) or, more accurately, the probability mass function (PMF), since it's a discrete distribution: \begin{equation} \Pr(X = n) = \dfrac{α^n}{n!} \; e^{-α}; \quad n = 0, 1, 2, \ldots \label{eqn:ppdf} \end{equation} which is just a single term of \eqref{eqn:pcdf}, without the summation, because only a single value of $n$ is considered. In R, the probability mass is calculated using the function dpois(). Using α = 4 again, we get


> dpois(0:15,4)
[1] 1.831564e-02 7.326256e-02 1.465251e-01 1.953668e-01 1.953668e-01 1.562935e-01 1.041956e-01
[8] 5.954036e-02 2.977018e-02 1.323119e-02 5.292477e-03 1.924537e-03 6.415123e-04 1.973884e-04
[15] 5.639669e-05 1.503912e-05
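
The relationship between the two functions can be checked in a couple of lines. The sketch below assumes the same α = 4 and the range 0 to 15 used above; the variable names are illustrative only.

```r
alpha <- 4
n_values <- 0:15

# PMF computed from the formula alpha^n / n! * exp(-alpha)
pmf_by_hand <- alpha^n_values / factorial(n_values) * exp(-alpha)
print(isTRUE(all.equal(pmf_by_hand, dpois(n_values, alpha))))

# A running total of the PMF reproduces the CDF values returned by ppois()
print(isTRUE(all.equal(cumsum(dpois(n_values, alpha)), ppois(n_values, alpha))))
```

Both comparisons should print TRUE, which is the summation in \eqref{eqn:pcdf} expressed with cumsum().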

The Poisson distribution is used to model such things as the number of clicks detected by a Geiger counter. It is also the most commonly assumed source of arrivals in queueing theory and computer performance analysis. In fact, it was Agner Erlang who first presented the Poisson distribution as a model of incoming telephone calls with $\alpha = \lambda t$ in 1907 for the purpose of sizing trunkline capacity at Danish Telekom. However, for those not engaged in applying probability theory on a regular basis, the expression in \eqref{eqn:pcdf} looks formidable and hard to remember.

The trick I employ in my classes is to remember a much simpler, but wrong, version of \eqref{eqn:pcdf} and then correct it. The corrections can be regarded as a little story that is easy to remember: you're more likely to remember a story than a formula like \eqref{eqn:pcdf}. Here's the story.

  1. Start with this simple (but incorrect) expression for the CDF \begin{equation} F(α,n) \sim e^{+α} \; \times \; e^{-α} \label{eqn:1cdf} \end{equation} Clearly, \eqref{eqn:1cdf} cannot have a value bigger than 1, which is what is required of a probability. The problem, however, is that this expression is always exactly equal to 1, whatever the value of α, and it does not depend on $n$ at all, which is not quite what we want. For example, if α = 0: \begin{equation} e^{0} \; \times \; e^{0} = 1 \times 1 = 1 \end{equation} In general, for any positive α: \begin{equation} e^{+α} \; \times \; e^{-α} = \dfrac{e^{α}}{e^{α}} = 1 \end{equation} Clearly, this stuck version is wrong. The question is: How can we correct it?

  2. The factor $e^{-α}$ in \eqref{eqn:1cdf} is a decaying exponential that will approach zero for any large value of α. The problem lies with $e^{+α}$ since it will become enormous for an arbitrarily large value of α. So, we need to tame it.

  3. Recall that the exponential function can be written as an infinite power series: \begin{equation} e^{x} = 1 + x + \dfrac{x^2}{2!} + \dfrac{x^3}{3!} + \ldots \label{eqn:infexp} \end{equation}

  4. But, if we truncate the series \eqref{eqn:infexp} after the term in $x^n$ \begin{equation} 1 + x + \dfrac{x^2}{2!} + \ldots + \dfrac{x^n}{n!} \label{eqn:truncexp} \end{equation} it is no longer equivalent to $e^{x}$ but, for positive $x$, something less. The shorthand notation for \eqref{eqn:truncexp} is \begin{equation} \sum_{k=0}^n \dfrac{x^k}{k!} \end{equation} In our case, $x$ takes a specific value α.

  5. The factor $e^{+α}$ in \eqref{eqn:1cdf} is now replaced with the tamed sum: \begin{equation} e^{+α} ~\rightarrow~ \sum_{k=0}^n \dfrac{α^k}{k!} \label{eqn:sumexp} \end{equation}
Combining these corrections produces \begin{equation} e^{+α} \; \times \; e^{-α} ~\rightarrow~ \sum_{k=0}^n \dfrac{α^k}{k!} \; \times \; e^{-α} \end{equation} which is \eqref{eqn:pcdf}, and done!
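
The story can also be replayed in R as a rough sketch. The helper names `trunc_exp` and `story_cdf` are made up for this illustration, and α = 4 is assumed as in the earlier examples.

```r
# Hypothetical helpers that follow the story step by step.
trunc_exp <- function(x, n) {
  k <- 0:n
  sum(x^k / factorial(k))   # 1 + x + x^2/2! + ... + x^n/n!
}
story_cdf <- function(alpha, n) trunc_exp(alpha, n) * exp(-alpha)

alpha <- 4

# Step 1: the simple-but-wrong version is stuck at 1 (up to rounding)
stuck <- exp(alpha) * exp(-alpha)

# Steps 4 and 5: swapping in the truncated series gives values below 1
story_values <- sapply(0:15, function(n) story_cdf(alpha, n))

print(stuck)
print(isTRUE(all.equal(story_values, ppois(0:15, alpha))))
```

As $n$ grows, the truncated series gets closer to $e^{α}$, so the product climbs toward 1 without ever exceeding it, which is the behavior expected of a CDF.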
