How to Remember the Poisson Distribution

[This article was first published on The Pith of Performance, and kindly contributed to R-bloggers].

The Poisson cumulative distribution function (CDF) \begin{equation} F(α,n) = \sum_{k=0}^n \dfrac{α^k}{k!} \; e^{-α} \label{eqn:pcdf} \end{equation} is the probability of at most $n$ events occurring when the average number of events is α, i.e., $\Pr(X \le n)$. Since \eqref{eqn:pcdf} is a probability function, it cannot have a value greater than 1. In R, the CDF is given by the function ppois(). For example, with α = 4 the first 16 terms are
> ppois(0:15,4)
 [1] 0.01831564 0.09157819 0.23810331 0.43347012 0.62883694 0.78513039 0.88932602 0.94886638
 [9] 0.97863657 0.99186776 0.99716023 0.99908477 0.99972628 0.99992367 0.99998007 0.99999511
As the number of events increases from 0 to 15 the CDF approaches 1. See Figure.
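As a quick sanity check, the sum in \eqref{eqn:pcdf} can be evaluated directly and compared with ppois(). The following is only a minimal sketch: the helper name poisson_cdf is made up for illustration and is not part of base R.

```r
# Hand-rolled Poisson CDF: sum of alpha^k / k! for k = 0..n, times exp(-alpha).
# poisson_cdf is a hypothetical helper name, not a base R function.
poisson_cdf <- function(alpha, n) {
  k <- 0:n
  sum(alpha^k / factorial(k)) * exp(-alpha)
}

alpha <- 4
n_values <- 0:15

by_hand  <- sapply(n_values, function(n) poisson_cdf(alpha, n))
built_in <- ppois(n_values, alpha)

# all.equal() compares within floating-point tolerance
print(all.equal(by_hand, built_in))
```

For instance, with α = 4 and $n = 2$ the sum is $1 + 4 + 8 = 13$, and $13 \, e^{-4} \approx 0.2381$, which matches the third value in the ppois() output above.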

The probability of exactly $n$ events occurring is given by the probability density function (PDF) or, more accurately since it’s a discrete distribution, the probability mass function (PMF): \begin{equation} \Pr(X = n) = \dfrac{α^n}{n!} \; e^{-α}; \quad n = 0, 1, 2, \ldots \label{eqn:ppdf} \end{equation} which is just the $k = n$ term of \eqref{eqn:pcdf}, without the summation, because only a single value of $n$ is considered. In R, this probability is calculated using the function dpois(). Using α = 4 again, we get

> dpois(0:15,4)
 [1] 1.831564e-02 7.326256e-02 1.465251e-01 1.953668e-01 1.953668e-01 1.562935e-01 1.041956e-01
 [8] 5.954036e-02 2.977018e-02 1.323119e-02 5.292477e-03 1.924537e-03 6.415123e-04 1.973884e-04
[15] 5.639669e-05 1.503912e-05
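The relationship between the two functions can be checked numerically. The sketch below, which assumes nothing beyond base R, evaluates the single-term formula by hand and then confirms that a running total of dpois() values reproduces ppois().

```r
alpha <- 4
n_values <- 0:15

# PMF by hand: alpha^n / n! * exp(-alpha), one value per n
pmf_by_hand <- alpha^n_values / factorial(n_values) * exp(-alpha)
print(all.equal(pmf_by_hand, dpois(n_values, alpha)))

# The running total of the PMF is the CDF
print(all.equal(cumsum(dpois(n_values, alpha)), ppois(n_values, alpha)))
```

Both comparisons use all.equal(), so they test agreement within floating-point tolerance rather than exact equality.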

The Poisson distribution is used to model such things as the number of clicks detected by a Geiger counter. It is also the most commonly assumed source of arrivals in queueing theory and computer performance analysis. In fact, it was Agner Erlang who first presented the Poisson distribution as a model of incoming telephone calls with $\alpha = \lambda t$ in 1907 for the purpose of sizing trunkline capacity at Danish Telekom. However, for those not engaged in applying probability theory on a regular basis, the expression in \eqref{eqn:pcdf} looks formidable and is hard to remember.

The trick I employ in my classes is to remember a much simpler, but wrong, version of \eqref{eqn:pcdf} and then correct it. The corrections can be regarded as a little story that is easy to remember: you’re more likely to remember a story than a formula like \eqref{eqn:pcdf}. Here’s the story.

  1. Start with this simple (but incorrect) expression for the CDF \begin{equation} F(α,n) \sim e^{+α} \; \times \; e^{-α} \label{eqn:1cdf} \end{equation} Clearly, \eqref{eqn:1cdf} cannot have a value bigger than 1, which is what is required of a probability. The problem, however, is that since α is a constant this equation will always be equal to 1, which is not quite what we want. For example, if α = 0: \begin{equation} e^{0} \; \times \; e^{0} = 1 \times 1 = 1 \end{equation} In general, for any positive α: \begin{equation} e^{+α} \; \times \; e^{-α} = \dfrac{e^{α}}{e^{α}} = 1 \end{equation} Clearly, this stuck version is wrong. The question is: How can we correct it?

  2. The factor $e^{-α}$ in \eqref{eqn:1cdf} is a decaying exponential that will approach zero for any large value of α. The problem lies with $e^{+α}$ since it will become enormous for an arbitrarily large value of α. So, we need to tame it.

  3. Recall that the exponential function can be written as an infinite power series: \begin{equation} e^{x} = 1 + x + \dfrac{x^2}{2!} + \dfrac{x^3}{3!} + \ldots \label{eqn:infexp} \end{equation}

  4. But, if we truncate the series \eqref{eqn:infexp} after the term in $x^n$ \begin{equation} 1 + x + \dfrac{x^2}{2!} + \ldots + \dfrac{x^n}{n!} \label{eqn:truncexp} \end{equation} it is no longer equal to $e^{x}$ but, for positive $x$, something less. The shorthand notation for \eqref{eqn:truncexp} is \begin{equation} \sum_{k=0}^n \dfrac{x^k}{k!} \end{equation} In our case, $x$ takes a specific value α.

  5. The factor $e^{+α}$ in \eqref{eqn:1cdf} is now replaced with the tamed sum: \begin{equation} e^{+α} ~\rightarrow~ \sum_{k=0}^n \dfrac{α^k}{k!} \label{eqn:sumexp} \end{equation}

Combining these corrections produces \begin{equation} e^{+α} \; \times \; e^{-α} ~\rightarrow~ \sum_{k=0}^n \dfrac{α^k}{k!} \; \times \; e^{-α} \end{equation} which is \eqref{eqn:pcdf}, and done!
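The story can also be retraced in R. This is a rough sketch under the same α = 4 used earlier; the helper truncated_exp and the particular choice of $n$ values are illustrative assumptions, not part of the original recipe.

```r
alpha <- 4

# Step 1: the "wrong" version is stuck at 1 whatever alpha is
stuck <- exp(alpha) * exp(-alpha)

# Steps 3 to 5: replace exp(+alpha) with the power series truncated at x^n / n!
# truncated_exp is a hypothetical helper name, not a base R function.
truncated_exp <- function(x, n) sum(x^(0:n) / factorial(0:n))

n_values <- c(0, 2, 4, 8, 15)   # arbitrary truncation points for illustration
tamed <- sapply(n_values, function(n) truncated_exp(alpha, n))

# Tamed sum times the decaying exponential is the Poisson CDF
story_cdf <- tamed * exp(-alpha)

print(stuck)
print(all(tamed < exp(alpha)))                        # truncation gives something less
print(all.equal(story_cdf, ppois(n_values, alpha)))   # agrees with ppois()
```

As $n$ grows, the truncated sum creeps up toward $e^{α}$, so the product creeps up toward 1 without ever exceeding it, which is the behavior expected of a CDF.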
