(This article was first published on **The Pith of Performance**, and kindly contributed to R-bloggers)

The Poisson *cumulative distribution function* (CDF) \begin{equation} F(α,n) = \sum_{k=0}^n \dfrac{α^k}{k!} \; e^{-α} \label{eqn:pcdf} \end{equation} is the probability of at most $n$ events occurring when the average number of events is α, i.e., $\Pr(X \le n)$. Since \eqref{eqn:pcdf} is a *probability* function, it cannot have a value greater than 1. In R, the CDF is given by the function `ppois()`. For example, with α = 4 the first 16 terms are

> ppois(0:15,4)

[1] 0.01831564 0.09157819 0.23810331 0.43347012 0.62883694 0.78513039 0.88932602 0.94886638

[9] 0.97863657 0.99186776 0.99716023 0.99908477 0.99972628 0.99992367 0.99998007 0.99999511

As the number of events increases from 0 to 15, the CDF approaches 1. See Figure. The probability of *exactly* $n$ events occurring is given by the probability density function (PDF) or, more accurately, the probability mass function (PMF), since it's a *discrete* distribution: \begin{equation} \Pr(X = n) = \dfrac{α^n}{n!} \; e^{-α}; \quad n = 0, 1, 2, \ldots \label{eqn:ppdf} \end{equation} which is just \eqref{eqn:pcdf} without the summation, because only a single value of $n$ is considered. In R, the probability density is calculated using the function `dpois()`. Using α = 4 again, we get

> dpois(0:15,4)

[1] 1.831564e-02 7.326256e-02 1.465251e-01 1.953668e-01 1.953668e-01 1.562935e-01 1.041956e-01

[8] 5.954036e-02 2.977018e-02 1.323119e-02 5.292477e-03 1.924537e-03 6.415123e-04 1.973884e-04

[15] 5.639669e-05 1.503912e-05
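As a quick check on the two definitions, the following sketch evaluates the sum in \eqref{eqn:pcdf} term by term and compares it with the built-in functions. The helper name `pois_cdf_manual` is my own and purely illustrative; it is not part of R.

```r
alpha <- 4
n <- 0:15

# Hypothetical helper (not part of R): evaluate the CDF sum term by term
pois_cdf_manual <- function(alpha, n) {
  k <- 0:n
  sum(alpha^k / factorial(k)) * exp(-alpha)
}

cdf_manual <- sapply(n, function(m) pois_cdf_manual(alpha, m))

# The manual sum should agree with ppois() to within rounding error
all.equal(cdf_manual, ppois(n, alpha))

# Accumulating the dpois() values should reproduce the CDF
all.equal(cumsum(dpois(n, alpha)), ppois(n, alpha))
```

Both comparisons should return `TRUE`, which is just the statement that the CDF is the running total of the individual probabilities.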

The Poisson distribution is used to model such things as the number of clicks detected by a Geiger counter. It is also the most commonly assumed source of arrivals in queueing theory and computer performance analysis. In fact, it was Agner Erlang who, in 1909, first presented the Poisson distribution as a model of incoming telephone calls, with $\alpha = \lambda t$, for the purpose of sizing trunkline capacity at Danish Telekom. However, for those not engaged in applying probability theory on a regular basis, the expression in \eqref{eqn:pcdf} looks formidable and hard to remember.

The trick I employ in my classes is to remember a much simpler, but *wrong*, version of \eqref{eqn:pcdf} and then correct it. The corrections can be regarded as a little story that is easy to remember: you're more likely to remember a story than a formula like \eqref{eqn:pcdf}. Here's the story.

- Start with this simple (but incorrect) expression for the CDF \begin{equation} F(α,n) \sim e^{+α} \; \times \; e^{-α} \label{eqn:1cdf} \end{equation} Clearly, \eqref{eqn:1cdf} cannot have a value bigger than 1, which is what is required of a probability. The problem, however, is that since α is a constant this equation will **always** be equal to 1, which is not quite what we want. For example, if α = 0: \begin{equation} e^{0} \; \times \; e^{0} = 1 \times 1 = 1 \end{equation} In general, for any positive α: \begin{equation} e^{+α} \; \times \; e^{-α} = \dfrac{e^{α}}{e^{α}} = 1 \end{equation} Clearly, this *stuck* version is wrong. The question is: How can we correct it?
- The factor $e^{-α}$ in \eqref{eqn:1cdf} is a decaying exponential that will approach zero for any large value of α. The problem lies with $e^{+α}$ since it will become enormous for an arbitrarily large value of α. So, we need to tame it.
- Recall that the exponential function can be written as an infinite power series: \begin{equation} e^{x} = 1 + x + \dfrac{x^2}{2!} + \dfrac{x^3}{3!} + \ldots \label{eqn:infexp} \end{equation}
- But, if we truncate the series \eqref{eqn:infexp} after the term in $x^n$ \begin{equation} 1 + x + \dfrac{x^2}{2!} + \ldots + \dfrac{x^n}{n!} \label{eqn:truncexp} \end{equation} it is no longer equivalent to $e^{x}$, but something less (for positive $x$). The shorthand notation for \eqref{eqn:truncexp} is \begin{equation} \sum_{k=0}^n \dfrac{x^k}{k!} \end{equation} In our case, $x$ takes a specific value α.
- The factor $e^{+α}$ in \eqref{eqn:1cdf} is now replaced with the tamed sum: \begin{equation} e^{+α} ~\rightarrow~ \sum_{k=0}^n \dfrac{α^k}{k!} \label{eqn:sumexp} \end{equation}
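
The steps above can be sketched numerically. This is only an illustration under my own naming: `truncated_exp` is a hypothetical helper, not an R built-in. It shows that the "stuck" product is always 1, whereas replacing $e^{+α}$ with the truncated sum as in \eqref{eqn:sumexp} gives values below 1 that match `ppois()`.

```r
alpha <- 4

# The "stuck" version: exp(+alpha) * exp(-alpha) is 1 for any alpha
stuck <- exp(alpha) * exp(-alpha)

# Hypothetical helper (not part of R): power series for exp(x),
# cut off after the x^n term
truncated_exp <- function(x, n) {
  k <- 0:n
  sum(x^k / factorial(k))
}

n <- 0:15
tamed <- sapply(n, function(m) truncated_exp(alpha, m))

# For positive alpha the truncated sum stays below exp(alpha) ...
all(tamed < exp(alpha))

# ... so its product with exp(-alpha) stays below 1 and grows with n
corrected <- tamed * exp(-alpha)
all.equal(corrected, ppois(n, alpha))
```

Increasing the upper limit of `n` pushes `corrected` ever closer to 1, which is the stuck value recovered in the limit of the full infinite series.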
