**The Pith of Performance**, and kindly contributed to R-bloggers)

Listing 2.2 on p. 35 of my Perl::PDQ book shows you how to generate exponential variates in **Perl**. Here, however, I want to use R to compare exponential delays with both the *uniform distribution* (the **default** distribution available in all load-test simulators) and the *normal distribution* (the familiar "**bell curve**").

The R function that generates exponential variates directly is `rexp(n, rate = 1)` where, for example, the parameter called `rate` might correspond to the arrival rate of requests going into your test rig or *system under test* (SUT). The R programming language uses the same notation as p. 57 of my Perl::PDQ book. In my books and classes, I usually write that rate as $\lambda$ to match conventional queueing theory symbology. The probability density function (PDF), or `dexp()` in R, is usually written as:

\begin{equation} f(t) = \lambda e^{-\lambda t} \end{equation}

and shown in Figure 1.

The rate is $\lambda$, but the average or statistical mean of (1) is given by the inverse rate or $1/\lambda$. Since $\lambda$ is the average arrival**rate**, $1/\lambda$ is the average interarrival

**time**as would be seen by the SUT. The view from the load-test client corresponds to a think-time delay of $Z = 1/\lambda$ in your script. In either case, the delay is the time interval between requests, whether departing the client or arriving at the SUT. If you could apply the R function

`rexp()`directly to produce 10 exponentially distributed delays with a mean time of $Z=30$ seconds, you would write

`rexp(10,1/30)`with the result:

> rexp(10,1/30)

[1] 126.841780 78.137417 12.989666 31.544992 9.598437 89.299508 2.063628 4.597275

[9] 24.792890 11.950121

Note that some delays are much smaller than the mean while other delays are much greater. Unfortunately, this R function is not available to you in load-test scripts so, you have to code your own. Here's how that works.

**Inverse Transformation**

In eqn. (1), we have the output $f(t)$ on the left and the corresponding delay $(t)$ on the right side (in the exponent). However, we would really prefer to have things the other way around: flip a coin to get an *input* on the right and find out what delay that corresponds to as an *output* on the left.

We can use the *inverse transform* to do precisely that. The inverse function does not necessarily exist for an arbitrary probability distribution but, thankfully, the exponential distribution has a very simple form which allows it. The inverse of the exponential function is the **natural logarithm** function. The PDF in (1) lies in the range $0 \le f < \lambda$ on the $y$-axis, but we need to work with *probabilities*. The function which does this is the *cumulative distribution function* $F(t)$ in Figure 2:

\begin{equation} F(t) = 1 - e^{-\lambda t} \end{equation}

which is strictly bounded by the range $0 \le F < 1$. $F(t)$ is the corresponding **area** under $f(t)$ and corresponds to `pexp(q, rate = 1)` in R.

*inversion*process for probability values $0.90$, $0.80$ and $0.30$. BTW, those probability values also correspond respectively to $90$th, $80$th, and $30$th percentiles, if you prefer to think of them that way. We can simulate the coin flip by using a variate $u \sim U(0,1)$ chosen from a

**uniform**distribution $0 \le u < 1$.

Based on Figure 2, how can we calculate the corresponding interarrival delay $(t)$ that the load generator should use? Letting $u$ represent $F$ in (2) and transposing produces:

\begin{equation} e^{-\lambda t} = 1 - u \end{equation}

Next, we solve (3) for $t$ by taking natural logs of both sides—the **inverse function**:

\begin{equation} \lambda t = - \ln(1 - u) \end{equation}

But the value of $u$ lies in the same interval as $(1-u)$, since they have the same uniform distribution. Hence, we can use the slightly simpler form:

\begin{equation} t = - \frac{\ln(u)}{\lambda} \end{equation}

Finally, we have arrived at the place where we wanted to be: flip a coin to get a **random** *input* on the right hand side of (5) and find out what delay the **client script** should use as an *output* on the left. For load testing, the random delay $(t)$ is associated with a mean think time $Z = 1/\lambda$ and is therefore computed using:

\begin{equation} t = -Z \ln(u) \end{equation}

Equation (6) is what `rexp()` uses under the covers, and it's what you need to code in your client test scripts. A rather simple formula which, again, underscores the lunacy of not having it integrated into the load-test simulator.

**Examples in R**

Using R, we first generate $10$ random variates (coin tosses) from a **uniform** distribution:

> uVariates <- runif(10) # create $10$ uniform variates

> uVariates

[1] 0.13737362 0.23143301 0.59315715 0.56576222 0.48420383 0.66065404 0.77418521 0.07673259

[9] 0.39985820 0.99247847

Next, we use these values as inputs into eqn. (6), with a mean time of 30 seconds, to calculate 10 **exponentially distributed delays**:

> thinkZ <- 30 # specify the average desired delay

> tDelays <- -thinkZ*log(uVariates) # see eqn. (6)

> tDelays

[1] 59.5515274 43.9039444 15.6688773 17.0874419 21.7574800 12.4357489 7.6783242 77.0228628

[9] 27.4993587 0.2264988

Note the spread of delay times, which would also create significant fluctuations in queue depth as seen by buffers on the SUT side.

For comparison, here are $10$ delay samples produced by a *uniform distribution* with the same mean as used for the exponential samples, i.e., the arithmetic mean $\frac{0+60}{2}=30$ seconds:

> runif(10,0,60)

[1] 25.90247 51.94069 30.43872 10.11337 29.17908 25.90095 31.91967 42.46816 49.94629 34.62311

Similarly, here are $10$ delay samples produced by a *normal distribution* with a mean of $30$ seconds:

> rnorm(10,30)

[1] 30.57623 30.63393 30.45814 29.40551 29.59376 30.38025 29.91399 30.12652 29.67296 29.00941

Clearly, the exponential distribution produces a greater spread of delay times.**Related Posts**

**leave a comment**for the author, please follow the link and comment on his blog:

**The Pith of Performance**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...