**Taking the Pith Out of Performance**, and kindly contributed to R-bloggers)

In the GCaP class earlier this month, we talked about the meaning of the load average (in Unix and Linux) and simulating a grocery store checkout lane, but I didn’t actually do it. So, I decided to take a shot at constructing a discrete-event simulation (as opposed to Monte Carlo simulation) of a simple M/M/1 queue in R.

We can make use of a lot of conveniences in R to accomplish such a simulation. For example, we don’t have to worry about random number generation, we can simply use the `rexp()` function for an M/M/1 queue. It may not be the fastest code on the planet but it is guaranteed to be reliable. We also have the ease of integrating PDQ (Pretty Damn Quick) for analytic comparison, as well as the nice statistical analysis and plotting capabilities available in R.

**Simulation Variables**

As usual, we start with a list of the necessary variables for the simulation and its instrumentation.

t.end <- 10^5 # duration of sim

t.clock <- 0 # sim time

Ta <- 1.3333 # interarrival period

Ts <- 1.0000 # service period

t1 <- 0 # time for next arrival

t2 <- t.end # time for next departure

tn <- t.clock # tmp var for last event time

tb <- 0 # tmp var for last busy-time start

n <- 0 # number in system

s <- 0 # cumulative number-time product

b <- 0 # total busy time

c <- 0 # total completions

qc <- 0 # plot instantaneous q size

tc <- 0 # plot time delta

plotSamples <- 100

set.seed(1)

Next, we need to write the R code to perform the actual M/M/1 simulation of arrivals into and departures from the queue.

**Simulation Loop**

This code meant to be pedagogic so, I haven’t bothered to do anything spiffy like pre-allocating the Exp variates, for example. I based it on the example in Mac MacDougall’s book Simulating Computer Systems (an oldie but a goodie), rather than the example in the more recent Introduction to Scientific Programming and Simulation Using R book, because I think there’s a bug in their R code, but I didn’t spend any time trying to find it. Also, that code is not instrumented.

while (t.clock < t.end) {

if (t1 < t2) { # arrival event

t.clock <- t1

s <- s + n * (t.clock - tn) # delta time-weighted number in queue

n <- n + 1

if (t.clock < plotSamples) {

qc <- append(qc,n)

tc <- append(tc,t.clock)

}

tn <- t.clock

t1 <- t.clock + rexp(1, 1/Ta)

if(n == 1) {

tb <- t.clock

t2 <- t.clock + rexp(1, 1/Ts) # exponential interarrival period

}

} else { # departure event

t.clock <- t2

s <- s + n * (t.clock - tn) # delta time-weighted number in queue

n <- n - 1

if (t.clock < plotSamples) {

qc <- append(qc,n)

tc <- append(tc,t.clock)

}

tn <- t.clock

c <- c + 1

if (n > 0) {

t2 <- t.clock + rexp(1, 1/Ts) # exponential service period

}

else {

t2 <- t.end

b <- b + t.clock - tb

}

}

}

So, now we have the simulation workhorse in place.

**Instrumented Metrics**

Here, we collect the instrumentation data to form some well-known performance metrics. They correspond to the definitions given in class.

u <- b/t.clock # utilization B/T

N <- s/t.clock # mean queue length (see the Load Average notes)

x <- c/t.clock # mean throughput C/T

r <- N/x # mean residence time (from Little's law: Q = XR)

q <- sum(qc)/max(tc) # estimated queue length for plot

**Queue Length**

This is a plot of instantaneous queue length à la load average data. This is what queueing fluctuations look like. As I point out in class, they’re responsible for the usually complicated math seen in queueing-theory textbooks that can make your head hurt.

**PDQ Model**

For analytic comparison, we also include the corresponding PDQ-R model in the same script using the online manual for reference.

Init("")

CreateOpen("w",1/Ta) # arrivals into queue

CreateNode("n",CEN,FCFS) # the M/M/1 queue

SetDemand("n","w",Ts) # service time

Solve(CANON)

# Collect individual performance metrics

R <- GetResidenceTime("n","w",TRANS)

Q <- GetQueueLength("n","w",TRANS)

U <- GetUtilization("n","w",TRANS)

X <- GetThruput(TRANS,"w")

Yes, these few lines are equivalent to the above simulation code with instrumentation, and it’s guaranteed to be in steady state. Running PDQ, even in R, is essentially instantaneous. The simulation will take longer, but given the plethora of MIPS/core available today, especially on laptops, running simulations in R is entirely feasible.

**Results**

Finally, we can compare the simulated M/M/1 queue with the corresponding PDQ results. As usual, it’s best to break them into inputs and outputs.

- Inputs:

Tsim: 1.00e+05

Ta: 1.3333, Ts: 1.0000 # times

Ar: 0.7500, Sr: 1.0000 # rates - Outputs:

Usim: 0.7477, Updq: 0.75

Xsim: 0.7495, Xpdq: 0.75

Rsim: 4.0316, Rpdq: 4.00

Qsim: 3.0219, Qpdq: 3.00

Within the expected limits of precision, we can conclude that the simulation reached steady state during the specified 10^{5} time-steps.

No doubt, I’ll go into more detail about doing simulations in R during the upcoming GDAT class in August.

**leave a comment**for the author, please follow the link and comment on their blog:

**Taking the Pith Out of Performance**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...