Applying PDQ in R to Load Testing

[This article was first published on The Pith of Performance, and kindly contributed to R-bloggers.]

PDQ is a library of functions that helps you to express and solve performance questions about computer systems using the abstraction of queues. The queueing paradigm is a natural choice because, whether big (a web site) or small (a laptop), all computer systems can be represented as a network or circuit of buffers, and a buffer is a type of queue.

As a performance analyst, I really like several things about using PDQ in R, as opposed to the other programming languages (C, Perl, Python, etc.). It enables you to:

  1. easily import (large) data in a variety of formats
  2. perform sophisticated statistical analysis
  3. extract input parameters for a PDQ model
  4. construct and execute the PDQ model within R
  5. plot the PDQ output and compare it with the original data
  6. test your ideas in the R console and save the best into a script

In applying this approach, you could find yourself using a number of R library packages. To improve clarity in your modeling script, you might like to identify clearly which routines belong to PDQ, especially if you’re new to PDQ and not familiar with all the functions.

The R syntax for qualifying a function name with its package is the same as Perl's. The :: operator accesses names that a package explicitly exports. It also avoids conflicts between packages that export different functions with the same name. The ::: operator accesses functions that are not exported from the package namespace.
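
A minimal sketch of how :: resolves a name clash, using only base R. The masking function defined here is hypothetical and exists purely for illustration; it is not part of PDQ or of any package.

```r
# Hypothetical user-defined function that masks stats::median
# in the global environment (for illustration only).
median <- function(x) "masked version"

x <- c(3, 1, 2)

masked   <- median(x)         # R finds the global definition first
explicit <- stats::median(x)  # always the function exported by stats

print(masked)
print(explicit)

# Clean up so that later code sees the usual median() again
rm(median)
```

The same idea applies to a modeling script: writing the package prefix in front of each call makes it obvious where the function comes from, whatever else happens to be loaded.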

Let’s look at the above steps in the context of an example based on load testing data. A key point to observe here is how the performance data and the performance model play together to provide validation of the measurements.

Performance data

We begin by importing the load test data from measurements of an application intended for a three-tier architecture.

    library(ineq)
    library(pdq)

    # Read in the performance measurements
    gdat <- read.csv("/Users/njg/.../gcap.dat",header=TRUE)

The ineq package is a contributed CRAN package rather than part of base R, so it has to be loaded before use. I’ve also chosen to name its functions explicitly, which provides a contrast with the explicitly named functions from the PDQ package.

    > gdat
      Vusr Xgps   Rms Uweb Uapp Udbm
    1    1   24  26.0 0.21 0.08 0.04
    2    2   48  26.0 0.41 0.13 0.05
    3    4   85  29.3 0.74 0.20 0.05
    4    7  100  44.7 0.95 0.23 0.05
    5   10  115  66.0 0.96 0.22 0.06
    6   20  112 140.0 0.97 0.22 0.06

The columns are, respectively, the client load (Vusr), the measured throughput (Xgps), the response time in milliseconds (Rms), and the utilization of each of the three tiers (Uweb, Uapp, Udbm).
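
For readers who do not have the original data file, the following sketch rebuilds the same data frame directly from the values printed in the table. The name gdat follows the article; everything else is base R.

```r
# Recreate the load-test measurements from the printed table,
# as an alternative to reading the original data file.
gdat <- data.frame(
  Vusr = c(1, 2, 4, 7, 10, 20),                    # client load (virtual users)
  Xgps = c(24, 48, 85, 100, 115, 112),             # measured throughput
  Rms  = c(26.0, 26.0, 29.3, 44.7, 66.0, 140.0),   # response time (ms)
  Uweb = c(0.21, 0.41, 0.74, 0.95, 0.96, 0.97),    # web tier utilization
  Uapp = c(0.08, 0.13, 0.20, 0.23, 0.22, 0.22),    # app tier utilization
  Udbm = c(0.04, 0.05, 0.05, 0.05, 0.06, 0.06)     # database tier utilization
)

str(gdat)
```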

Statistical analysis

We can now perform various kinds of statistical analysis on these data.

    > summary(gdat)
          Vusr             Xgps             Rms              Uweb             Uapp
     Min.   : 1.000   Min.   : 24.00   Min.   : 26.00   Min.   :0.2100   Min.   :0.0800
     1st Qu.: 2.500   1st Qu.: 57.25   1st Qu.: 26.82   1st Qu.:0.4925   1st Qu.:0.1475
     Median : 5.500   Median : 92.50   Median : 37.00   Median :0.8450   Median :0.2100
     Mean   : 7.333   Mean   : 80.67   Mean   : 55.33   Mean   :0.7067   Mean   :0.1800
     3rd Qu.: 9.250   3rd Qu.:109.00   3rd Qu.: 60.67   3rd Qu.:0.9575   3rd Qu.:0.2200
     Max.   :20.000   Max.   :115.00   Max.   :140.00   Max.   :0.9700   Max.   :0.2300
          Udbm
     Min.   :0.04000
     1st Qu.:0.05000
     Median :0.05000
     Mean   :0.05167
     3rd Qu.:0.05750
     Max.   :0.06000

More significantly, we can use R statistical functions to derive appropriate parameters for a PDQ model.

    # Apply Little's law to get mean service times + CoVs
    Sweb <- mean(gdat$Uweb/gdat$Xgps)
    Sapp <- mean(gdat$Uapp/gdat$Xgps)
    Sdbm <- mean(gdat$Udbm/gdat$Xgps)

    Csw <- ineq::var.coeff(gdat$Uweb/gdat$Xgps)
    Csa <- ineq::var.coeff(gdat$Uapp/gdat$Xgps)
    Csd <- ineq::var.coeff(gdat$Udbm/gdat$Xgps)

    s1 <- sprintf("System: %6s %6s %6s\n", "Web","App","DBMS")
    s2 <- sprintf("Mean S: %6.4f %6.4f %6.4f\n", Sweb, Sapp, Sdbm)
    s3 <- sprintf("CoV  S: %6.4f %6.4f %6.4f\n", Csw, Csa, Csd)
    cat("\n",s1,s2,s3)

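In particular, we calculate the average service time on each tier (second row of the output) by applying Little’s law in the form U = X S, so that S = U / X at each measured load. The third row is the coefficient of variation of those per-load estimates.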

     System:    Web    App   DBMS
     Mean S: 0.0088 0.0024 0.0008
     CoV  S: 0.0411 0.1989 0.5271
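
The same calculation can be sketched in base R alone, which makes each step of S = U / X explicit. The helper pop_cov is a hypothetical name introduced here; it uses the population standard deviation (dividing by n rather than n - 1), on the assumption that this is the convention behind the CoV row shown above.

```r
# Measured throughput and per-tier utilizations, copied from the data table
X <- c(24, 48, 85, 100, 115, 112)
U <- list(
  web = c(0.21, 0.41, 0.74, 0.95, 0.96, 0.97),
  app = c(0.08, 0.13, 0.20, 0.23, 0.22, 0.22),
  dbm = c(0.04, 0.05, 0.05, 0.05, 0.06, 0.06)
)

# Hypothetical helper: population coefficient of variation
# (standard deviation computed with divisor n, divided by the mean).
pop_cov <- function(s) sqrt(mean((s - mean(s))^2)) / mean(s)

# Service time estimate at each load is U / X; average over the loads
S   <- sapply(U, function(u) mean(u / X))
CoV <- sapply(U, function(u) pop_cov(u / X))

print(round(rbind(S, CoV), 4))
```

A small CoV, as on the web tier, suggests that the service time estimate is stable across loads. A larger one, as on the database tier, partly reflects how coarsely the low utilizations were recorded.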

PDQ model

As shown in Figure 1, the service times for each of the three tiers in the load-test platform can be represented as queueing resources in PDQ.

There is a finite number of requests allowed in the system, corresponding to the load clients or virtual users (Vusers). They range between N = 1 and N = 20 and are represented by the octagonal box in Figure 1. Using the diagram, we set up the following PDQ model. Note the use of explicitly named functions from the PDQ library.

    # Plotting variables
    xc <- 0  # Vuser loads
    yc <- 0  # PDQ throughputs
    rc <- 0  # PDQ response times

    # Define and solve the PDQ model, once for each load n
    for(n in 1:max(gdat$Vusr)) {
     pdq::Init("Three-Tier Model")

     # Closed workload: n Vusers with a think time of 0.028 seconds
     pdq::CreateClosed("httpGETs", TERM, as.numeric(n), 0.028)

     # One queueing node per tier
     pdq::CreateNode("WebServer", CEN, FCFS)
     pdq::CreateNode("AppServer", CEN, FCFS)
     pdq::CreateNode("DBMServer", CEN, FCFS)

     # Service demands derived from the measurements
     pdq::SetDemand("WebServer", "httpGETs", Sweb)
     pdq::SetDemand("AppServer", "httpGETs", Sapp)
     pdq::SetDemand("DBMServer", "httpGETs", Sdbm)

     pdq::Solve(EXACT)

     xc[n] <- n
     yc[n] <- pdq::GetThruput(TERM, "httpGETs")
     rc[n] <- pdq::GetResponse(TERM, "httpGETs") * 10^3  # seconds to ms
    }

In the above PDQ model, we’ve selected the predicted throughput and the predicted response times to compare with the original load-test data.
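
To give a feel for what an exact solution of such a closed circuit involves, here is a sketch of the textbook mean value analysis (MVA) recursion in base R. It is not PDQ's source code, and I am only assuming that it corresponds to the calculation behind the EXACT method. The demands in D are rounded from the mean service times estimated earlier, and Z is the think time used in the model.

```r
# Service demands (seconds), rounded from the estimates of mean(U / X),
# and the think time (seconds) used in the PDQ model
D <- c(web = 0.008751, app = 0.002429, dbm = 0.000809)
Z <- 0.028
N <- 20                    # largest Vuser load to solve for

Q <- numeric(length(D))    # mean queue lengths with zero users in the system
X <- numeric(N)            # predicted throughput at each load
R <- numeric(N)            # predicted response time (seconds) at each load

for (n in 1:N) {
  Rk   <- D * (1 + Q)      # residence time: own service plus the queue seen on arrival
  R[n] <- sum(Rk)          # response time is the sum over the three tiers
  X[n] <- n / (Z + R[n])   # Little's law applied to the whole closed loop
  Q    <- X[n] * Rk        # Little's law applied to each tier
}

# Show a few loads: light, near the knee, and saturated
print(round(cbind(N = 1:N, X = X, R_ms = R * 1e3), 2)[c(1, 5, 20), ])
```

Each pass uses the queue lengths from the previous population, which is why the loop has to be run from n = 1 upward even if only the largest load is of interest.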

Plot PDQ results

    # Plot throughput and response time models
    par(mfrow=c(2,1))
    plot(xc, yc, type="l", lwd=1, col="blue", ylim=c(0,120), main="PDQ Throughput Model", xlab="Vusers (N)", ylab="Gets/s X(N)")
    points(gdat$Vusr, gdat$Xgps)
    plot(xc, rc, type="l", lwd=1, col="blue", ylim=c(0,220), main="PDQ Response Time Model", xlab="Vusers (N)", ylab="ms R(N)")
    points(gdat$Vusr, gdat$Rms)

The above R code produces the following plot array:

We see that the data and PDQ model are in good agreement, with the throughput saturating above N = 5 Vusers and the corresponding response time rising up the proverbial “hockey stick” handle.

PDQ report

Optionally, we can produce a formal PDQ report to examine the performance of each of the three tiers, even if we don’t have any corresponding performance measurements from the load-test platform. This is one way by which bottlenecks can be predicted and checked before deploying into production.

    > pdq::Report()
                    ***************************************
                    ****** Pretty Damn Quick REPORT *******
                    ***************************************
                    ***  of : Sun May 15 18:26:21 2011  ***
                    ***  for: Three-Tier Model          ***
                    ***  Ver: PDQ Analyzer v5.0 030211  ***
                    ***************************************
                    ***************************************

                    =======================================
                    ******    PDQ Model INPUTS      *******
                    =======================================

    Node Sched Resource   Workload   Class     Demand
    ---- ----- --------   --------   -----     ------
    CEN  FCFS  WebServer  httpGETs   TERML     0.0088
    CEN  FCFS  AppServer  httpGETs   TERML     0.0024
    CEN  FCFS  DBMServer  httpGETs   TERML     0.0008

    Queueing Circuit Totals:
            Streams:      1
            Nodes:        3

    WORKLOAD Parameters:
    httpGETs      20.00        0.0120     0.03


                    =======================================
                    ******   PDQ Model OUTPUTS      *******
                    =======================================

    Solution Method: EXACT

                    ******   SYSTEM Performance     *******

    Metric                     Value    Unit
    ------                     -----    ----
    Workload: "httpGETs"
    Mean concurrency         16.8004    Users
    Mean throughput         114.2725    Users/Sec
    Response time             0.1470    Sec
    Round trip time           0.1750    Sec
    Stretch factor           12.2633

    Bounds Analysis:
    Max throughput          114.2725    Users/Sec
    Min response              0.0120    Sec
    Max Demand                0.0088    Sec
    Tot demand                0.0120    Sec
    Think time                0.0280    Sec
    Optimal clients           4.5696    Clients


                    ******   RESOURCE Performance   *******

    Metric          Resource     Work              Value   Unit
    ------          --------     ----              -----   ----
    Throughput      WebServer    httpGETs       114.2725   Users/Sec
    Utilization     WebServer    httpGETs       100.0000   Percent
    Queue length    WebServer    httpGETs        16.3144   Users
    Waiting line    WebServer    httpGETs        15.3144   Users
    Waiting time    WebServer    httpGETs         0.1340   Sec
    Residence time  WebServer    httpGETs         0.1428   Sec

    Throughput      AppServer    httpGETs       114.2725   Users/Sec
    Utilization     AppServer    httpGETs        27.7529   Percent
    Queue length    AppServer    httpGETs         0.3841   Users
    Waiting line    AppServer    httpGETs         0.1066   Users
    Waiting time    AppServer    httpGETs         0.0009   Sec
    Residence time  AppServer    httpGETs         0.0034   Sec

    Throughput      DBMServer    httpGETs       114.2725   Users/Sec
    Utilization     DBMServer    httpGETs         9.2447   Percent
    Queue length    DBMServer    httpGETs         0.1019   Users
    Waiting line    DBMServer    httpGETs         0.0094   Users
    Waiting time    DBMServer    httpGETs         0.0001   Sec
    Residence time  DBMServer    httpGETs         0.0009   Sec
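
The Bounds Analysis figures in the report can be checked by hand with the standard asymptotic bounds for a closed queueing circuit. The sketch below is an independent calculation in base R, not a call into PDQ; the demands in D are rounded from the estimated mean service times, so the last digit may differ slightly from the report.

```r
# Service demands (seconds), rounded from the estimated mean service times,
# and the think time (seconds) used in the PDQ model
D <- c(web = 0.008751, app = 0.002429, dbm = 0.000809)
Z <- 0.028

Xmax       <- 1 / max(D)             # throughput ceiling set by the bottleneck tier
Rmin       <- sum(D)                 # response time with no queueing at all
Nopt       <- (Z + sum(D)) / max(D)  # load at which the two bounds intersect
bottleneck <- names(which.max(D))    # tier with the largest demand

cat(sprintf("Bottleneck tier : %s\n", bottleneck))
cat(sprintf("Max throughput  : %.2f per second\n", Xmax))
cat(sprintf("Min response    : %.4f seconds\n", Rmin))
cat(sprintf("Optimal clients : %.2f\n", Nopt))
```

The web tier has the largest demand, which is consistent with it being the resource that the report shows at 100 percent utilization.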
