Plotting PDQ Output with R

February 27, 2009

(This article was first published on Taking the Pith Out of Performance, and kindly contributed to R-bloggers)

One of the nice things about PDQ-R (coming in release 5.0) is the ability to plot PDQ output directly in R. Here’s a PDQ-R script, together with the corresponding graphical output, that I knocked up to show the effect on the throughput curve of adding more queueing delay stages (K), with everything else held constant.

With just a single queue (K = 1) the system saturates very quickly. The throughput curve shoots up the y-axis until it hits the ceiling at X = 2.0 requests per unit time. Consequently, the linearly rising slope on the early part of the throughput curve is almost indistinguishable from the optimal load-line at N* = 1.016 clients. This rapid saturation effect is less pronounced in a system with more queues because there are more service stages, so each request takes longer to complete. But it requires a considerable number of additional queueing centers to get a noticeable difference, e.g., K = 20, 50. Observe also that the optimal load-line moves to the right and is positioned on the x-axis at a value very close to K. I’ll let you ponder why that must be true.
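The saturation behavior described above can be reproduced numerically without PDQ. What follows is a minimal Python sketch, not the PDQ-R script from this post: it applies exact Mean Value Analysis (MVA) to a closed network of K identical queues. The service demand D = 0.5 is inferred from the saturation throughput of 2.0 (the bound is 1/D). The think time Z = 0.008 is an assumption, chosen only because it makes the standard optimal-load formula N* = (K·D + Z)/D come out at 1.016 for K = 1. The function names are illustrative.

```python
def mva_throughput(n_clients, k_queues, demand=0.5, think=0.008):
    """Exact MVA throughput for a closed network of k identical queues.

    demand and think are assumed values (see the text above), not
    parameters taken from the original PDQ-R script.
    """
    queue_len = 0.0   # mean queue length at each (identical) queue
    throughput = 0.0
    for n in range(1, n_clients + 1):
        resid = demand * (1.0 + queue_len)            # residence time at one queue
        throughput = n / (k_queues * resid + think)   # response-time law
        queue_len = throughput * resid                # Little's law per queue
    return throughput


def optimal_load(k_queues, demand=0.5, think=0.008):
    """N* = (total service demand + think time) / bottleneck demand."""
    return (k_queues * demand + think) / demand


if __name__ == "__main__":
    for k in (1, 20, 50):
        xs = [mva_throughput(n, k) for n in (1, 10, 50, 200)]
        print(f"K={k:2d}  N*={optimal_load(k):.3f}  X(N): "
              + ", ".join(f"{x:.3f}" for x in xs))
```

Under these assumptions, the K = 1 curve reaches the ceiling almost immediately, while the K = 20 and K = 50 curves climb much more slowly toward the same bound.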

The plot also explains the rationale for the approach I took in Chap. 10 of the Perl PDQ book where I modeled the scalability measurements of a multi-tier web application. In addition to the measured tiers, I ended up introducing 12 “dummy” queues in order to produce the correct round-trip latency, whilst retaining Z = 0 think time in accord with the original web application test scripts. The stunningly powerful conclusion was that there must’ve been additional latencies that were not included in the original measurements on the test rig. Otherwise, the data that were measured could not be reconciled with each other. Although I couldn’t determine what the sources of those hidden latencies were, I could state quite categorically that they were real. You cannot possibly reach this kind of penetrating conclusion without a performance model. Data comes from the Devil, models come from God.
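The dummy-queue idea can be illustrated with a small sketch. All numbers here are hypothetical, not the measurements from the book: three "measured" tiers with made-up service demands, plus 12 dummy queues with a made-up demand smaller than the bottleneck's. With Z = 0, the dummy queues raise the round-trip time at light load while leaving the saturation bound, 1/D_max, unchanged.

```python
def mva(n_clients, demands, think=0.0):
    """Exact MVA for a closed network; returns (throughput, round-trip time)."""
    queue_len = [0.0] * len(demands)
    throughput, rtt = 0.0, 0.0
    for n in range(1, n_clients + 1):
        resid = [d * (1.0 + q) for d, q in zip(demands, queue_len)]
        rtt = sum(resid)                              # round-trip time, excluding think
        throughput = n / (rtt + think)
        queue_len = [throughput * r for r in resid]   # Little's law per queue
    return throughput, rtt


# Hypothetical service demands in seconds (NOT the book's data).
measured_tiers = [0.009, 0.015, 0.003]
dummy_queues = [0.002] * 12   # hypothetical demand, below the bottleneck's

for label, demands in (("measured only", measured_tiers),
                       ("with dummies", measured_tiers + dummy_queues)):
    _, r1 = mva(1, demands)
    x_hi, _ = mva(500, demands)
    print(f"{label:14s} R(1)={r1:.3f}s  X(500)={x_hi:.2f}/s  "
          f"bound={1 / max(demands):.2f}/s")
```

Because the dummy demands sit below the bottleneck demand, they add latency without moving the throughput ceiling, which is the property that lets a model of this kind match both sets of measurements at once.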

I didn’t include the corresponding plots showing the effect of the dummy queues (similar to the above) in my Perl PDQ book because it was so tedious to write the data out to a file and then import it into Excel (which is what I was using back then). With PDQ-R, it’s a snap to do it in about 50 lines.
