Plotting PDQ Output with R

February 27, 2009

(This article was first published on Taking the Pith Out of Performance, and kindly contributed to R-bloggers)

One of the nice things about PDQ-R (coming in release 5.0) is the ability to plot PDQ output directly in R. Here's a PDQ-R script, together with the corresponding graphical output, that I knocked up to show the effect on the throughput curve of adding more queueing delay stages (K), with everything else held constant.


With just a single queue (K = 1) the system saturates very quickly. The throughput curve shoots up the y-axis until it hits the ceiling at X = 2.0 requests per unit time. Consequently, the linearly rising early part of the throughput curve is almost indistinguishable from the optimal load-line at N* = 1.016 clients. This rapid saturation is less pronounced in a system with more queues, because there are more service stages and each request therefore takes longer to complete. But it takes a considerable number of additional queueing centers, e.g., K = 20 or K = 50, to make a noticeable difference. Observe also that the optimal load-line moves to the right and is positioned on the x-axis at a value very close to K. I'll let you ponder why that must be true.
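For readers who want a hint, here is a minimal Python sketch. It is a stand-in, not PDQ-R itself, and solves a closed model of this kind with exact mean value analysis (MVA). It assumes K identical single-server queues, each with a hypothetical service demand S = 0.5 so that the throughput ceiling 1/S matches the X = 2.0 quoted above, plus a hypothetical think time Z. The actual parameters of the PDQ-R script are not reproduced here. With total demand D = K·S and bottleneck demand S, the optimal load is N* = (D + Z)/S = K + Z/S.

```python
def mva_throughput(k, service, think, n_max):
    """Exact MVA for a closed loop of k identical single-server queues.

    Returns a list whose entry n-1 is the throughput X(N) with N = n clients.
    """
    queue_len = 0.0  # mean queue length at each queue (all k are identical)
    throughputs = []
    for n in range(1, n_max + 1):
        resid = service * (1.0 + queue_len)  # residence time at one queue
        response = k * resid                 # round-trip time through all k queues
        x = n / (response + think)           # throughput from the response time law
        queue_len = x * resid                # Little's law applied to one queue
        throughputs.append(x)
    return throughputs


def optimal_load(k, service, think):
    """N* = (total demand + think time) / bottleneck demand = K + Z/S."""
    return (k * service + think) / service


S = 0.5  # hypothetical per-queue service demand, so the ceiling 1/S = 2.0
Z = 0.0  # hypothetical think time
for k in (1, 20, 50):
    x = mva_throughput(k, S, Z, 100)
    print(f"K={k:2d}  N*={optimal_load(k, S, Z):5.1f}  "
          f"X(N*)={x[k - 1]:.3f}  X(100)={x[-1]:.3f}")
```

With Z = 0 the load-line sits exactly at N* = K. A small positive Z nudges it slightly to the right, which is consistent with the N* = 1.016 quoted for K = 1.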

The plot also explains the rationale for the approach I took in Chap. 10 of the Perl PDQ book where I modeled the scalability measurements of a multi-tier web application. In addition to the measured tiers, I ended up introducing 12 "dummy" queues in order to produce the correct round-trip latency, whilst retaining Z = 0 think time in accord with the original web application test scripts. The stunningly powerful conclusion was that there must've been additional latencies that were not included in the original measurements on the test rig. Otherwise, the data that were measured could not be reconciled with each other. Although I couldn't determine what the sources of those hidden latencies were, I could state quite categorically that they were real. You cannot possibly reach this kind of penetrating conclusion without a performance model. Data comes from the Devil, models come from God.
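The effect of padding a model with dummy queues can be illustrated with exact mean value analysis (MVA) at zero think time. This minimal Python sketch is not PDQ or the book's Perl code. It solves the model first for three hypothetical measured tiers and then again with 12 dummy queues appended. All demand values are illustrative assumptions, not the figures from Chap. 10. Because each dummy queue has a smaller demand than the bottleneck tier, the dummies raise the round-trip time R(N) while the throughput ceiling 1/Dmax stays the same.

```python
def mva(demands, think, n_max):
    """Exact MVA for a closed network of single-server queues.

    demands: service demand at each queue (time units per request)
    Returns lists X(n) and R(n) for n = 1..n_max clients.
    """
    q = [0.0] * len(demands)  # mean queue length at each queue
    xs, rs = [], []
    for n in range(1, n_max + 1):
        resid = [d * (1.0 + qk) for d, qk in zip(demands, q)]  # residence times
        r = sum(resid)                 # round-trip response time
        x = n / (r + think)            # response time law
        q = [x * rk for rk in resid]   # Little's law at each queue
        xs.append(x)
        rs.append(r)
    return xs, rs


measured = [0.005, 0.004, 0.003]  # hypothetical demands for the measured tiers
dummy = [0.002] * 12              # 12 hypothetical "dummy" queues

x0, r0 = mva(measured, 0.0, 50)          # Z = 0, measured tiers only
x1, r1 = mva(measured + dummy, 0.0, 50)  # Z = 0, with dummy queues added
for n in (1, 10, 50):
    print(f"N={n:2d}  R without dummies={r0[n - 1]:.4f}  "
          f"R with dummies={r1[n - 1]:.4f}")
```

In this sketch the dummy queues play the role of latency that the measurements did not account for. They lengthen the round trip without changing which tier is the bottleneck.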

I didn't include the corresponding plots showing the effect of the dummy queues (similar to the above) in my Perl PDQ book because it was so tedious to write the data out to a file and then import it into Excel (which is what I was using back then). With PDQ-R, it's a snap to do it in about 50 lines.
