You’ve heard of tennis elbow. Well, there’s a non-sports, performance injury that I like to call hockey elbow. An example of such an “injury” is shown in Figure 1, which appeared in a recent computer performance analysis presentation. It’s a reminder of how easy it is to become complacent when doing performance analysis and possibly end up reaching the wrong conclusion.
Figure 1 is seriously flawed for two reasons:
- It incorrectly shows the response time curve with a vertical asymptote.
- It compounds the first error by employing a logarithmic x-axis.
The relationship between performance metrics is generally nonlinear. That’s what makes performance analysis both interesting and hard. Your brain has evolved to think linearly, whereas computer systems behave nonlinearly. As a consequence, your brain needs help to comprehend those nonlinear effects. That’s one reason plotting performance data can be so helpful—as long as it’s done correctly.
Nonlinearity is a class, not a thing.
Response-time data belongs to a class of nonlinearity that is best characterized by a convex function. That’s a mathematical term that simply means the plotted data tends to curve upward and away from the x-axis (labeled “load” in Figure 1).
Although there are many possible ways data can trend upward, the convexity of response-time data is limited to the two cases shown in Figure 2, viz., an elbow curve (left) or a hockey stick curve (right). The distinction is easily understood by noting the following visual characteristics:
- Elbow form: the blue curve representing the response time data trends upward and bends through 90 degrees to run along a vertical line on the right side of the plot, i.e., a vertical asymptote.
- Hockey stick: the blue curve representing the response time data bends upward but at an angle much wider than 90 degrees and runs along the asymptote represented, quite literally, by the hockey stick handle.
The burning question now becomes, what determines whether the response time data follows the elbow or the hockey stick profile? The answer hinges on the use of the word “load”—one of the most overloaded words in the performance lexicon.
Pay very close attention to the metric displayed on the x-axis.
It should be emphasized that the one metric “load” cannot represent is time. When carrying out load tests or running internal benchmarks, tools like LoadRunner or JMeter will usually present performance metrics as a time series, like Figure 3, while the data are being collected.
Here, however, I am talking about post-processed performance metrics, where the collected data has been time-averaged. In Figure 3, the time average is represented by the height of the horizontal red line. In other words, it is a single number that represents the average response time during the measurement period. An example of multiple post-processed response-time measurements is shown in Figure 4. The timestamps have been eliminated. Each data point in Figure 4 is derived from the equivalent of the red line in Figure 3.
Time series performance data has to be time-averaged.
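Here is a minimal R sketch of that post-processing step. It uses synthetic data only: the load levels, the 60-sample run length, and the gamma-distributed response times are all assumptions made up for illustration, not measurements from any of the figures.

```r
# Synthetic example: collapse each load test's time series into one
# time-averaged response time, giving one plotted point per load level.
set.seed(1)                                  # reproducible made-up data
loads <- c(1, 10, 50, 100)                   # hypothetical load levels

# One time series (60 one-second samples) per load level.
# The mean of each gamma distribution is 5 + 0.5 * n ms (an arbitrary choice).
runs <- do.call(rbind, lapply(loads, function(n) {
  data.frame(load  = n,
             t_sec = 1:60,
             rt_ms = rgamma(60, shape = 4, rate = 4 / (5 + 0.5 * n)))
}))

# Time-average each run: the timestamps disappear, one number per load remains.
avg <- aggregate(rt_ms ~ load, data = runs, FUN = mean)

plot(avg$load, avg$rt_ms, type = "b",
     xlab = "Load (concurrent requests)", ylab = "Mean response (ms)")
```

Each row of `avg` plays the role of one red line from a time series plot, which is why the x-axis of the resulting plot can no longer be time.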
Note the resemblance of the curves in Figure 4 to the hockey stick curve in Figure 2. In this case, the load represents the “number of concurrent requests” in the system under test (SUT). The testing harness was the Apache HTTP server benchmarking tool.
Figure 4. Measured response time hockey sticks [Source: Juicebox Games, Inc.]
Why do these data have a hockey stick profile? Each request that is issued by a load generator traverses the SUT in order to be processed and returns its result (e.g., status or data) to the client load script. The script may be programmed to represent an additional user delay, called the think time, before submitting the next request. The foot of the hockey stick corresponds to light load due to only a few load generators (between 1 and 10) issuing a relatively small number of requests into the SUT. Hence, the size of any queues is relatively small. The response time corresponds closely to the sum of the service demand at each of the processing resources (whatever they might be). Since lower is better for response times, that’s also the best time you can achieve on the SUT.
At some point, however, as the number of requests is increased, one of the servicing resources will become 100% busy: the bottleneck resource. In Figure 4, this seems to occur around 50 generators. Beyond that load, the resource queues increase almost linearly with the number of concurrent requests. Consequently, the response time approaches the linear handle of the hockey stick. The slope of the handle is determined by the service time at the bottleneck. For that slope to be vertical (i.e., to become an elbow), the bottleneck service demand would have to be infinite! In practical terms, the SUT would not be functional. If there is no think-time delay on the client side, the queues begin to grow immediately in the SUT and the hockey stick appears to have no foot, i.e., it’s all handle.
Response times plotted as a function of the number of client generators or users will always have a hockey stick profile.
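To see the foot and the handle emerge, here is a minimal R sketch that uses textbook exact Mean Value Analysis (MVA) for a closed queueing network. It is not the model or the data behind Figure 4: the function name `mva`, the three service demands, and the one-second think time are assumptions chosen purely for illustration.

```r
# Exact MVA for a single-class closed queueing network (illustrative sketch).
# N = number of generators, D = service demands (s), Z = think time (s).
mva <- function(N, D, Z = 0) {
  Q <- numeric(length(D))          # queue lengths when the system is empty
  X <- 0
  R <- sum(D)
  for (n in seq_len(N)) {
    Rk <- D * (1 + Q)              # residence time at each resource
    R  <- sum(Rk)                  # total response time
    X  <- n / (R + Z)              # throughput from Little's law
    Q  <- X * Rk                   # updated queue lengths
  }
  c(X = X, R = R)
}

D <- c(0.005, 0.020, 0.010)        # hypothetical demands; bottleneck is 0.020 s
Z <- 1.0                           # hypothetical think time (s)
N <- 1:200
res <- t(sapply(N, mva, D = D, Z = Z))

plot(N, res[, "R"], type = "l",
     xlab = "Generators (N)", ylab = "Response (s)")
abline(h = sum(D), lty = 2)             # foot: sum of the service demands
abline(a = -Z, b = max(D), lty = 2)     # handle: R = N * max(D) - Z
```

With these made-up numbers the two dashed asymptotes cross near N = (sum(D) + Z) / max(D), or about 52 generators. Beyond that point each additional generator adds roughly max(D) = 20 ms to the response time, so the handle has a finite slope rather than a vertical one.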
With that pronouncement firmly in mind, you might think that a load-test system can never exhibit an elbow profile. That turns out not to be true. Once again, it depends on how the load metric is defined.
Figure 5. Measured NFS response time elbow [Source: SPEC.org]
The elbow profile in Figure 5 is taken from the Standard Performance Evaluation Corporation (SPEC) web site and shows the response time in milliseconds for an EMC Isilon NFS server as a (convex) function of throughput measured in ops/second. Many similar curves can be found online in the sanctioned SPECsfs2008_nfs.v3 benchmark results. The important point here is that the load metric is throughput, not the number of users or generators, as it is in Figure 4.
Unfortunately, the SPEC run rules only require 10 load points to be reported, so Figure 5 is ambiguous as to whether it follows a hockey stick or an elbow profile. However, I can use my PDQ queueing analyzer in R to produce an approximate extrapolation in Figure 6 that clearly demonstrates the NFS response times follow an elbow curve.
Once the bottleneck resource saturates at 100% utilization, the throughput becomes throttled at around 275,000 ops per second in the EMC benchmark (Figure 5) or 200,000 ops per second in the PDQ model (Figure 6). In other words, the throughput cannot exceed that value because it’s rate-limited by the bottleneck resource. Nonetheless, the queues will continue to grow under increasing request load, so the response time curve has no choice but to increase in the vertical direction and thereby produce the elbow profile.
Response times plotted as a function of the throughput will always have an elbow profile.
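The same effect can be reproduced with a small closed queueing model in R. This is a sketch using textbook exact Mean Value Analysis, not the PDQ model behind Figure 6; the function name `mva`, the service demands, and the think time are hypothetical values chosen for illustration.

```r
# Exact MVA for a single-class closed queueing network (illustrative sketch).
mva <- function(N, D, Z = 0) {
  Q <- numeric(length(D))          # queue lengths when the system is empty
  X <- 0
  R <- sum(D)
  for (n in seq_len(N)) {
    Rk <- D * (1 + Q)              # residence time at each resource
    R  <- sum(Rk)                  # total response time
    X  <- n / (R + Z)              # throughput from Little's law
    Q  <- X * Rk                   # updated queue lengths
  }
  c(X = X, R = R)
}

D <- c(0.005, 0.020, 0.010)        # hypothetical demands; bottleneck is 0.020 s
Z <- 1.0                           # hypothetical think time (s)
res <- t(sapply(1:500, mva, D = D, Z = Z))

# Plot response time against the measured throughput rather than against N.
plot(res[, "X"], res[, "R"], type = "l",
     xlab = "Throughput (requests/s)", ylab = "Response (s)")
abline(v = 1 / max(D), lty = 2)    # saturation throughput: the elbow's wall
```

In this sketch the throughput can never exceed 1 / max(D) = 50 requests per second. Going from 250 to 500 generators leaves the throughput essentially unchanged while the response time more than doubles, so all the additional points pile up against the dashed vertical line.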
To convince you that this must be true, I can apply Little’s law to the reported EMC benchmark data to determine the number of requests resident in the SUT, i.e., the otherwise latent independent variable.
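As a worked instance of that calculation, take the first and last load points of the benchmark data listed next, with throughput X in ops/second and response time R converted from milliseconds to seconds:

```latex
\begin{align*}
N &= X \times R \\
N_{1} &= 25504~\text{ops/s} \times 0.7 \times 10^{-3}~\text{s} = 17.8528 \\
N_{10} &= 253357~\text{ops/s} \times 5.7 \times 10^{-3}~\text{s} = 1444.1349
\end{align*}
```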
```r
# Reported load points: throughput X (ops/s) and response time R (ms)
df <- read.table(text="X R
25504 0.7
51054 0.6
76667 0.7
102288 0.8
127879 0.9
153497 1.0
179261 1.2
205226 1.4
231069 2.0
253357 5.7", header=TRUE)

# Estimate N from Little's law, N = X * R (1e-3 converts R from ms to s)
df$Nest <- df$X * df$R * 1e-3
# Rescale so that the largest estimate equals 7 * 192 = 1344
df$Nact <- df$Nest * 7 * 192 / max(df$Nest)

df
#>         X   R      Nest       Nact
#> 1   25504 0.7   17.8528   16.61490
#> 2   51054 0.6   30.6324   28.50838
#> 3   76667 0.7   53.6669   49.94569
#> 4  102288 0.8   81.8304   76.15636
#> 5  127879 0.9  115.0911  107.11080
#> 6  153497 1.0  153.4970  142.85367
#> 7  179261 1.2  215.1132  200.19746
#> 8  205226 1.4  287.3164  267.39416
#> 9  231069 2.0  462.1380  430.09380
#> 10 253357 5.7 1444.1349 1344.00000

plot(df$Nact, df$R, type="b",
     xlab="SFS client generators (actual)",
     ylab="Response (msecs)")
```

The inferred response times in Figure 7 are plotted as a function of the number of generated requests (the independent variable). Since the SPEC SFS benchmark permits the think time to be set to zero (so as to achieve maximal throughput), the response times immediately begin to climb up the hockey stick handle. Hence, the foot of the hockey stick appears to be amputated.
In Figures 4 and 7 the load, expressed as the number of generators, is an independent variable. It’s a system configuration parameter, not a measured quantity. The number of generators is established independently, and prior to running the test, in order to cause a certain load to be impressed on the SUT at runtime. The SUT’s response to that load is the measured dependent variable, viz., the response time. In Figures 5 and 6, the throughput is not an independent variable because it also depends on the impressed load. Therefore, it is a dependent variable. Like the response time, the throughput is also a measured quantity, not an independent test parameter. In Figures 5 and 6, the independent variable is implicit rather than explicit.
If you’d like to know about how to analyze performance data in this way, you should consider attending the upcoming GCaP class on September 21.
Oh! I almost forgot. Here’s the computed PDQ version of Figure 1 with a logarithmic x-axis taken out to 5000 users. Clearly, it is not an elbow curve with a vertical asymptote at around 1000 users.
Naively applying logarithmic transforms to performance data is something I’ve cautioned against because it usually changes the original nonlinear shape in a nonlinear and unintuitive way. The transformed curve then gives the wrong visual cues, amongst other potential problems. See “Sex, Lies and Log Plots” and “What’s Wrong with This Picture?” Although the hockey stick shape appears to be retained under a log transform in Figure 8, that’s an optical illusion: the mathematical details are very different.
Don’t plot data on a logarithmic scale without clearly labeling the axes to warn the reader.
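A quick way to convince yourself of the distortion is to plot a perfectly straight line on both kinds of axes. The R sketch below does that for a hypothetical hockey stick handle; the 20 ms slope and the range of 1 to 5000 users are assumptions for illustration, not values read off any of the figures.

```r
# A straight-line "handle" plotted on linear and logarithmic x-axes.
N <- 10^seq(0, log10(5000), length.out = 200)  # loads evenly spaced on a log axis
R <- 0.02 * N                                  # hypothetical linear handle (s)

op <- par(mfrow = c(1, 2))
plot(N, R, type = "l", main = "Linear x-axis",
     xlab = "Users", ylab = "Response (s)")
plot(N, R, type = "l", log = "x", main = "Logarithmic x-axis",
     xlab = "Users (log scale)", ylab = "Response (s)")
par(op)

# Slope as the eye sees it on each axis:
slope_lin <- diff(R) / diff(N)          # constant on the linear axis
slope_log <- diff(R) / diff(log10(N))   # keeps growing on the log axis
```

On the linear axis the slope is the same everywhere. On the logarithmic axis the same data appear to bend upward ever more steeply, which is the kind of visual cue that can be mistaken for an approaching wall.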
One redeeming feature of Figure 1 is that it correctly depicts the service level target (“SLA”) as a horizontal line through the response time curve rather than as a knee in the curve that doesn’t actually exist. See “Response Time Knees and Queues” and “Mind Your Knees and Queues: Responding to Hyperbole with Hyperbolae” for more about that topic.