(This article was first published on **The Pith of Performance**, and kindly contributed to R-bloggers)

No doubt you’ve heard about the tyranny of the 9s in reference to computer system availability. You’re probably also familiar with the phrase six sigma, either in the context of manufacturing process quality control or the improvement of business processes. As we discovered in the recent Guerrilla Data Analysis Techniques class, the two concepts are related.

| Nines | Percent | Downtime per Year | σ Level |
|---|---|---|---|
| 4 | 99.99% | 52.596 minutes | 4σ |
| 5 | 99.999% | 5.2596 minutes | – |
| 6 | 99.9999% | 31.5576 seconds | 5σ |
| 7 | 99.99999% | 3.15576 seconds | – |
| 8 | 99.999999% | 315.6 milliseconds | 6σ |

People like to talk about achieving “5 nines” availability or a “six sigma” quality level. These phrases are often bandied about without appreciating:

- that nines and sigmas refer to similar criteria.
- that high nines and high sigmas are very difficult to achieve consistently.


Each additional 9 cuts the permitted downtime per year by a factor of ten; hence the term *tyranny*. The third column of the table can be computed with the following R function:

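```r
downt <- function(nines, tunit = c('s', 'm', 'h')) {
  # Pick a single unit; without this the default vector breaks the if() tests
  tunit <- match.arg(tunit)
  # Downtime in seconds per (Julian) year of 365.25 days
  ds <- 10^(-nines) * 365.25*24*60*60
  if (tunit == 's') { ts <- 1;    tu <- "seconds" }
  if (tunit == 'm') { ts <- 60;   tu <- "minutes" }
  if (tunit == 'h') { ts <- 3600; tu <- "hours" }
  return(sprintf("Downtime per year at %d nines: %g %s", nines, ds/ts, tu))
}
```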

```
> downt(5,'m')
[1] "Downtime per year at 5 nines: 5.2596 minutes"
> downt(8,'s')
[1] "Downtime per year at 8 nines: 0.315576 seconds"
```
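To reproduce the whole third column at once, the same arithmetic can be vectorized over the number of nines. This is a minimal sketch in base R; the helper name `downtime_table` is hypothetical (not part of the original post) and, like `downt`, it assumes a 365.25-day year.

```r
# Hypothetical helper (not from the original post): yearly downtime for
# several levels of nines at once, assuming a 365.25-day year like downt().
downtime_table <- function(nines) {
  secs_per_year <- 365.25 * 24 * 60 * 60    # 31,557,600 seconds
  down_s <- 10^(-nines) * secs_per_year     # allowed downtime in seconds
  data.frame(nines = nines, seconds = down_s, minutes = down_s / 60)
}

# Each extra nine cuts the allowed downtime by a factor of 10
downtime_table(4:8)
```

The `seconds` entries for 6, 7 and 8 nines are 31.5576, 3.15576 and 0.315576, matching the table.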

The associated σ levels correspond to the area under the Normal (Gaussian), or “bell-shaped”, curve within the interval from μ − kσ to μ + kσ centered on the mean μ (a total width of 2kσ), where σ is the standard deviation in the usual way and k is the σ level.
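For a Normal random variable X with mean μ and standard deviation σ, substituting z = (x − μ)/σ shows that this area depends only on k. It can be written with either the standard Normal CDF Φ or the error function erf; the erf form is what `sigp` evaluates as `erf(sigma/sqrt(2))`:

$$
P(\mu - k\sigma \le X \le \mu + k\sigma)
  = \int_{\mu - k\sigma}^{\mu + k\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx
  = 2\Phi(k) - 1
  = \operatorname{erf}\!\left(\frac{k}{\sqrt{2}}\right)
$$

The complement, $1 - \operatorname{erf}(k/\sqrt{2}) = 2\Phi(-k)$, is the two-tailed probability reported as `Prob(chance)`.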

The corresponding area under the Normal curve can be calculated using the following R function:

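```r
# Error function via base R's pnorm() (as in the ?pnorm examples),
# so no extra package such as NORMT3 is required
erf <- function(x) 2 * pnorm(x * sqrt(2)) - 1

sigp <- function(sigma) {
  sigma <- as.integer(sigma)
  apc <- erf(sigma/sqrt(2))   # area within mu +/- sigma standard deviations
  return(sprintf("%d-sigma bell area: %10.8f%%; Prob(chance): %e", sigma, apc*100, 1-apc))
}
```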

```
> sigp(2)
[1] "2-sigma bell area: 95.44997361%; Prob(chance): 4.550026e-02"
> sigp(5)
[1] "5-sigma bell area: 99.99994267%; Prob(chance): 5.733031e-07"
```

So, 5σ corresponds to slightly more than 99.9999% of the area under the bell curve (the total area being 100%), which is why it lines up closely with six 9s availability. The second number computed by `sigp` is the two-tailed probability, 1 minus the bell area, which is interpreted here as the probability that the achieved availability was a fluke. A reasonable mnemonic for some of these values is:

- 3σ corresponds roughly to a probability of 1 in 1,000 (more precisely, about 1 in 370) that the availability occurred by chance.
- 5σ is roughly a 1 in a million chance, which is like flipping a fair coin and getting 20 heads in a row.
- 6σ is roughly a 1 in a billion chance that it was a fluke.
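The table pairs nines with σ levels only approximately. As a sketch of where those pairings come from, the helpers below (hypothetical names, base R only) convert a σ level to an equivalent number of nines via the two-tailed probability 2Φ(−k), and back again with `qnorm()`. They assume the same two-sided Normal area that `sigp` computes.

```r
# Hypothetical helpers (not from the original post) relating a k-sigma
# interval to an equivalent number of nines, using the two-sided Normal area.

# Probability of falling outside mu +/- k*sigma, i.e. 1 - erf(k/sqrt(2))
two_tail <- function(k) 2 * pnorm(-k)

# Equivalent nines: a tail of exactly 10^-n would be exactly n nines
sigma_to_nines <- function(k) -log10(two_tail(k))

# Sigma level whose two-sided tail equals 10^-n; lower.tail = FALSE keeps
# qnorm() accurate for very small tail probabilities
nines_to_sigma <- function(n) qnorm(10^(-n) / 2, lower.tail = FALSE)

# "1 in N" odds of landing outside the k-sigma interval
sigma_odds <- function(k) 1 / two_tail(k)

round(sigma_to_nines(3:6), 2)
round(nines_to_sigma(c(4, 6, 8)), 2)
round(sigma_odds(3:6))
```

Running this shows that 3σ through 6σ are worth roughly 2.6, 4.2, 6.2 and 8.7 nines, while four, six and eight 9s call for about 3.89σ, 4.89σ and 5.73σ, which round to the 4σ, 5σ and 6σ entries in the table.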

Now you see why these goals are easy to covet but hard to achieve.
