**The Pith of Performance**, and kindly contributed to R-bloggers)

In a recent Guerrilla CaP Group discussion, Baron S. wrote:

....

BS> Using gnuplot against the dataset I gave, I get

BS> sigma 0.0207163 +/- 0.001323 (6.385%)

BS> kappa 0.000861226 +/- 5.414e-05 (6.287%)

The Gnuplot output includes the errors for each of the *universal scalability law* (USL) coefficients. A question about the magnitude of these errors also arose in a recent talk I gave. Typically, this question doesn’t come up because there’s more focus on assessing the residual errors as a measure of fit for the USL against the data set. Also, statistical accuracy can be a bigger issue when there are only a small number of samples. Barron reported 32 data points, so that’s not an problem in this case.

All this led me to wonder if the same error information is available in R. The answer is, yes, and here’s how to get hold of it. In the following, where Baron used coefficient names *sigma* and *kappa* (as in Chapter 4 of my GCaP book), I’ll use *alpha* and *beta* to mean the same thing respectively because the context is *processes* (i.e., thread scalability) rather than *processors* (i.e., hardware configuration scalability as discussed in Chap. 4).

First, let’s check the coefficient values determined by R when it fits the nonlinear USL model against Baron’s data:

> coef(usl)

alpha beta

0.0207164433 0.0008612175

They agree. More complete statistics are available in R using the summary() command:

> summary(usl)

Formula: Norm ~ N/(1 + alpha * (N - 1) + beta * N * (N - 1))

Parameters:

Estimate Std. Error t value Pr(>|t|)

alpha 2.072e-02 1.322e-03 15.67 5.48e-16 ***

beta 8.612e-04 5.412e-05 15.91 3.61e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2005 on 30 degrees of freedom

Number of iterations to convergence: 7

Achieved convergence tolerance: 8.628e-08

Indeed, we can now see that the standard errors are reported in the formatted column labeled “Std. Error” but the question is, how can we get hold of those values individually? For that, we need to understand the underlying formatting data structure being used by R. The str() function in R provides that information:

> str(summary(usl))

List of 11

$ formula :Class 'formula' length 3 Norm ~ N/(1 + alpha * (N - 1) + beta * N * (N - 1))

.. ..- attr(*, ".Environment")=

$ residuals : num [1:32] 0 0.011 -0.0522 -0.0144 -0.0269 ...

$ sigma : num 0.2

$ df : int [1:2] 2 30

$ cov.unscaled: num [1:2, 1:2] 4.35e-05 -1.73e-06 -1.73e-06 7.29e-08

..- attr(*, "dimnames")=List of 2

.. ..$ : chr [1:2] "alpha" "beta"

.. ..$ : chr [1:2] "alpha" "beta"

$ call : language nls(formula = Norm ~ N/(1 + alpha * (N - 1) + beta * N * (N - 1)), data = input, start = c(alpha = 0.1, beta = 0.01), algorithm = "default", ...

$ convInfo :List of 5

..$ isConv : logi TRUE

..$ finIter : int 7

..$ finTol : num 8.63e-08

..$ stopCode : int 0

..$ stopMessage: chr "converged"

$ control :List of 5

..$ maxiter : num 50

..$ tol : num 1e-05

..$ minFactor: num 0.000977

..$ printEval: logi FALSE

..$ warnOnly : logi FALSE

$ na.action : NULL

$ coefficients: num [1:2, 1:4] 2.07e-02 8.61e-04 1.32e-03 5.41e-05 1.57e+01 ...

..- attr(*, "dimnames")=List of 2

.. ..$ : chr [1:2] "alpha" "beta"

.. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"

$ parameters : num [1:2, 1:4] 2.07e-02 8.61e-04 1.32e-03 5.41e-05 1.57e+01 ...

..- attr(*, "dimnames")=List of 2

.. ..$ : chr [1:2] "alpha" "beta"

.. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"

- attr(*, "class")= chr "summary.nls"

From this output we see that the α and β coefficients—the same ones reported by the coef() function—can be accessed by indexing them as a 2-dimensional array.

> summary(usl)$coefficients[1,1]

[1] 0.02071644

> summary(usl)$coefficients[2,1]

[1] 0.0008612175

Similarly, the corresponding standard errors can be accessed as:

> summary(usl)$coefficients[1,2]

[1] 0.001322276

> summary(usl)$coefficients[2,2]

[1] 5.412075e-05

and they also agree with the standard errors reported in Gnuplot.

**leave a comment**for the author, please follow the link and comment on their blog:

**The Pith of Performance**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...