[This article was first published on Rmazing, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Initial Remark: Reload this page if formulas don’t display well!
As promised, here is the second part on how to obtain confidence intervals for fitted values obtained from nonlinear regression via nls or nlsLM (package ‘minpack.lm’).
I covered a Monte Carlo approach in http://rmazing.wordpress.com/2013/08/14/predictnls-part-1-monte-carlo-simulation-confidence-intervals-for-nls-models/, but here we will take a different approach: First- and second-order Taylor approximation around $f(x)$: $f(x) \approx f(a)+\frac {f'(a)}{1!} (x-a)+ \frac{f''(a)}{2!} (x-a)^2$.
Using Taylor approximation for calculating confidence intervals is a matter of propagating the uncertainties of the parameter estimates obtained from vcov(model) to the fitted value. When using first-order Taylor approximation, this is also known as the “Delta method”. Those familiar with error propagation will know the formula
$\displaystyle \sum_{i=1}^2 \rm{j_i}^2 \sigma_i^2 + 2\sum_{i=1\atop i \neq k}^n\sum_{k=1\atop k \neq i}^n \rm{j_i j_k} \sigma_{ik}$.
Heavily underused is the matrix notation of the famous formula above, for which a good derivation can be found at http://www.nada.kth.se/~kai-a/papers/arrasTR-9801-R3.pdf:
$\sigma_y^2 = \nabla_x\mathbf{C}_x\nabla_x^T$,
where $\nabla_x$ is the gradient vector of first-order partial derivatives and $\mathbf{C}_x$ is the variance-covariance matrix. This formula corresponds to the first-order Taylor approximation. Now the problem with first-order approximations is that they assume linearity around $f(x)$. Using the “Delta method” for nonlinear confidence intervals in R has been discussed in http://thebiobucket.blogspot.de/2011/04/fit-sigmoid-curve-with-confidence.html or http://finzi.psych.upenn.edu/R/Rhelp02a/archive/42932.html.
For highly nonlinear functions we need (at least) a second-order polynomial around $f(x)$ to realistically estimate the surrounding interval (red is linear approximation, blue is second-order polynomial on a sine function around $x = 5$):

Interestingly, there are also matrix-like notations for the second-order mean and variance in the literature (see http://dml.cz/dmlcz/141418 or http://iopscience.iop.org/0026-1394/44/3/012/pdf/0026-1394_44_3_012.pdf):
Second-order mean: $\rm{E}[y] = f(\bar{x}_i) + \frac{1}{2}\rm{tr}(\mathbf{H}_{xx}\mathbf{C}_x)$.
Second-order variance: $\sigma_y^2 = \nabla_x\mathbf{C}_x\nabla_x^T + \frac{1}{2}\rm{tr}(\mathbf{H}_{xx}\mathbf{C}_x\mathbf{H}_{xx}\mathbf{C}_x)$,
where $\mathbf{H}_{xx}$ is the Hessian matrix of second-order partial derivatives and $tr(\cdot)$ is the matrix trace (sum of diagonals).

Enough theory, for wrapping this all up we need three utility functions:
1) numGrad for calculating numerical first-order partial derivatives.

numGrad <- function(expr, envir = .GlobalEnv)
{
f0 <- eval(expr, envir)
vars <- all.vars(expr)
p <- length(vars)
x <- sapply(vars, function(a) get(a, envir))
eps <- 1e-04
d <- 0.1
r <- 4
v <- 2
zero.tol <- sqrt(.Machine$double.eps/7e-07) h0 <- abs(d * x) + eps * (abs(x) < zero.tol) D <- matrix(0, length(f0), p) Daprox <- matrix(0, length(f0), r) for (i in 1:p) { h <- h0 for (k in 1:r) { x1 <- x2 <- x x1 <- x1 + (i == (1:p)) * h f1 <- eval(expr, as.list(x1)) x2 <- x2 - (i == (1:p)) * h f2 <- eval(expr, envir = as.list(x2)) Daprox[, k] <- (f1 - f2)/(2 * h[i]) h <- h/v } for (m in 1:(r - 1)) for (k in 1:(r - m)) { Daprox[, k] <- (Daprox[, k + 1] * (4^m) - Daprox[, k])/(4^m - 1) } D[, i] <- Daprox[, 1] } return(D) }  2) numHess for calculating numerical second-order partial derivatives. numHess <- function(expr, envir = .GlobalEnv) { f0 <- eval(expr, envir) vars <- all.vars(expr) p <- length(vars) x <- sapply(vars, function(a) get(a, envir)) eps <- 1e-04 d <- 0.1 r <- 4 v <- 2 zero.tol <- sqrt(.Machine$double.eps/7e-07)
h0 <- abs(d * x) + eps * (abs(x) < zero.tol)
Daprox <- matrix(0, length(f0), r)
Hdiag <- matrix(0, length(f0), p)
Haprox <- matrix(0, length(f0), r)
H <- matrix(NA, p, p)
for (i in 1:p) {
h <- h0
for (k in 1:r) {
x1 <- x2 <- x
x1 <- x1 + (i == (1:p)) * h
f1 <- eval(expr, as.list(x1))
x2 <- x2 - (i == (1:p)) * h
f2 <- eval(expr, envir = as.list(x2))
Haprox[, k] <- (f1 - 2 * f0 + f2)/h[i]^2
h <- h/v
}
for (m in 1:(r - 1)) for (k in 1:(r - m)) {
Haprox[, k] <- (Haprox[, k + 1] * (4^m) - Haprox[, k])/(4^m - 1)
}
Hdiag[, i] <- Haprox[, 1]
}
for (i in 1:p) {
for (j in 1:i) {
if (i == j) {
H[i, j] <- Hdiag[, i]
}
else {
h <- h0
for (k in 1:r) {
x1 <- x2 <- x
x1 <- x1 + (i == (1:p)) * h + (j == (1:p)) *
h
f1 <- eval(expr, as.list(x1))
x2 <- x2 - (i == (1:p)) * h - (j == (1:p)) *
h
f2 <- eval(expr, envir = as.list(x2))
Daprox[, k] <- (f1 - 2 * f0 + f2 - Hdiag[, i] * h[i]^2 - Hdiag[, j] * h[j]^2)/(2 * h[i] * h[j])
h <- h/v
}
for (m in 1:(r - 1)) for (k in 1:(r - m)) {
Daprox[, k] <- (Daprox[, k + 1] * (4^m) - Daprox[, k])/(4^m - 1)
}
H[i, j] <- H[j, i] <- Daprox[, 1]
}
}
}
return(H)
}


And a small function for the matrix trace:

tr <- function(mat) sum(diag(mat), na.rm = TRUE)


1) and 2) are modified versions of the genD function in the “numDeriv” package that can handle expressions.

Now we need the predictNLS function that wraps it all up:

predictNLS <- function(
object,
newdata,
interval = c("none", "confidence", "prediction"),
level = 0.95,
...
)
{
require(MASS, quietly = TRUE)
interval <- match.arg(interval)

## get right-hand side of formula
RHS <- as.list(object$call$formula)[[3]]
EXPR <- as.expression(RHS)

## all variables in model
VARS <- all.vars(EXPR)

## coefficients
COEF <- coef(object)

## extract predictor variable
predNAME <- setdiff(VARS, names(COEF))

## take fitted values, if 'newdata' is missing
if (missing(newdata)) {
newdata <- eval(object\$data)[predNAME]
colnames(newdata) <- predNAME
}

## check that 'newdata' has same name as predVAR
if (names(newdata)[1] != predNAME) stop("newdata should have name '", predNAME, "'!")

## get parameter coefficients
COEF <- coef(object)

## get variance-covariance matrix
VCOV <- vcov(object)

## augment variance-covariance matrix for 'mvrnorm'
## by adding a column/row for 'error in x'
NCOL <- ncol(VCOV)
ADD1 <- c(rep(0, NCOL))
VCOV <- cbind(VCOV, ADD1)
ADD2 <- c(rep(0, NCOL + 1))
VCOV <- rbind(VCOV, ADD2)

NR <- nrow(newdata)
respVEC <- numeric(NR)
seVEC <- numeric(NR)
varPLACE <- ncol(VCOV)

outMAT <- NULL

## define counter function
counter <- function (i)
{
if (i%%10 == 0)
cat(i)
else cat(".")
if (i%%50 == 0)
cat("\n")
flush.console()
}

## calculate residual variance
r <- residuals(object)
w <- weights(object)
rss <- sum(if (is.null(w)) r^2 else r^2 * w)
df <- df.residual(object)

## iterate over all entries in 'newdata' as in usual 'predict.' functions
for (i in 1:NR) {
counter(i)

## get predictor values and optional errors
predVAL <- newdata[i, 1]
if (ncol(newdata) == 2) predERROR <- newdata[i, 2] else predERROR <- 0
names(predVAL) <- predNAME
names(predERROR) <- predNAME

## create mean vector
meanVAL <- c(COEF, predVAL)

## create augmented variance-covariance matrix
## by putting error^2 in lower-right position of VCOV
newVCOV <- VCOV
newVCOV[varPLACE, varPLACE] <- predERROR^2
SIGMA <- newVCOV

## first-order mean: eval(EXPR), first-order variance: G.S.t(G)
MEAN1 <- try(eval(EXPR, envir = as.list(meanVAL)), silent = TRUE)
if (inherits(MEAN1, "try-error")) stop("There was an error in evaluating the first-order mean!")
GRAD <- try(numGrad(EXPR, as.list(meanVAL)), silent = TRUE)
if (inherits(GRAD, "try-error")) stop("There was an error in creating the numeric gradient!")
VAR1 <- GRAD %*% SIGMA %*% matrix(GRAD)

## second-order mean: firstMEAN + 0.5 * tr(H.S),
## second-order variance: firstVAR + 0.5 * tr(H.S.H.S)
HESS <- try(numHess(EXPR, as.list(meanVAL)), silent = TRUE)
if (inherits(HESS, "try-error")) stop("There was an error in creating the numeric Hessian!")

valMEAN2 <- 0.5 * tr(HESS %*% SIGMA)
valVAR2 <- 0.5 * tr(HESS %*% SIGMA %*% HESS %*% SIGMA)

MEAN2 <- MEAN1 + valMEAN2
VAR2 <- VAR1 + valVAR2

## confidence or prediction interval
if (interval != "none") {
tfrac <- abs(qt((1 - level)/2, df))
INTERVAL <-  tfrac * switch(interval, confidence = sqrt(VAR2),
prediction = sqrt(VAR2 + res.var))
LOWER <- MEAN2 - INTERVAL
UPPER <- MEAN2 + INTERVAL
names(LOWER) <- paste((1 - level)/2 * 100, "%", sep = "")
names(UPPER) <- paste((1 - (1- level)/2) * 100, "%", sep = "")
} else {
LOWER <- NULL
UPPER <- NULL
}

RES <- c(mu.1 = MEAN1, mu.2 = MEAN2, sd.1 = sqrt(VAR1), sd.2 = sqrt(VAR2), LOWER, UPPER)
outMAT <- rbind(outMAT, RES)
}

cat("\n")
rownames(outMAT) <- NULL
return(outMAT)
}


With all functions at hand, we can now got through the same example as used in the Monte Carlo post:
 DNase1 <- subset(DNase, Run == 1) fm1DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1) > predictNLS(fm1DNase1, newdata = data.frame(conc = 5), interval = "confidence") . mu.1 mu.2 sd.1 sd.2 2.5% 97.5% [1,] 1.243631 1.243288 0.03620415 0.03620833 1.165064 1.321511 

The errors/confidence intervals are larger than with the MC approch (who knows why?) but it is very interesting to see how close the second-order corrected mean (1.243288) comes to the mean of the simulated values from the Monte Carlo approach (1.243293)!

The two approach (MC/Taylor) will be found in the predictNLS function that will be part of the “propagate” package in a few days at CRAN…

Cheers,
Andrej

Filed under: General, R Internals Tagged: confidence interval, first-order, fitting, Monte Carlo, nls, nonlinear, predict, second-order, Taylor approximation