Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Most books on regression analysis assume homoscedasticity, the situation in which Var(Y | X = t), for a response variable Y and vector of predictor variables X, is the same for all t. Yet, needless to say, almost all data in real life is heteroscedastic. For Y = human weight and X = height, say, we know that the assumption of homoscedasticity can’t be true, even approximately.

Typical books discuss assessment of that assumption using residual plots and the like, but then leave it at that. Rather few books mention Eicker-White theory, which develops valid asymptotic inference for heteroscedastic data. E-W is really nice, and guess what! — it’s long been available in R, in the sandwich and car packages on CRAN.  (Note that the latter package is intended for use with a book that does cover this topic, J. Fox and S. Weisberg, An R Companion to Applied Regression, Second Edition, Sage, 2011.) Then instead of using R’s standard vcov() function to obtain estimated variances and covariances of the estimates of the βi, we use vcovHC() and hccm(), respectively.

One can make a similar derivation for nonlinear regression, which is available as the function nlshc() in my regtools package. The package will be associated with my own book on regression, currently in progress. (The package is currently in progress as well, but already contains several useful functions.) The rest of this post is adapted from an example in the book.

The model I chose for this simple example was E(Y | X = t) = 1 / t’β, where the distributions of the quantities can be seen in the following simulation code:

sim <- function(n,nreps) {
b <- 1:2
res <- replicate(nreps,{
x <- matrix(rexp(2*n),ncol=2)
meany <- 1 / (x %*% b)
y <- meany + (runif(n) - 0.5) * meany
xy <- cbind(x,y)
xy <- data.frame(xy)
nlout <- nls(X3 ~ 1 / (b1*X1+b2*X2),
data=xy,start=list(b1 = 1,b2=1))
b <- coef(nlout)
vc <- vcov(nlout)
vchc <- nlsvcovhc(nlout)
z1 <- (b - 1) / sqrt(vc[1,1])
z2 <- (b - 1) / sqrt(vchc[1,1])
c(z1,z2)
})
print(mean(res[1,] < 1.28))
print(mean(res[2,] < 1.28))
}


The results were striking:

> sim(250,2500)
 0.6188
 0.9096


Since the true value should be 0.90 (1.28 is the 0.90 quantile of N(0,1)), the Eicker-White method is doing an outstanding job here — and R’s built-in nonlinear regression function, nls() is not. The latter function’s reported standard errors are way, way off the mark.  