# More on the Heteroscedasticity Issue

**Mad (Data) Scientist**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In my last post, I dsciussed R software, including mine, that handles heteroscedastic settings for linear and nonlinear regression models. Several readers had interesting comments and questions, which I will address here.

To review: Though most books and software assume homoscedasticity, i.e. constancy of the variance of the response variable at all levels of the predictor variables, this condition seldom if ever holds in real life. This can make a mockery of the reported standard errors, and thus render confidence intervals and tests rather meaningless. But fortunately, there is asymptotic theory that gives use valid standard errors in the heteroscedastic case, due to Eickert, White and others, for the linear case. In R, this is available via the functions **vcovHC()** and **hccm()** in the CRAN packages **sandwich** and **car, **respectively.

I mentioned that I had adapted the E-W idea to the nonlinear realm, leveraging the linear-approximation core of the Gauss-Newton method for nonlinear least-squares computation, by calling **vcovHC()** on the linear approximation. The result is the function **nlshc()** in my package **regtool**s. (Unfortunately, I erroneously used its old name, **nlsvcovhc()** in my post.) One simply calls **nlshc()** on an object of class **‘nls’** (output of R’s **nls()** function) to obtain standard errors that are valid under heteroscedasticity.

Achim Zeileis, one of the authors of **sandwich**, then made some comments on my blog post. He pointed out that one can actually call **sandwich()** directly on an **‘nls’** object (again, due to the linear approximation). After reading my post, he tried that, and found that the resulting standard errors were more realistic than what one gets directly from **nls(),** but they still were not as good as mine, where “good” means correct confidence levels for confidence intervals and so on. He speculated that this was due to the particular variant of E-W that I use. Well, here is an update.

First, Achim’s speculation was right. I tried some of the other variants of E-W, and they didn’t do as well. Of course, there may be other factors at work as well.

Upon further investigation, I found another intriguing phenomenon. I tried **nlshc()** on one of the NIST datasets, as follows:

library(nlstools) data(vmkmki) regftn <- function(t,u,b1,b2,b3) b1 * t / (t + b2 * (1 + u/b3)) bstart <- list(b1=1,b2=1, b3=1) z <- nls(v ~ regftn(S,I,b1,b2,b3),data=vmkmki, start=list(b1=1,b2=1, b3=1))

To my surprise, the output was an all-NAs matrix. This did not happen with **hccm()**, This is strange, as **vcovHC()** works fine otherwise, e.g. in the simulation I posted last time.

So, for the time being, in the new **nlshc()** I have replaced the call to **vcovHC()** by a call to **hccm()**. I’ve also added an option for the user to choose a different E-W variant, though given what we know so far, I don’t recommend it.

A couple of commenters asked about using alternatives to **nls()**, which in fact I had done with **nlsLM()**. I had done that because I had had some convergence problems with **nls()**, but actually have reverted to **nls()** in the new version of the package. (I will also make my function an S3 method later on, which I had originally decided not to do, but have changed my mind with Achim’s encouragement 🙂 )

**leave a comment**for the author, please follow the link and comment on their blog:

**Mad (Data) Scientist**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.