The R-squared and nonlinear regression: a difficult marriage?

Posted on March 24, 2021 by R on The broken bridge between biologists and statisticians in R bloggers

[This article was first published on R on The broken bridge between biologists and statisticians, and kindly contributed to R-bloggers.]

Making sure that a fitted model gives a good description of the observed data is a fundamental step of every nonlinear regression analysis. To this end we can (and should) use several techniques, either graphical or based on formal hypothesis testing methods. However, in the end, I must admit that I often feel the need to display a simple index, based on a single and widely understood value, that reassures the readers about the goodness of fit of my models.

In linear regression, we already have such an index, which is known as the R² or the coefficient of determination. In words, this index represents the proportion of variance in the dependent variable that is explained by the regression effects. It ranges from 0 to 1 and, within this interval, the higher the value, the better the fit. The expression is:

\[ R^2 = \frac{SS_{reg}}{SS_{tot}}\]

and it represents the ratio between the regression sum of squares (\(SS_{reg}\)) and the total sum of squares (\(SS_{tot}\)). This ratio is equal to the proportion of explained variance when we consider the population variances, which are obtained by dividing both \(SS_{reg}\) and \(SS_{tot}\) by the number of observations (and not by the number of degrees of freedom). In the above expression:

\[SS_{reg} = \sum_{i = 1}^{n}{\left(\hat{y_i} - \bar{y} \right)^2}\]

and:

\[SS_{tot} = \sum_{i = 1}^{n}{\left(y_i - \bar{y} \right)^2}\]

If we consider the squared residuals from the regression line, we can also define the residual sum of squares as:

\[SS_{res} = \sum_{i = 1}^{n}{\left(y_i - \hat{y_i} \right)^2}\]

where: \(y_i\) is the i-th observation, \(\hat{y_i}\) is the i-th fitted value and \(\bar{y}\) is the overall mean.
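As a quick numerical check of these definitions, here is a minimal sketch in base R. The built-in `cars` dataset and the object names (`fit`, `ss_reg`, `ss_tot`, `ss_res`) are my own choices for illustration and are not part of the original analysis.

```r
# Simple linear regression (with intercept) on a built-in dataset
fit <- lm(dist ~ speed, data = cars)

y     <- cars$dist
y_hat <- as.numeric(fitted(fit))
y_bar <- mean(y)

ss_reg <- sum((y_hat - y_bar)^2)  # regression sum of squares
ss_tot <- sum((y - y_bar)^2)      # total sum of squares
ss_res <- sum((y - y_hat)^2)      # residual sum of squares

# R-squared from the definition, next to the value reported by summary()
r2_by_hand <- ss_reg / ss_tot
c(by_hand = r2_by_hand, from_summary = summary(fit)$r.squared)
```

The two values should agree up to rounding error, because `lm()` fits a linear model with an intercept.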

In all linear models with an intercept term, the following equality holds:

\[SS_{tot} = SS_{reg} + SS_{res}\]

Given this decomposition, \(SS_{reg} \leq SS_{tot}\) always holds, which implies that the R² value can never be higher than 1 or lower than 0. Furthermore, we can write the alternative (and equivalent) definition:

\[ R^2 = 1 - \frac{SS_{res}}{SS_{tot}}\]

Now, the question is:

Can we use the R-squared in nonlinear regression?

Basically, we have two problems:

nonlinear models do not have an intercept term, at least, not in the usual sense;
the equality \(SS_{tot} = SS_{reg} + SS_{res}\) may not hold.
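The second point can be seen numerically. The sketch below uses only base R and the same data as the worked example later in this post; `ss_gap()` is a hypothetical helper written for this illustration, and the comparison with a straight-line fit is my own addition.

```r
# Same data as in the worked example of this post
X <- c(0.1, 5, 7, 22, 28, 39, 46, 200)
Y <- c(1, 13.66, 14.11, 14.43, 14.78, 14.86, 14.78, 14.91)
dat <- data.frame(X = X, Y = Y)

# Hypothetical helper: SS_tot - (SS_reg + SS_res)
ss_gap <- function(y, y_hat) {
  y_bar <- mean(y)
  sum((y - y_bar)^2) - (sum((y_hat - y_bar)^2) + sum((y - y_hat)^2))
}

lin_fit <- lm(Y ~ X, data = dat)                    # linear model with intercept
nl_fit  <- nls(Y ~ SSmicmen(X, Vm, K), data = dat)  # Michaelis-Menten model

gap_lin <- ss_gap(dat$Y, as.numeric(fitted(lin_fit)))  # zero, up to rounding error
gap_nl  <- ss_gap(dat$Y, as.numeric(fitted(nl_fit)))   # generally not zero
c(linear = gap_lin, nonlinear = gap_nl)
```

Algebraically, the gap equals twice the sum of cross-products between \(\hat{y_i} - \bar{y}\) and \(y_i - \hat{y_i}\). That term vanishes for a linear model with an intercept, but it has no reason to vanish for a nonlinear fit.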

For these reasons, most authors advise against the use of the R² in nonlinear regression analysis and recommend alternative measures, such as the Mean Squared Error (MSE; see Ratkowsky, 1990) or the AIC and BIC (see Spiess and Neumeyer, 2010). I would argue that the R² may have a superior intuitive appeal, insofar as it is equal to 1 for a perfectly fitting model; with such a reference point, we can immediately see how good the fit of our model is.
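For reference, these alternative measures can be obtained from an nls() fit with base R alone. The following is a minimal sketch using the data from the worked example later in this post; I am assuming that the MSE is meant as the residual mean square (residual sum of squares divided by the residual degrees of freedom).

```r
X <- c(0.1, 5, 7, 22, 28, 39, 46, 200)
Y <- c(1, 13.66, 14.11, 14.43, 14.78, 14.86, 14.78, 14.91)
dat <- data.frame(X = X, Y = Y)

fit <- nls(Y ~ SSmicmen(X, Vm, K), data = dat)

# Residual mean square (assumed meaning of MSE here):
# residual sum of squares over residual degrees of freedom
mse <- deviance(fit) / df.residual(fit)

# Information criteria: lower values are preferable when comparing
# alternative models fitted to the same data
c(MSE = mse, AIC = AIC(fit), BIC = BIC(fit))
```

Unlike the R², none of these values has a fixed upper or lower bound, so they are mainly useful for comparing models fitted to the same data rather than as a stand-alone index.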

Schabenberger and Pierce (2002) recommend the following statistic, which is similar to the R² for linear models:

\[ \textrm{Pseudo-}R^2 = 1 - \frac{SS_{res}}{SS_{tot}}\]

Why is it a ‘Pseudo-R²’? In contrast to what happens with linear models, this statistic:

cannot exceed 1, but it may be lower than 0;
cannot be interpreted as the proportion of variance explained by the model.
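To see how a negative value can arise, here is a minimal sketch with made-up data. `pseudo_r2()` is a hypothetical helper written for this illustration, not a function from any package. A line forced through the origin is fitted with nls() to data that decrease from a high starting value, so the fitted model does worse than the plain mean.

```r
# Hypothetical helper implementing 1 - SS_res / SS_tot
pseudo_r2 <- function(y, y_hat) {
  1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)
}

# Made-up data: y decreases with x, starting from a large value
toy <- data.frame(x = 1:5, y = c(9, 8, 7, 6, 5))

# Deliberately inadequate model: a line through the origin
bad_fit <- nls(y ~ b * x, data = toy, start = list(b = 1))

pr2 <- pseudo_r2(toy$y, as.numeric(fitted(bad_fit)))
pr2  # negative: SS_res is larger than SS_tot
```

Here the residual sum of squares is about nine times the total sum of squares, which is why the statistic falls well below zero.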

Bearing these two limitations in mind, there is no reason why we should not use such a goodness-of-fit measure with nonlinear regression. To this end, the R2nls() function in the ‘aomisc’ package can be used to retrieve the R² and Pseudo-R² values from a nonlinear model fitted with either the nls() or the drm() function.

library(aomisc)
X <- c(0.1, 5, 7, 22, 28, 39, 46, 200)
Y <- c(1, 13.66, 14.11, 14.43, 14.78, 14.86, 14.78, 14.91)

# drm fit (Michaelis-Menten model)
model <- drm(Y ~ X, fct = MM.2())
R2nls(model)$PseudoR2
## [1] 0.9930399

# nls fit (same model, self-starting function)
model2 <- nls(Y ~ SSmicmen(X, Vm, K))
R2nls(model2)$PseudoR2
## [1] 0.9930399
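For readers who prefer to see what goes into such a value, the Pseudo-R² can also be computed by hand from the formula given earlier. This is a minimal sketch in base R; I am assuming, without having checked the package source, that R2nls() applies the same formula, so the result should be compared with the value above rather than taken as identical by construction.

```r
X <- c(0.1, 5, 7, 22, 28, 39, 46, 200)
Y <- c(1, 13.66, 14.11, 14.43, 14.78, 14.86, 14.78, 14.91)
dat <- data.frame(X = X, Y = Y)

fit <- nls(Y ~ SSmicmen(X, Vm, K), data = dat)

ss_res <- sum(residuals(fit)^2)          # residual sum of squares
ss_tot <- sum((dat$Y - mean(dat$Y))^2)   # total sum of squares

pseudo_r2 <- 1 - ss_res / ss_tot
pseudo_r2
```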

Undoubtedly, the Pseudo-R² gives, at first glance, a good feel for the quality of our regressions; but please do not abuse it. In particular, remember that a negative value might indicate a big problem with the fitted model. Above all, remember that the Pseudo-R², like the R² in multiple linear regression, should never be used as the basis to select and compare alternative regression models. Other statistics should be used for this purpose.

Thanks for reading and happy coding!

Andrea Onofri
Department of Agricultural, Food and Environmental Sciences
University of Perugia (Italy)

References

Ratkowsky, D.A., 1990. Handbook of nonlinear regression models. Marcel Dekker Inc.
Schabenberger, O., Pierce, F.J., 2002. Contemporary statistical models for the plant and soil sciences. Taylor & Francis, CRC Press.
Spiess, A.N., Neumeyer, N., 2010. An evaluation of \(R^2\) as an inadequate measure for nonlinear models in pharmacological and biochemical research: a Monte Carlo approach. BMC Pharmacology 10, 6.
