Recently, a fishR user asked me the following question:
After fitting the age-length data to the VBGM, I looked over the results. But I can’t find the coefficient of determination ($R^2$) for the VBGM fit. Because a reviewer wants the coefficient of determination, I have to show it.
In general, the traditional “coefficient of determination” is not defined for non-linear regression models. To quote Douglas Bates from this R-Help thread …
There is a good reason that an nls model fit in R does not provide
r-squared – r-squared doesn’t make sense for a general nls model.
One way of thinking of r-squared is as a comparison of the residual sum of
squares for the fitted model to the residual sum of squares for a trivial
model that consists of a constant only. You cannot guarantee that this is a
comparison of nested models when dealing with an nls model. If the models
aren’t nested this comparison is not terribly meaningful.
So the answer is that you probably don’t want to do this in the first place.
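For ordinary linear regression, where the constant-only model is nested within the fitted model, Bates’ description of $R^2$ can be checked directly. The minimal sketch below uses simulated (hypothetical) `x` and `y` values, not data from this post, and compares the residual sum of squares of a fitted `lm()` model to that of the trivial constant-only model.

```r
# Simulated (hypothetical) data; not from the post
set.seed(42)
x <- runif(50, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(50)

fitLin   <- lm(y ~ x)  # fitted model
fitConst <- lm(y ~ 1)  # trivial constant-only model, nested within fitLin

# r-squared as 1 minus the ratio of the two residual sums of squares
r2manual <- 1 - deviance(fitLin) / deviance(fitConst)
r2lm     <- summary(fitLin)$r.squared
all.equal(r2manual, r2lm)  # TRUE
```

The agreement depends on `fitConst` being a special case of `fitLin` (set the slope to zero); that nesting is exactly what a general `nls` model cannot promise.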
At this point, I would “argue” with the referee about including an $R^2$ calculation. If the referee only wants a measure of “fit,” then I would probably just include a plot of the original data with the best-fit non-linear model superimposed.
If the referee will not give in on this point, then it is possible to exploit the usual definition of $R^2$ to develop what some call “quasi-$R^2$” values. The code below demonstrates how to compute the two quasi-$R^2$ values mentioned in the Stack Exchange thread when applied to a von Bertalanffy growth model.
First, fit the model as demonstrated in the Von Bertalanffy Growth Model (Intro) Vignette by loading the required packages …
… getting the data …
… and fitting the model …
```r
# starting values for the typical VBGM parameterization
svTypical <- vbStarts(tl~age,data=crm)
# typical VBGM expressed as an R formula
vbTypical <- tl~Linf*(1-exp(-K*(age-t0)))
# non-linear least-squares fit
fitTypical <- nls(vbTypical,data=crm,start=svTypical)
summary(fitTypical)
## 
## Formula: tl ~ Linf * (1 - exp(-K * (age - t0)))
## 
## Parameters:
##      Estimate Std. Error t value Pr(>|t|)    
## Linf  366.414     16.754   21.87   <2e-16 ***
## K       0.315      0.108    2.92   0.0042 ** 
## t0     -1.714      1.049   -1.63   0.1049    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 33.4 on 111 degrees of freedom
## 
## Number of iterations to convergence: 4 
## Achieved convergence tolerance: 3.82e-06
```
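The fit above relies on FSA’s `vbStarts()` for starting values. As a swapped-in alternative (not the vignette’s method), base R’s self-starting `SSasympOff()` model has the form `Asym*(1 - exp(-exp(lrc)*(input - c0)))`, which is the typical VBGM with $L_\infty$ = `Asym`, $K$ = `exp(lrc)`, and $t_0$ = `c0`. A minimal sketch, assuming simulated data in a hypothetical data frame `d` standing in for `crm`:

```r
# Simulated (hypothetical) age-length data standing in for crm
set.seed(1)
d <- data.frame(age = rep(1:10, each = 10))
d$tl <- 366 * (1 - exp(-0.3 * (d$age + 1.7))) + rnorm(nrow(d), sd = 20)

# SSasympOff() computes its own starting values; parameters are renamed
# so that Linf plays the role of Asym and t0 the role of c0
fitSS <- nls(tl ~ SSasympOff(age, Linf, lrc, t0), data = d)

# Back-transform lrc to K, the scale used in the typical VBGM
est <- coef(fitSS)
c(Linf = unname(est["Linf"]), K = unname(exp(est["lrc"])), t0 = unname(est["t0"]))
```

With the real data the call would presumably be `nls(tl~SSasympOff(age,Linf,lrc,t0),data=crm)`. The self-start step can fail on messier data, in which case explicit starting values such as those from `vbStarts()` are the safer route.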
One quasi-$R^2$ is computed as the square of the correlation between the actual “y” values (lengths in this case) and the “y” values predicted from the best-fit model. This quasi-$R^2$ is computed as
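```r
# predicted lengths from the best-fit model
predtl <- predict(fitTypical)
# squared correlation between observed and predicted lengths
cor(crm$tl,predtl)^2
## [1] 0.5338
```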
A second quasi-$R^2$ is computed using the definition of the usual $R^2$,

$$R^2 = 1 - \frac{SSE}{SST}$$

where $SSE$ is most easily computed as the sum of the squared residuals from the model fit and $SST$, in this case, is best computed as the variance of the original “y” variable times $n-1$, because $SST = \sum_{i=1}^{n}(y_i-\bar{y})^2 = (n-1)s_y^2$. Thus, this quasi-$R^2$ is computed as
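```r
# residual sum of squares from the model fit
SSE <- sum(residuals(fitTypical)^2)
# total sum of squares: variance of the lengths times (n-1)
SST <- var(crm$tl)*(length(crm$tl)-1)
1-SSE/SST
## [1] 0.5338
```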
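The two calculations can be bundled into one function. The sketch below defines a hypothetical helper, `quasiR2()` (not part of FSA or base R), that returns both quasi-$R^2$ values for a fitted `nls` model and its observed response. It is demonstrated on simulated (hypothetical) age-length data with hand-picked starting values rather than on `crm`.

```r
# Hypothetical helper (not part of FSA or base R): both quasi-R^2 values
# for a fitted nls model and the observed response vector y
quasiR2 <- function(fit, y) {
  pred <- predict(fit)               # fitted values from the nls model
  sse  <- sum(residuals(fit)^2)      # residual sum of squares
  sst  <- var(y) * (length(y) - 1)   # total sum of squares about mean(y)
  c(corSquared         = cor(y, pred)^2,  # squared observed-vs-predicted correlation
    oneMinusSSEoverSST = 1 - sse / sst)   # usual R^2 definition applied to nls
}

# Simulated (hypothetical) age-length data
set.seed(1)
age <- rep(1:10, each = 10)
tl  <- 366 * (1 - exp(-0.3 * (age + 1.7))) + rnorm(length(age), sd = 30)
fit <- nls(tl ~ Linf * (1 - exp(-K * (age - t0))),
           start = list(Linf = 350, K = 0.3, t0 = -1))
quasiR2(fit, tl)
```

For the model in this post the call would be `quasiR2(fitTypical, crm$tl)`.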
It is not clear to me that these two calculations will always come out the same as they did here (see some of the comments in the Stack Exchange thread).
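One way to explore whether the two values always match is to compute both for a different model. The minimal sketch below uses simulated (hypothetical) data generated with $t_0 = -1.7$, then fits a hypothetical two-parameter variant of the VBGM with $t_0$ fixed at 0, and reports both quasi-$R^2$ values along with the mean residual.

```r
# Simulated (hypothetical) age-length data generated with t0 = -1.7
set.seed(1)
age <- rep(1:10, each = 10)
tl  <- 366 * (1 - exp(-0.3 * (age + 1.7))) + rnorm(length(age), sd = 30)

# Hypothetical two-parameter variant: t0 fixed at 0
fit2 <- nls(tl ~ Linf * (1 - exp(-K * age)),
            start = list(Linf = 350, K = 0.5))

pred2 <- predict(fit2)
r2cor <- cor(tl, pred2)^2                                          # first quasi-R^2
r2sse <- 1 - sum(residuals(fit2)^2) / (var(tl) * (length(tl) - 1)) # second quasi-R^2
c(r2cor = r2cor, r2sse = r2sse, meanResid = mean(residuals(fit2)))
```

When the residuals of a fit do not average to zero, as can happen for a constrained model like this one, the squared correlation ignores that constant offset while $1-SSE/SST$ does not. One possible explanation for the match above is that the $t_0$ term in the three-parameter model absorbs such an offset, but treat this as an exploration rather than a proof.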
Finally, again, I think that the plot is a more reliable summary of the model fit.
```r
# observed data with the best-fit VBGM curve superimposed (fitPlot() is from FSA)
fitPlot(fitTypical,xlab="Age",ylab="Total Length (mm)",main="")
```
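If `fitPlot()` is not available, a similar figure can be drawn with base graphics. A minimal sketch, assuming simulated (hypothetical) data and hand-picked starting values in place of `crm` and `vbStarts()`, and a hypothetical grid `ageGrid` for drawing a smooth curve:

```r
# Simulated (hypothetical) data standing in for crm
set.seed(1)
age <- rep(1:10, each = 10)
tl  <- 366 * (1 - exp(-0.3 * (age + 1.7))) + rnorm(length(age), sd = 30)
fit <- nls(tl ~ Linf * (1 - exp(-K * (age - t0))),
           start = list(Linf = 350, K = 0.3, t0 = -1))

# Observed lengths at age
plot(tl ~ age, xlab = "Age", ylab = "Total Length (mm)")
# Superimpose the best-fit curve evaluated on a fine grid of ages
ageGrid <- data.frame(age = seq(min(age), max(age), length.out = 100))
lines(ageGrid$age, predict(fit, newdata = ageGrid), lwd = 2)
```

Evaluating `predict()` on a fine grid, rather than at the observed ages only, keeps the superimposed curve smooth.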