On linear models with no constant and R2

February 2, 2012

(This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers)

In econometrics courses, we always tell our students that “if you fit a linear model with no constant, then you might have trouble. For instance, you might get a negative R-squared”. So I tried to find datasets on the internet such that, when we compute a linear regression, we actually obtain a negative R-squared. I have generated hundreds of random datasets in R that should exhibit such a property, without success. Perhaps, to be more specific, I should explain what might happen if we do not include a constant in a linear model. Consider the following dataset, where the points lie on a straight line with a negative slope, far from the origin, and symmetric with respect to the first diagonal.

> x=1:3
> y=3:1
> plot(x,y)

The points lie on a straight line, so it is actually possible to get a perfect linear fit, but only if we include a constant in our model. This is related to the fact that the correlation between our two variates is -1:

> cor(x,y)
[1] -1
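As a quick check, here is a minimal sketch with base R's `lm()` (the object name `fit_const` is only illustrative) showing that once a constant is included, the three points are fitted exactly, with intercept 4 and slope -1:

```r
x <- 1:3
y <- 3:1

# Regression *with* a constant: y = a + b x
fit_const <- lm(y ~ x)
coef(fit_const)                 # intercept 4, slope -1 (up to rounding)

# The residuals are zero up to floating-point error
max(abs(residuals(fit_const)))

# With a constant, the usual R^2 equals cor(x, y)^2, i.e. 1 here
cor(x, y)^2
```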

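The least-squares program (with no constant) is here

$$\hat\beta = \underset{\beta\in\mathbb{R}}{\operatorname{argmin}} \sum_{i=1}^n (y_i-\beta x_i)^2,$$

i.e. the estimate of the slope is

$$\hat\beta = \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2}.$$
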
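For completeness, the closed form follows from the first-order condition of the least-squares program (the objective is a convex quadratic in $\beta$, so its stationary point is the minimum):

$$\frac{d}{d\beta}\sum_{i=1}^n (y_i-\beta x_i)^2 = -2\sum_{i=1}^n x_i\,(y_i-\beta x_i) = 0 \quad\Longleftrightarrow\quad \hat\beta\sum_{i=1}^n x_i^2 = \sum_{i=1}^n x_i y_i .$$

With $x=(1,2,3)$ and $y=(3,2,1)$, we have $\sum_i x_i y_i = 3+4+3 = 10$ and $\sum_i x_i^2 = 1+4+9 = 14$, so $\hat\beta = 10/14 = 5/7 \approx 0.714$.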
Numerically, we obtain

> (b=sum(x*y)/sum(x^2))
[1] 0.7142857
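The closed form can be cross-checked against `lm()`: in R's formula syntax, `0 +` (or equivalently `- 1`) removes the constant. A minimal sketch, where `b_closed` and `b_lm` are illustrative names:

```r
x <- 1:3
y <- 3:1

b_closed <- sum(x * y) / sum(x^2)        # closed-form slope through the origin
b_lm <- unname(coef(lm(y ~ 0 + x)))      # same model fitted by lm()

all.equal(b_closed, b_lm)                # TRUE
b_closed                                 # 0.7142857 (= 5/7)
```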

which is the slope of the least-squares line through the origin. If we plot the sum of squared errors as a function of the slope, we get

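> ssr=function(b){sum((y-b*x)^2)}       # sum of squared errors for slope b
> SSR=Vectorize(ssr)                     # so that SSR() accepts a vector of slopes
> B=seq(-1,3,by=.1)
> plot(B,SSR(B),ylim=c(0,ssr(3)),cex=.6,type="b")
> abline(h=sum((y-mean(y))^2),col="red") # total sum of squares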

So the value we computed is indeed the minimizer of the sum of squared errors. But note that the sum of squared errors always exceeds the total sum of squares (in red on the graph above), whatever the slope:

> ssr(b)
[1] 6.857143
> sum((y-mean(y))^2)
[1] 2
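The same minimum can be found numerically, without the closed form, using base R's `optimize()` on the sum-of-squares function. This is a minimal sketch; the search interval `c(-1, 3)` is an assumption chosen only to match the grid used for the plot:

```r
x <- 1:3
y <- 3:1
ssr <- function(b) sum((y - b * x)^2)    # sum of squared errors for slope b
tss <- sum((y - mean(y))^2)              # total (centered) sum of squares, = 2

opt <- optimize(ssr, interval = c(-1, 3))
opt$minimum     # close to 5/7
opt$objective   # close to 48/7, i.e. about 6.857

# Even the best slope leaves more squared error than the centered TSS
opt$objective > tss
```

Since `optimize()` stops at a finite tolerance, the minimizer only agrees with 5/7 to a few decimal places.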

Recall that the coefficient of determination, denoted $R^2$, is defined as

$$R^2 = 1-\frac{\sum_{i=1}^n (y_i-\hat y_i)^2}{\sum_{i=1}^n (y_i-\bar y)^2}, \qquad \text{with } \hat y_i = \hat\beta x_i,$$

> 1-ssr(b)/sum((y-mean(y))^2)
[1] -2.428571
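The sign can be anticipated by hand. Plugging $\hat\beta$ back into the objective gives the minimal sum of squared errors in closed form, which is larger than the total sum of squares:

$$\sum_{i=1}^n (y_i-\hat\beta x_i)^2 = \sum_{i=1}^n y_i^2 - \frac{\left(\sum_{i=1}^n x_i y_i\right)^2}{\sum_{i=1}^n x_i^2} = 14 - \frac{100}{14} = \frac{48}{7} \approx 6.857,$$

$$\sum_{i=1}^n (y_i-\bar y)^2 = 1+0+1 = 2, \qquad R^2 = 1 - \frac{48/7}{2} = 1 - \frac{24}{7} = -\frac{17}{7} \approx -2.4286.$$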

which is negative. $R^2$ is also sometimes defined as “the square of the sample correlation coefficient between the outcomes and their predicted values”. Here, that correlation is

> cor(b*x,y)
[1] -1
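The correlation-based definition cannot see the problem: a correlation is unchanged when the fitted values are obtained by multiplying $x$ by the positive constant $\hat\beta$, so it is just $\text{cor}(x,y)=-1$, and squaring it even hides the fact that the fitted values move in the wrong direction. A minimal sketch contrasting the two definitions on the same fit (`r2_ratio` and `r2_cor` are illustrative names):

```r
x <- 1:3
y <- 3:1
b <- sum(x * y) / sum(x^2)
yhat <- b * x                                               # fitted values, no constant

r2_ratio <- 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)   # 1 - SSR/TSS
r2_cor   <- cor(yhat, y)^2                                  # squared correlation

c(ratio = r2_ratio, cor = r2_cor)   # about -2.43 versus 1
```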

so we would get $R^2=1$. So, obviously, using $R^2$ in a model without a constant gives odd results. But the weird part is that if we run that regression in R, we get

> summary(lm(y~0+x))
Call:
lm(formula = y ~ 0 + x)

Residuals:
      1       2       3
 2.2857  0.5714 -1.1429

Coefficients:
  Estimate Std. Error t value Pr(>|t|)
x   0.7143     0.4949   1.443    0.286

Residual standard error: 1.852 on 2 degrees of freedom
Multiple R-squared: 0.5102,	Adjusted R-squared: 0.2653
F-statistic: 2.083 on 1 and 2 DF,  p-value: 0.2857
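The coefficient table can be reproduced by hand, which confirms that the estimation itself is fine. A minimal sketch, assuming the standard OLS formulas for a single-regressor model with no constant: $n-1 = 2$ residual degrees of freedom, $\hat\sigma^2 = \text{SSR}/(n-1)$, and a slope standard error of $\hat\sigma/\sqrt{\sum_i x_i^2}$:

```r
x <- 1:3
y <- 3:1
n <- length(y)
b <- sum(x * y) / sum(x^2)
rss <- sum((y - b * x)^2)

sigma <- sqrt(rss / (n - 1))                 # residual standard error (1 parameter)
se <- sigma / sqrt(sum(x^2))                 # standard error of the slope
t_stat <- b / se                             # t value
p_val <- 2 * pt(-abs(t_stat), df = n - 1)    # two-sided p-value

# Compare with the summary() output above
round(c(sigma = sigma, se = se, t = t_stat, p = p_val), 4)
```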

Here, the estimation is correct. But the $R^2$ we obtain suggests that the model is not that bad… So what does R actually compute? The value given by R (thanks to Vincent for asking me to look carefully at the R source code) is obtained by using Pythagoras’s theorem to compute the total sum of squares: for a model without a constant, R does not center the data, and takes $\sum_i y_i^2 = \sum_i \hat y_i^2 + \sum_i (y_i-\hat y_i)^2$ as the total sum of squares,

> sum((b*x)^2)/(sum((b*x)^2)+sum((y-b*x)^2))
[1] 0.5102041
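What R does can be sketched in a few lines. When the model has no intercept, `summary.lm()` uses the *uncentered* model sum of squares $\sum_i \hat y_i^2$ and reports $R^2 = \sum_i \hat y_i^2 / \left(\sum_i \hat y_i^2 + \text{SSR}\right)$. Since the residuals are orthogonal to $x$, the denominator is simply $\sum_i y_i^2$. A minimal sketch checking this against the value stored by `summary()` (`fit0`, `mss` and `r2_uncentered` are illustrative names):

```r
x <- 1:3
y <- 3:1
fit0 <- lm(y ~ 0 + x)
yhat <- fitted(fit0)
e <- residuals(fit0)

mss <- sum(yhat^2)          # uncentered "explained" sum of squares
rss <- sum(e^2)             # residual sum of squares

# Pythagoras: y = yhat + e, with yhat orthogonal to e
all.equal(mss + rss, sum(y^2))                      # TRUE (both equal 14)

r2_uncentered <- mss / (mss + rss)
all.equal(r2_uncentered, summary(fit0)$r.squared)   # TRUE, 0.5102041 = 25/49
```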

So be careful: the $R^2$ might look good, but it is meaningless here!
