(This article was first published on **Freakonometrics - Tag - R-english**, and kindly contributed to R-bloggers)

Another post about the R-squared coefficient, and about why, after some years teaching econometrics, I still hate it when students ask questions about it. Usually, it starts with “I have a _____ R-squared… isn’t it too low?” Please, feel free to fill in the blank with your favorite (low) number. Say 0.2. To keep it simple, there are different answers to that question:

- if you don’t want to waste time understanding econometrics, I would say something like “Forget about the R-squared, it is useless” (perhaps also “please, think twice about taking that econometrics course”)
- if you’re ready to spend some time getting a better understanding of subtle concepts, I would say “I don’t like the R-squared. It might be interesting in some rare cases (you can probably count them on the fingers of one finger), like comparing two models on the same dataset (even so, I would recommend the adjusted one). But usually, its value has no meaning. You can compare 0.2 and 0.3 (and prefer the 0.3 R-squared model to the 0.2 R-squared one), but 0.2 means nothing”. Well, not exactly, since it means *something*, but it is not a measure that tells you whether you are dealing with a *good* or a *bad* model. Well, again, not exactly, but it is rather difficult to say where *bad* ends and where *good* starts. Actually, it is exactly like the correlation coefficient (well, there is nothing mysterious here, since the R-squared can be related to some correlation coefficient, as mentioned in class)
- if you want some more advanced advice, I would say “It’s complicated…” (and perhaps also “Look in a textbook written by someone more clever than me, you can find hundreds of them in the library!”)
- if you want me to act like people we’ve seen recently on TV (during electoral debates), “It’s extremely interesting, but before answering your question, let me tell you a story…”
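As an aside on the correlation remark in the list above, the link can be checked numerically. The following is only a minimal sketch on hypothetical toy data (the names `x`, `y` and `fit`, the seed and the coefficients are assumptions made for illustration, not part of the story that follows): with a single covariate and an intercept, the R-squared should coincide with the squared Pearson correlation between the covariate and the response.

```r
# Hypothetical toy data, independent of the simulation used in the rest of the post
set.seed(123)
x <- runif(50)
y <- 1 + 2 * x + rnorm(50)

fit <- lm(y ~ x)
r2  <- summary(fit)$r.squared

# One covariate plus an intercept: R-squared is the squared
# Pearson correlation between x and y
all.equal(r2, cor(x, y)^2)

# More generally, for a least-squares fit with an intercept, it is the
# squared correlation between the response and the fitted values
all.equal(r2, cor(y, fitted(fit))^2)
```

This is one way to see why a given value of the R-squared is as hard to interpret in isolation as a given value of a correlation coefficient.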

    > set.seed(1)
    > n=20
    > X=runif(n)
    > E=rnorm(n)
    > Y=2+5*X+E*.5
    > base=data.frame(X,Y)
    > reg=lm(Y~X,data=base)
    > summary(reg)

    Call:
    lm(formula = Y ~ X, data = base)

    Residuals:
         Min       1Q   Median       3Q      Max
    -1.15961 -0.17470  0.08719  0.29409  0.52719

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   2.4706     0.2297   10.76 2.87e-09 ***
    X             4.2042     0.3697   11.37 1.19e-09 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.461 on 18 degrees of freedom
    Multiple R-squared: 0.8778,     Adjusted R-squared: 0.871
    F-statistic: 129.3 on 1 and 18 DF,  p-value: 1.192e-09

    > Y=2+5*X+E*4
    > base=data.frame(X,Y)
    > reg=lm(Y~X,data=base)
    > summary(reg)

    Call:
    lm(formula = Y ~ X, data = base)

    Residuals:
        Min      1Q  Median      3Q     Max
    -9.2769 -1.3976  0.6976  2.3527  4.2175

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)    5.765      1.837   3.138  0.00569 **
    X             -1.367      2.957  -0.462  0.64953
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.688 on 18 degrees of freedom
    Multiple R-squared: 0.01173,    Adjusted R-squared: -0.04318
    F-statistic: 0.2136 on 1 and 18 DF,  p-value: 0.6495

    > S=seq(0,4,by=.2)
    > R2=rep(NA,length(S))
    > for(s in 1:length(S)){
    + Y=2+5*X+E*S[s]
    + base=data.frame(X,Y)
    + reg=lm(Y~X,data=base)
    + R2[s]=summary(reg)$r.squared}
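The loop above only stores the R-squared values. A possible way to look at them is sketched here, under a few assumptions: the noise grid starts at 0.2 rather than 0 (a noiseless fit is trivially perfect), and the dashed curve `R2_pop` is an assumed population counterpart, Var(5X)/(Var(5X)+s²) with Var(X)=1/12 for a uniform covariate. Neither choice comes from the original post.

```r
# Same simulated design as in the post
set.seed(1)
n <- 20
X <- runif(n)
E <- rnorm(n)

# Noise scales; starting at 0.2 because a noiseless fit is trivially perfect
S <- seq(0.2, 4, by = 0.2)

# Sample R-squared of the regression of Y on X, for each noise scale
R2 <- sapply(S, function(s) {
  Y <- 2 + 5 * X + E * s
  summary(lm(Y ~ X))$r.squared
})

# Assumed population counterpart: Var(5X) / (Var(5X) + s^2), with Var(X) = 1/12
R2_pop <- (25 / 12) / (25 / 12 + S^2)

plot(S, R2, type = "b", ylim = c(0, 1),
     xlab = "noise scale s", ylab = "R-squared")
lines(S, R2_pop, lty = 2)
legend("topright", legend = c("sample", "population (assumed)"),
       lty = c(1, 2), pch = c(1, NA))
```

The model for the mean, 2+5X, is the same for every point on the curve; only the scale of the noise changes.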

Nevertheless, it looks like some econometricians really care about the R-squared, and cannot imagine looking at a model if the R-squared is lower than – say – 0.4. It is always possible to reach that level! You just have to add more covariates! If you have some… And if you don’t, it is always possible to use polynomials of a continuous variate. For instance, on the previous example,

    > # poly() needs degree < number of distinct X values (n=20 here), so stop at 19
    > S=seq(1,19,by=1)
    > R2=rep(NA,length(S))
    > for(s in 1:length(S)){
    + reg=lm(Y~poly(X,degree=s),data=base)
    + R2[s]=summary(reg)$r.squared}
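Since the adjusted R-squared was recommended earlier for comparing models on the same dataset, it may help to put the two side by side on this polynomial exercise. This is a minimal sketch, assuming the noisy case Y=2+5X+4E from above; the names `degrees`, `fits` and `adjR2`, and the cap at degree 10 (to keep the residual degrees of freedom positive), are choices made here for illustration.

```r
# Same simulated design as in the post, noisy case
set.seed(1)
n <- 20
X <- runif(n)
E <- rnorm(n)
Y <- 2 + 5 * X + E * 4
base <- data.frame(X, Y)

# Polynomial degrees, kept well below n - 1 so residual df stay positive
degrees <- 1:10

fits  <- lapply(degrees, function(d)
  summary(lm(Y ~ poly(X, degree = d), data = base)))
R2    <- sapply(fits, function(s) s$r.squared)
adjR2 <- sapply(fits, function(s) s$adj.r.squared)

# Side-by-side table: the models are nested, so R2 cannot decrease with the
# degree, while the adjusted version pays a penalty for each extra term
round(cbind(degree = degrees, R2 = R2, adjR2 = adjR2), 3)
```

The R-squared column can only go up as terms are added, which is exactly why reaching any target level says little about the model.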
