LASSO, adaLASSO and the GLMNET package

By Gabriel Vasconcelos

Motivation

If you are close to the data science world you have probably heard about LASSO, which stands for Least Absolute Shrinkage and Selection Operator. The LASSO is a model that penalizes the size of the parameters in the objective function in order to exclude irrelevant variables. It has two very natural uses: the first is variable selection and the second is forecasting. Since the LASSO normally selects far fewer variables than Ordinary Least Squares (OLS), its forecasts have much less variance at the cost of a small amount of in-sample bias.
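For reference, in its standard form the LASSO solves the following problem (notation mine, with n observations and p candidate variables):

$$
\hat{\beta} = \arg\min_{\beta} \; \frac{1}{2n}\sum_{i=1}^{n}(y_i - x_i'\beta)^2 + \lambda\sum_{j=1}^{p}|\beta_j|
$$

The penalty parameter $\lambda$ controls the amount of shrinkage: the larger it is, the more coefficients are set exactly to zero, which is what makes the LASSO a variable selection device.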

One of the most important features of the LASSO is that it can deal with many more variables than observations, even thousands of variables. This is one of the main reasons for its recent popularity: in the last six days alone (April 1-6), five related packages were published on CRAN.

Example

In this example I am going to use one of the most popular LASSO packages, glmnet. It estimates the LASSO very fast and can select the best model using cross-validation. In my experience, especially in a time-series context, it is better to select the best model using an information criterion such as the BIC: it is faster and avoids some complications that cross-validation has in time series.
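To make the idea concrete, here is a minimal sketch of BIC-based selection using plain glmnet on simulated data. The variable names are mine and the internals of ic.glmnet (used below) may differ, but the principle is the same: compute the BIC at every point of the regularization path and keep the lambda that minimizes it.

library(glmnet)
set.seed(1)
X=matrix(rnorm(100*50),100,50)          # 100 observations, 50 candidate variables
y=drop(X[,1:5]%*%rep(1,5))+rnorm(100)   # only the first 5 variables matter
fit=glmnet(X,y)                         # full regularization path
rss=colSums((y-predict(fit,newx=X))^2)  # in-sample RSS for each lambda
n=length(y)
bic=n*log(rss/n)+fit$df*log(n)          # df = number of nonzero coefficients
fit$lambda[which.min(bic)]              # lambda selected by the BIC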

The package HDeconometrics (under development on GitHub) uses the glmnet package to estimate the LASSO and selects the best model using an information criterion chosen by the user. The data we are going to use is also available in the package. This data was used by Garcia, Medeiros and Vasconcelos (2017). We are going to use the LASSO to forecast the Brazilian inflation.

#library(devtools)
#install_github("gabrielrvsc/HDeconometrics")
library(HDeconometrics)
data("BRinf")
# embed(BRinf,2) puts each observation next to its first lag: the first
# ncol(BRinf) columns are the current values, the rest are the lagged values
data=embed(BRinf,2)
# y is current inflation; x keeps only the lagged variables as regressors
y=data[,1]; x=data[,-c(1:ncol(BRinf))]

## == Break the data into in-sample and out-of-sample == ##
y.in=y[1:100]; y.out=y[-c(1:100)]
x.in=x[1:100,]; x.out=x[-c(1:100),]

## == LASSO == ##
lasso=ic.glmnet(x.in,y.in,crit = "bic")
plot(lasso$glmnet,"lambda",ylim=c(-2,2))

[Plot: LASSO coefficient paths shrinking toward zero as log(lambda) increases]

plot(lasso)

[Plot: BIC across the regularization path, with the selected model marked]

The first plot above shows the coefficients going to zero as we increase the penalty in the objective function of the LASSO. The second plot shows the BIC curve and the selected model. Now we can calculate the forecast:

## == Forecasting == ##
pred.lasso=predict(lasso,newdata=x.out)
plot(y.out, type="l")
lines(pred.lasso, col=2)

[Plot: out-of-sample inflation (black) and the LASSO forecast (red)]

Adaptive LASSO

The LASSO has an adaptive version with better properties regarding variable selection. Note that this does not always mean better forecasts. The idea is to use some previously known information to select the variables more efficiently; in general, this information is the set of coefficients estimated by a first-step LASSO or some other model.
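Concretely, the version estimated below replaces the uniform penalty with coefficient-specific weights built from first-step estimates $\hat{\beta}_j$ (here, the LASSO coefficients above):

$$
\hat{\beta}^{ada} = \arg\min_{\beta} \; \frac{1}{2n}\sum_{i=1}^{n}(y_i - x_i'\beta)^2 + \lambda\sum_{j=1}^{p} w_j|\beta_j|, \qquad w_j = \left|\hat{\beta}_j + \tfrac{1}{\sqrt{n}}\right|^{-\tau}
$$

Variables with large first-step coefficients get small weights and are penalized less; the $1/\sqrt{n}$ term keeps the weights finite when a first-step coefficient is exactly zero.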

## = adaLASSO = ##
tau=1
# First-step coefficients from the LASSO above (intercept removed)
first.step.coef=coef(lasso)[-1]
# Penalty weights: variables with large first-step coefficients are penalized
# less; the 1/sqrt(nrow(x)) term keeps the weights finite when a first-step
# coefficient is exactly zero
penalty.factor=abs(first.step.coef+1/sqrt(nrow(x)))^(-tau)
adalasso=ic.glmnet(x.in,y.in,crit="bic",penalty.factor=penalty.factor)
pred.adalasso=predict(adalasso,newdata=x.out)
pred.adalasso=predict(adalasso,newdata=x.out)

plot(y.out, type="l")
lines(pred.lasso, col=2)
lines(pred.adalasso, col=4)

[Plot: out-of-sample inflation (black), LASSO forecast (red) and adaLASSO forecast (blue)]

## = Comparing the out-of-sample root mean squared errors = ##
c(LASSO=sqrt(mean((y.out-pred.lasso)^2)),
  adaLASSO=sqrt(mean((y.out-pred.adalasso)^2)))

##     LASSO  adaLASSO 
## 0.1810612 0.1678397

The adaLASSO produced a more precise forecast in this case. In general, the adaLASSO is better than the simple LASSO for forecasting, but this is not an absolute truth: I have seen many cases where the simple LASSO did better.

More information

If you are interested in going deeper, here are some suggestions:

[1] Bühlmann, Peter, and Sara van de Geer. Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Science & Business Media, 2011.

[2] Friedman, Jerome, Trevor Hastie, and Robert Tibshirani (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22. URL http://www.jstatsoft.org/v33/i01/

[3] Garcia, Marcio, Marcelo C. Medeiros, and Gabriel F. R. Vasconcelos (2017). Real-time inflation forecasting with high-dimensional models: The case of Brazil. International Journal of Forecasting, in press.

