(This article was first published on **Chen-ang Statistics » R**, and kindly contributed to R-bloggers)

This post continues the discussion of multicollinearity in regression. First, we need to show how to calculate the VIF and the condition number in software such as R, which is easy to do: vif() in the car package and kappa() in base R compute the VIF and the condition number, respectively. Consider, for example, the data from the previous article in this series:

    > library(car)
    > # VIF
    > vif(lm(GNP ~ ., data = longley))
    GNP.deflator   Unemployed Armed.Forces   Population         Year     Employed
       81.946226    35.924858     9.406108   171.158675  1017.609561   196.247880
    > # condition number
    > kappa(longley[, -1])
    [1] 8521.126

The output shows that both the VIFs and the condition number are extremely large, which means the data suffer from severe multicollinearity.
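To see what these diagnostics measure, here is a minimal base-R sketch that recomputes them by hand. It assumes the usual definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors; the names `manual_vif` and `cond_exact` are illustrative only. Note that kappa() returns an estimate by default, so the exact ratio of singular values may differ somewhat from the figure reported above.

```r
# Same predictor set as vif(lm(GNP ~ ., data = longley)): all columns except GNP.
X <- longley[, names(longley) != "GNP"]

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing predictor j on the others.
manual_vif <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})
print(round(manual_vif, 3))

# Exact 2-norm condition number (largest / smallest singular value) of the
# same matrix that was passed to kappa() above.
d <- svd(as.matrix(longley[, -1]))$d
cond_exact <- max(d) / min(d)
print(cond_exact)
```

The hand-computed VIFs should agree with the vif() output, since car uses the same definition for models without factor terms.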

**2 Lasso and Least Angle Regression**

Besides ridge regression, the lasso is another feasible and straightforward remedy. Lasso is short for *Least Absolute Shrinkage and Selection Operator*, and its motivation is similar to that of ridge regression.
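For reference, both estimators can be written as penalized least-squares problems. The notation here is an assumption and is not taken from the earlier article: $y$ is the centered response, $X$ the matrix of standardized predictors with $p$ columns, and $\lambda \ge 0$ a tuning parameter.

```latex
\begin{align*}
\hat{\beta}^{\text{ridge}} &= \arg\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \sum_{j=1}^{p} \beta_j^2, \\
\hat{\beta}^{\text{lasso}} &= \arg\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \sum_{j=1}^{p} |\beta_j|.
\end{align*}
```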

The main difference between ridge regression and the lasso is that the former uses a (squared) $\ell_2$ penalty, while the latter uses an $\ell_1$ penalty. Because of this difference, their solutions behave very differently: the lasso can shrink some coefficients exactly to zero, whereas ridge regression only shrinks them toward zero. To fit the lasso in R, we use the package lars, which provides efficient fitting procedures. The following three functions in lars are particularly useful:

(1) lars(): Fits Least Angle Regression (discussed later), Lasso and Infinitesimal Forward Stagewise regression models.

(2) cv.lars(): Computes K-fold cross-validated error curve for lars

(3) plot.lars(): Plot method for lars objects

We again use the data from the previous article. Run the code below; the model can then be built easily from the output.

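    library(lars)
    # lars() expects a numeric matrix of predictors, not a data frame
    y <- matrix(longley[, 1])
    x <- as.matrix(longley[, -1])

    # fit and inspect the full lasso path
    lasso <- lars(x, y)
    plot(lasso)
    summary(lasso)

    # choice 1: fraction of the final L1 norm with the smallest 10-fold CV error
    cvr <- cv.lars(x, y, K = 10)
    best <- cvr$index[which.min(cvr$cv)]
    coef0 <- coef(lasso, mode = "fraction", s = best)

    # choice 2: step with the smallest Cp
    s <- which.min(lasso$Cp)[1]
    coef1 <- coef(lasso, mode = "step", s = s)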
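As a rough illustration of how the model can be read off from such a fit, the sketch below picks the step with the smallest Cp, lists the predictors with non-zero coefficients, and computes in-sample fitted values. It assumes the lars package is installed; the names `fit`, `best_step`, `beta`, `selected` and `fitted_vals` are illustrative and not part of the original code.

```r
library(lars)  # assumed to be installed

x <- as.matrix(longley[, -1])
y <- longley[, 1]
fit <- lars(x, y, type = "lasso")

# Choose the step with the smallest Cp and extract its coefficients.
best_step <- which.min(fit$Cp)
beta <- coef(fit, s = best_step, mode = "step")

# Predictors kept (non-zero coefficient) at that step.
selected <- names(beta)[beta != 0]
print(selected)

# In-sample fitted values at the same point on the path.
fitted_vals <- predict(fit, newx = x, s = best_step, mode = "step", type = "fit")$fit
print(cor(fitted_vals, y))
```

Because cv.lars() splits the data at random, the cross-validated choice can vary between runs; calling set.seed() beforehand makes it reproducible.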

Least angle regression is another possible method. It can also be fitted with lars(), with the argument *type* set to "lar".

    lar <- lars(x, y, type = "lar")
    plot(lar)
    summary(lar)

Note that computing the lasso solutions is a quadratic programming problem that standard numerical algorithms can solve. The least angle regression procedure is a more efficient approach, however: with a small modification it yields the entire lasso solution path at roughly the cost of a single least-squares fit.
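The relationship between the two paths can be inspected directly. The lasso path follows the least angle regression path except that a variable is dropped whenever its coefficient reaches zero, so the two fits can differ in the number of steps. The sketch below assumes the lars package is installed and compares the two paths on the same data.

```r
library(lars)  # assumed to be installed

x <- as.matrix(longley[, -1])
y <- longley[, 1]

lasso <- lars(x, y, type = "lasso")
lar   <- lars(x, y, type = "lar")

# Each row of $beta holds the coefficients after one step of the path
# (row 1 is the all-zero starting point).
cat("lasso steps:", nrow(lasso$beta) - 1, "\n")
cat("LAR steps:  ", nrow(lar$beta) - 1, "\n")

# Order in which predictors enter (positive index) or leave (negative index)
# each path.
print(unlist(lasso$actions))
print(unlist(lar$actions))
```

A negative entry in the lasso actions marks a step where a predictor was removed, which is exactly where the two paths diverge.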
