Ridge Regression and the Lasso

[This article was first published on R – Real Data, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In my last post Which linear model is best? I wrote about using
stepwise selection as a method for selecting linear models, which turns
out to have some issues (see this article, and Wikipedia).
This post will be about two methods that slightly modify ordinary least
squares (OLS) regression – ridge regression and the lasso.

Ridge regression and the lasso are closely related, but only the Lasso
has the ability to select predictors. Like OLS, ridge attempts to
minimize residual sum of squares of predictors in a given model.
However, ridge regression includes an additional ‘shrinkage’ term – the
square of the coefficient estimate – which shrinks the estimate of the
coefficients towards zero. The impact of this term is controlled by
another term, lambda (determined seperately). Two interesting
implications of this design are the facts that when λ = 0 the OLS
coefficients are returned and when λ = ∞, coefficients will approach

To take a look at this, setup a model matrix (removing the intercept
column), store the independent variable as y, and create a vector of
lambda values.

swiss <- datasets::swiss
x <- model.matrix(Fertility~., swiss)[,-1]
y <- swiss$Fertility
lambda <- 10^seq(10, -2, length = 100)

First, let's prove the fact that when λ = 0 we get the same coefficients
as the OLS model.

#create test and training sets

## Loading required package: Matrix

## Loading required package: foreach

## Loaded glmnet 2.0-10

train = sample(1:nrow(x), nrow(x)/2)
test = (-train)
ytest = y[test]

Fit your models.

swisslm <- lm(Fertility~., data = swiss)

##      (Intercept)      Agriculture      Examination        Education 
##       66.9151817       -0.1721140       -0.2580082       -0.8709401 
##         Catholic Infant.Mortality 
##        0.1041153        1.0770481

ridge.mod <- glmnet(x, y, alpha = 0, lambda = lambda)
predict(ridge.mod, s = 0, exact = T, type = 'coefficients')[1:6,]

##      (Intercept)      Agriculture      Examination        Education 
##       66.9365901       -0.1721983       -0.2590771       -0.8705300 
##         Catholic Infant.Mortality 
##        0.1040307        1.0770215

The differences here are nominal. Let's see if we can use ridge to
improve on the OLS estimate.

swisslm <- lm(Fertility~., data = swiss, subset = train)
ridge.mod <- glmnet(x[train,], y[train], alpha = 0, lambda = lambda)
#find the best lambda from our list via cross-validation
cv.out <- cv.glmnet(x[train,], y[train], alpha = 0)

## Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations
## per fold

bestlam <- cv.out$lambda.min

#make predictions
ridge.pred <- predict(ridge.mod, s = bestlam, newx = x[test,])
s.pred <- predict(swisslm, newdata = swiss[test,])
#check MSE

## [1] 106.0087


## [1] 93.02157

Ridge performs better for this data according to the MSE.

#a look at the coefficients
out = glmnet(x[train,],y[train],alpha = 0)
predict(ridge.mod, type = "coefficients", s = bestlam)[1:6,]

##      (Intercept)      Agriculture      Examination        Education 
##      64.90631178      -0.16557837      -0.59425090      -0.35814759 
##         Catholic Infant.Mortality 
##       0.06545382       1.30375306

As expected, most of the coefficient estimates are more conservative.

Let's have a look at the lasso. The big difference here is in the
shrinkage term – the lasso takes the absolute value of the coefficient

lasso.mod <- glmnet(x[train,], y[train], alpha = 1, lambda = lambda)
lasso.pred <- predict(lasso.mod, s = bestlam, newx = x[test,])

## [1] 124.1039

The MSE is a bit higher for the lasso estimate. Let's check out the

lasso.coef  <- predict(lasso.mod, type = 'coefficients', s = bestlam)[1:6,]

Looks like the lasso places high importance on Education,
Examination, and Infant.Mortality. From this we also gain some
evidence that Catholic and Agriculture are not useful predictors for
this model. It is likely that Catholic and Agriculture do have some
effect on Fertility, though, since pushing those coefficients to zero
hurt the model.

There is plenty more to delve into here, but I'll leave the details to
the experts. I am always happy to have your take on the topics I write
about, so please feel free to leave a comment or contact me. Oftentimes
I learn just as much from you all as I do in researching the topic I
write about.

I think the next post will be about more GIS stuff – maybe on rasters or
point pattern analysis.

Thank you for reading!


To leave a comment for the author, please follow the link and comment on their blog: R – Real Data.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)