In my last post Which linear model is best? I wrote about using
stepwise selection as a method for selecting linear models, which turns
out to have some issues (see this article, and Wikipedia).
This post will be about two methods that slightly modify ordinary least
squares (OLS) regression – ridge regression and the lasso.
Ridge regression and the lasso are closely related, but only the Lasso
has the ability to select predictors. Like OLS, ridge attempts to
minimize residual sum of squares of predictors in a given model.
However, ridge regression includes an additional ‘shrinkage’ term – the
square of the coefficient estimate – which shrinks the estimate of the
coefficients towards zero. The impact of this term is controlled by
another term, lambda (determined seperately). Two interesting
implications of this design are the facts that when λ = 0 the OLS
coefficients are returned and when λ = ∞, coefficients will approach
To take a look at this, setup a model matrix (removing the intercept
column), store the independent variable as
y, and create a vector of
swiss <- datasets::swiss x <- model.matrix(Fertility~., swiss)[,-1] y <- swiss$Fertility lambda <- 10^seq(10, -2, length = 100)
First, let's prove the fact that when λ = 0 we get the same coefficients
as the OLS model.
#create test and training sets library(glmnet)
## Loading required package: Matrix ## Loading required package: foreach ## Loaded glmnet 2.0-10
set.seed(489) train = sample(1:nrow(x), nrow(x)/2) test = (-train) ytest = y[test]
Fit your models.
#OLS swisslm <- lm(Fertility~., data = swiss) coef(swisslm)
## (Intercept) Agriculture Examination Education ## 66.9151817 -0.1721140 -0.2580082 -0.8709401 ## Catholic Infant.Mortality ## 0.1041153 1.0770481
#ridge ridge.mod <- glmnet(x, y, alpha = 0, lambda = lambda) predict(ridge.mod, s = 0, exact = T, type = 'coefficients')[1:6,]
## (Intercept) Agriculture Examination Education ## 66.9365901 -0.1721983 -0.2590771 -0.8705300 ## Catholic Infant.Mortality ## 0.1040307 1.0770215
The differences here are nominal. Let's see if we can use ridge to
improve on the OLS estimate.
swisslm <- lm(Fertility~., data = swiss, subset = train) ridge.mod <- glmnet(x[train,], y[train], alpha = 0, lambda = lambda) #find the best lambda from our list via cross-validation cv.out <- cv.glmnet(x[train,], y[train], alpha = 0)
## Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations ## per fold
bestlam <- cv.out$lambda.min
#make predictions ridge.pred <- predict(ridge.mod, s = bestlam, newx = x[test,]) s.pred <- predict(swisslm, newdata = swiss[test,]) #check MSE mean((s.pred-ytest)^2)
##  106.0087
##  93.02157
Ridge performs better for this data according to the MSE.
#a look at the coefficients out = glmnet(x[train,],y[train],alpha = 0) predict(ridge.mod, type = "coefficients", s = bestlam)[1:6,]
## (Intercept) Agriculture Examination Education ## 64.90631178 -0.16557837 -0.59425090 -0.35814759 ## Catholic Infant.Mortality ## 0.06545382 1.30375306
As expected, most of the coefficient estimates are more conservative.
Let's have a look at the lasso. The big difference here is in the
shrinkage term – the lasso takes the absolute value of the coefficient
lasso.mod <- glmnet(x[train,], y[train], alpha = 1, lambda = lambda) lasso.pred <- predict(lasso.mod, s = bestlam, newx = x[test,]) mean((lasso.pred-ytest)^2)
##  124.1039
The MSE is a bit higher for the lasso estimate. Let's check out the
lasso.coef <- predict(lasso.mod, type = 'coefficients', s = bestlam)[1:6,]
Looks like the lasso places high importance on
Infant.Mortality. From this we also gain some
Agriculture are not useful predictors for
this model. It is likely that
Agriculture do have some
Fertility, though, since pushing those coefficients to zero
hurt the model.
There is plenty more to delve into here, but I'll leave the details to
the experts. I am always happy to have your take on the topics I write
about, so please feel free to leave a comment or contact me. Oftentimes
I learn just as much from you all as I do in researching the topic I
I think the next post will be about more GIS stuff – maybe on rasters or
point pattern analysis.
Thank you for reading!