Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In my last post Which linear model is best? I wrote about using
stepwise selection as a method for selecting linear models, which turns
This post will be about two methods that slightly modify ordinary least
squares (OLS) regression – ridge regression and the lasso.

Ridge regression and the lasso are closely related, but only the Lasso
has the ability to select predictors. Like OLS, ridge attempts to
minimize residual sum of squares of predictors in a given model.
However, ridge regression includes an additional ‘shrinkage’ term – the
square of the coefficient estimate – which shrinks the estimate of the
coefficients towards zero. The impact of this term is controlled by
another term, lambda (determined seperately). Two interesting
implications of this design are the facts that when λ = 0 the OLS
coefficients are returned and when λ = ∞, coefficients will approach
zero.

To take a look at this, setup a model matrix (removing the intercept
column), store the independent variable as y, and create a vector of
lambda values.

swiss <- datasets::swiss
x <- model.matrix(Fertility~., swiss)[,-1]
y <- swiss$Fertility lambda <- 10^seq(10, -2, length = 100)  First, let's prove the fact that when λ = 0 we get the same coefficients as the OLS model. #create test and training sets library(glmnet) ## Loading required package: Matrix ## Loading required package: foreach ## Loaded glmnet 2.0-10 set.seed(489) train = sample(1:nrow(x), nrow(x)/2) test = (-train) ytest = y[test]  Fit your models. #OLS swisslm <- lm(Fertility~., data = swiss) coef(swisslm) ## (Intercept) Agriculture Examination Education ## 66.9151817 -0.1721140 -0.2580082 -0.8709401 ## Catholic Infant.Mortality ## 0.1041153 1.0770481 #ridge ridge.mod <- glmnet(x, y, alpha = 0, lambda = lambda) predict(ridge.mod, s = 0, exact = T, type = 'coefficients')[1:6,] ## (Intercept) Agriculture Examination Education ## 66.9365901 -0.1721983 -0.2590771 -0.8705300 ## Catholic Infant.Mortality ## 0.1040307 1.0770215  The differences here are nominal. Let's see if we can use ridge to improve on the OLS estimate. swisslm <- lm(Fertility~., data = swiss, subset = train) ridge.mod <- glmnet(x[train,], y[train], alpha = 0, lambda = lambda) #find the best lambda from our list via cross-validation cv.out <- cv.glmnet(x[train,], y[train], alpha = 0) ## Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations ## per fold bestlam <- cv.out$lambda.min

#make predictions
ridge.pred <- predict(ridge.mod, s = bestlam, newx = x[test,])
s.pred <- predict(swisslm, newdata = swiss[test,])
#check MSE
mean((s.pred-ytest)^2)

## [1] 106.0087

mean((ridge.pred-ytest)^2)

## [1] 93.02157


Ridge performs better for this data according to the MSE.

#a look at the coefficients
out = glmnet(x[train,],y[train],alpha = 0)
predict(ridge.mod, type = "coefficients", s = bestlam)[1:6,]

##      (Intercept)      Agriculture      Examination        Education
##      64.90631178      -0.16557837      -0.59425090      -0.35814759
##         Catholic Infant.Mortality
##       0.06545382       1.30375306


As expected, most of the coefficient estimates are more conservative.

Let's have a look at the lasso. The big difference here is in the
shrinkage term – the lasso takes the absolute value of the coefficient
estimates.

lasso.mod <- glmnet(x[train,], y[train], alpha = 1, lambda = lambda)
lasso.pred <- predict(lasso.mod, s = bestlam, newx = x[test,])
mean((lasso.pred-ytest)^2)

## [1] 124.1039


The MSE is a bit higher for the lasso estimate. Let's check out the
coefficients.

lasso.coef  <- predict(lasso.mod, type = 'coefficients', s = bestlam)[1:6,]


Looks like the lasso places high importance on Education,
Examination, and Infant.Mortality. From this we also gain some
evidence that Catholic and Agriculture are not useful predictors for
this model. It is likely that Catholic and Agriculture do have some
effect on Fertility, though, since pushing those coefficients to zero
hurt the model.

There is plenty more to delve into here, but I'll leave the details to
the experts. I am always happy to have your take on the topics I write
I learn just as much from you all as I do in researching the topic I

I think the next post will be about more GIS stuff – maybe on rasters or
point pattern analysis.