
This post presents R code for a k-fold cross validation of the lasso in the case of a Gaussian regression (continuous Y). This can easily be done by using the mean squared error as the performance measure.

# Cross Validation in Lasso: Gaussian Regression

In the previous post, we implemented R code for the k-fold cross validation of a lasso model with a binomial response.

The main output of this post is the following lasso cross validation figure for the case of a continuous Y variable (top: cv.glmnet(), bottom: our result).

The difference between the previous post (categorical Y) and this one (continuous Y) is the performance measure. The former uses the misclassification rate (MCR) from a confusion matrix, while the latter uses the mean squared error (MSE). Since the same logic applies otherwise, we only have to modify this performance measure in the previous R code, as the short sketch below illustrates.
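As a minimal sketch of the difference (the small vectors here are hypothetical placeholders, not from the post's data), the two measures differ only in how a prediction is scored against the observed response:

```r
# hypothetical binary case : misclassification rate (MCR)
y.obs    <- c(1, 0, 1, 1, 0)            # observed binary response
prob.prd <- c(0.9, 0.4, 0.2, 0.7, 0.1)  # predicted probabilities
mcr <- mean((prob.prd > 0.5) != y.obs)  # share of wrong class labels

# hypothetical continuous case : mean squared error (MSE)
y.cont <- c(2.1, -0.5, 1.3)             # observed continuous response
y.hat  <- c(1.8, -0.2, 1.5)             # predicted values
mse <- mean((y.cont - y.hat)^2)         # average squared deviation
```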

In fact, we are already familiar with the MSE because the linear regression model uses this measure at the textbook level, and it is easier to handle than the binomial case, so let's turn directly to the R code for this modification.
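For reference, given $$n$$ validation observations with lasso predictions $$\hat{y}_i(\lambda)$$ at a fixed $$\lambda$$, the mean squared error is $$\mathrm{MSE}(\lambda) = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i(\lambda)\bigr)^2.$$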

### Cross Validation of Lasso with continuous Y variable

In the following R code, we use a built-in example data set (QuickStartExample) for simplicity. In particular, we set the arguments family = "gaussian" and type.measure = "mse" for a continuous dependent variable.

```r
#========================================================#
# Quantitative ALM, Financial Econometrics & Derivatives
# ML/DL using R, Python, Tensorflow by Sang-Heon Lee
#
# https://kiandlee.blogspot.com
#--------------------------------------------------------#
# Cross Validation of Lasso : Gaussian Regression
#========================================================#

library(glmnet)

graphics.off()  # clear all graphs
rm(list = ls()) # remove all files from your workspace

set.seed(1234)

#============================================
# data : x and y
#============================================
data(QuickStartExample) # built-in data
nfolds = 5 # number of folds

#============================================
# cross validation by using cv.glmnet
#============================================
cvfit <- cv.glmnet(
    x, y, family = "gaussian",
    type.measure = "mse",
    nfolds = nfolds,
    keep = TRUE  # returns foldid
)

# two lambdas from cv.glmnet
cvfit$lambda.min; cvfit$lambda.1se

x11(); plot(cvfit)

#============================================
# cross validation by hand
#============================================

# get the vector of fold ids used in cv.glmnet
# to replicate the same result
# (this assignment is random and subject to change)
foldid <- cvfit$foldid # from cv.glmnet (keep = TRUE)

# candidate lambda range
fit      <- glmnet(x, y, family = "gaussian")
v.lambda <- fit$lambda
nla      <- length(v.lambda)

m.mse <- matrix(0, nrow = nfolds, ncol = nla)

#-------------------------------
# iteration over all folds
#-------------------------------
for (i in 1:nfolds) {
    # training   fold : tr
    # validation fold : va
    ifd  <- which(foldid == i) # i-th fold
    tr.x <- x[-ifd,]; tr.y <- y[-ifd]
    va.x <- x[ifd,];  va.y <- y[ifd]

    # estimation using training fold
    fit <- glmnet(tr.x, tr.y, family = "gaussian",
                  lambda = v.lambda)

    # prediction on validation fold
    prd <- predict(fit, newx = va.x, type = "response")

    # mean squared error for each lambda
    for (c in 1:nla) {
        m.mse[i,c] <- mean((prd[,c] - va.y)^2)
    }
}

# average mse over folds
v.mse <- colMeans(m.mse)

# save manual cross validation output
cv.out <- data.frame(lambda = v.lambda,
                     log_lambda = log(v.lambda), mse = v.mse)

#-------------------------------
# lambda.min
#-------------------------------
no_lambda_min <- which.min(cv.out$mse)
cv.out$lambda[no_lambda_min]

#-------------------------------
# lambda.1se
#-------------------------------
# standard error of mse
v.mse_se <- apply(m.mse, 2, sd)/sqrt(nfolds)

# se at the minimizing lambda
mse_se_la_min <- v.mse_se[no_lambda_min]

# lambda.1se
max(cv.out$lambda[
    cv.out$mse < min(cv.out$mse) + mse_se_la_min])

#-------------------------------
# graph for cross validation
#-------------------------------
x11(); matplot(x = cv.out$log_lambda,
               y = cbind(cv.out$mse,
                         cv.out$mse + v.mse_se,
                         cv.out$mse - v.mse_se),
               lty = "solid", col = c("blue", "red", "green"),
               type = c("p", "l", "l"), pch = 16, lwd = 3)
```

Running the above R code results in the following two $$\lambda$$s from the two approaches (cv.glmnet() and our implementation). Except for the treatment of the mean squared error, the calculation of lambda.min and lambda.1se is the same as in the binomial response case. The two cross validation figures are omitted because we have already seen them at the beginning of this post.
```
> #-------------------------------------
> # from cv.glmnet()
> # cvfit$lambda.min; cvfit$lambda.1se
> #-------------------------------------
lambda.min : 0.08307327
lambda.1se : 0.1451729

> #-------------------------------------
> # from our implementation
> #-------------------------------------
lambda.min : 0.08307327
lambda.1se : 0.1451729
```
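As an optional sanity check (not part of the original code), we can also compare our fold-averaged MSE curve with the cvm component returned by cv.glmnet(). Because we reused its fold assignment, the two curves should agree closely, although cv.glmnet() averages fold errors with observation weights, so tiny numerical differences are possible:

```r
# optional check : our fold-averaged MSE vs. cv.glmnet's cvm
# (assumes the objects from the code above are still in the workspace)
all.equal(v.lambda, cvfit$lambda)               # same lambda sequence
head(cbind(manual = v.mse, cvglmnet = cvfit$cvm))
max(abs(v.mse - cvfit$cvm))                     # should be near zero
```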

### Concluding Remarks

In this post, we implemented R code for a lasso cross validation with a continuous dependent variable through a small modification of the binomial response case. $$\blacksquare$$