lambda.min, lambda.1se and Cross Validation in Lasso : Continuous Response

[This article was first published on K & L Fintech Modeling, and kindly contributed to R-bloggers].

This post presents an R code for k-fold cross validation of the lasso in the case of Gaussian regression (continuous Y). This is easily done by using the mean squared error as the performance measure.




Cross Validation in Lasso : Gaussian Regression



In the previous post below, we implemented an R code for the k-fold cross validation of the lasso model with a binomial response.


The main output of this post is the following lasso cross-validation figure for the case of a continuous Y variable (top: cv.glmnet(), bottom: our result).

lambda.min, lambda.1se and Cross Validation in Lasso : Gaussian Regression
The difference between the previous post (categorical Y) and this one (continuous Y) is the performance measure: the former uses the misclassification rate (MCR) from a confusion matrix, while the latter uses the mean squared error (MSE). We only have to modify this performance measure in the previous R code, since the same logic applies otherwise.

In fact, we are already familiar with the MSE because the linear regression model uses this measure at the textbook level. Since it is simpler than the binomial-response case, let's turn directly to the R code for this modification.
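As a quick reminder (a toy sketch with made-up numbers, not part of the original code), the MSE is just the average of the squared residuals:

```r
# hypothetical predictions and observed values
pred <- c(1.9, 3.2, 0.8, 2.5)
obs  <- c(2.0, 3.0, 1.0, 2.4)

# mean squared error : average of the squared residuals
mse <- mean((pred - obs)^2)
mse # 0.025
```

This is exactly the quantity we will compute on each validation fold below, once for every candidate \(\lambda\).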



Cross Validation of Lasso with continuous Y variable


In the following R code, we use a built-in example dataset (QuickStartExample) for simplicity. In particular, we set the arguments family = "gaussian" and type.measure = "mse" for a continuous dependent variable.


#========================================================#
# Quantitative ALM, Financial Econometrics & Derivatives
# ML/DL using R, Python, Tensorflow by Sang-Heon Lee
#
# https://kiandlee.blogspot.com
#--------------------------------------------------------#
# Cross Validation of Lasso : Gaussian Regression
#========================================================#

library(glmnet)

graphics.off()  # clear all graphs
rm(list = ls()) # remove all objects from the workspace

set.seed(1234)

#============================================
# data : x and y
#============================================
data(QuickStartExample) # built-in data
# recent glmnet versions load this as a list
x <- QuickStartExample$x; y <- QuickStartExample$y
nfolds <- 5 # number of folds

#============================================
# cross validation by using cv.glmnet
#============================================
cvfit <- cv.glmnet(
    x, y, family = "gaussian",
    type.measure = "mse",
    nfolds = nfolds,
    keep = TRUE  # returns foldid
)

# two lambdas from cv.glmnet
cvfit$lambda.min; cvfit$lambda.1se
x11(); plot(cvfit)

#============================================
# cross validation by hand
#============================================
# get the vector of fold ids used in cv.glmnet
# to replicate the same result.
# Therefore, this is subject to change.
foldid <- cvfit$foldid # from cv.glmnet

# candidate lambda range
fit      <- glmnet(x, y, family = "gaussian")
v.lambda <- fit$lambda
nla      <- length(v.lambda)

m.mse <- matrix(0, nrow = nfolds, ncol = nla)

#-------------------------------
# iteration over all folds
#-------------------------------
for (i in 1:nfolds) {
    # training   folds : tr (all folds except the i-th)
    # validation fold  : va (the i-th fold)

    ifd  <- which(foldid == i) # i-th fold
    tr.x <- x[-ifd, ]; tr.y <- y[-ifd]
    va.x <- x[ ifd, ]; va.y <- y[ ifd]

    # estimation using training folds
    fit <- glmnet(tr.x, tr.y, family = "gaussian",
                  lambda = v.lambda)
    # prediction on validation fold
    prd <- predict(fit, newx = va.x, type = "response")

    # mean squared error for each lambda
    for (c in 1:nla) {
        m.mse[i, c] <- mean((prd[, c] - va.y)^2)
    }
}
# average mse over folds
v.mse <- colMeans(m.mse)
# save manual cross validation output
cv.out <- data.frame(lambda = v.lambda,
    log_lambda = log(v.lambda), mse = v.mse)

#-------------------------------
# lambda.min
#-------------------------------
no_lambda_min <- which.min(cv.out$mse)
cv.out$lambda[no_lambda_min]

#-------------------------------
# lambda.1se
#-------------------------------
# standard error of mse
v.mse_se <- apply(m.mse, 2, sd) / sqrt(nfolds)
# se at lambda.min
mse_se_la_min <- v.mse_se[no_lambda_min]
# lambda.1se
max(cv.out$lambda[
    cv.out$mse < min(cv.out$mse) + mse_se_la_min])

#-------------------------------
# graph for cross validation
#-------------------------------
x11(); matplot(x = cv.out$log_lambda,
    y = cbind(cv.out$mse, cv.out$mse + v.mse_se,
              cv.out$mse - v.mse_se),
    lty = "solid", col = c("blue", "red", "green"),
    type = c("p", "l", "l"), pch = 16, lwd = 3)


Running the above R code produces the following two \(\lambda\)s from the two approaches (cv.glmnet() and our implementation). Except for the use of the mean squared error, the calculation of lambda.min and lambda.1se is the same as in the binomial-response case. The two cross-validation figures are omitted because we have already seen them at the beginning of this post.
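For readers who want the one-standard-error rule in isolation, here is a minimal sketch on hypothetical cross-validation numbers (the lambdas, errors, and standard errors below are made up for illustration only):

```r
# hypothetical CV results : larger lambdas first, as in glmnet
lambda <- c(0.50, 0.30, 0.15, 0.08, 0.04)
cv.mse <- c(1.40, 1.10, 0.95, 0.90, 0.93)
cv.se  <- c(0.10, 0.09, 0.08, 0.07, 0.07)

# lambda.min : the lambda with the smallest CV error
i.min      <- which.min(cv.mse)
lambda.min <- lambda[i.min]                  # 0.08

# lambda.1se : the largest lambda whose CV error is within
# one standard error of the minimum CV error
thresh     <- cv.mse[i.min] + cv.se[i.min]   # 0.90 + 0.07 = 0.97
lambda.1se <- max(lambda[cv.mse <= thresh])  # 0.15
```

lambda.1se deliberately prefers a sparser (more regularized) model whose error is statistically indistinguishable from the best one.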


> #-------------------------------------
> # from cv.glmnet()
> # cvfit$lambda.min; cvfit$lambda.1se
> #-------------------------------------
lambda.min : 0.08307327
lambda.1se : 0.1451729
 
> #-------------------------------------
> # from our implementation
> #-------------------------------------
lambda.min : 0.08307327
lambda.1se : 0.1451729



Concluding Remarks


In this post, we implemented an R code for lasso cross validation with a continuous dependent variable by a small modification of the binomial-response case. \(\blacksquare\)
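As a possible next step beyond this post, the fitted cv.glmnet object can be used directly: coef() with s = "lambda.min" or s = "lambda.1se" returns the sparse coefficient vector at the chosen \(\lambda\). A minimal sketch on simulated data (the data-generating process here is made up for illustration):

```r
library(glmnet)

set.seed(1234)
# hypothetical data : 100 observations, 20 predictors,
# only the first two predictors truly matter
x <- matrix(rnorm(100 * 20), 100, 20)
y <- x[, 1] - 0.5 * x[, 2] + rnorm(100)

cvfit <- cv.glmnet(x, y, family = "gaussian", type.measure = "mse")

# sparse coefficient vectors (intercept + 20 slopes)
b.min <- coef(cvfit, s = "lambda.min")
b.1se <- coef(cvfit, s = "lambda.1se")

# number of nonzero terms retained at each lambda
sum(b.min != 0); sum(b.1se != 0)
```

Since lambda.1se applies stronger shrinkage than lambda.min, it typically retains fewer nonzero coefficients.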


