Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

LASSO has been a popular algorithm for the variable selection and extremely effective with high-dimension data. However, it often tends to “over-regularize” a model that might be overly compact and therefore under-predictive.

The Elastic Net addresses the aforementioned “over-regularization” by balancing between LASSO and ridge penalties. In particular, a hyper-parameter, namely Alpha, would be used to regularize the model such that the model would become a LASSO in case of Alpha = 1 and a ridge in case of Alpha = 0. In practice, Alpha can be tuned easily by the cross-validation. Below is a demonstration of Elastic Net with R glmnet package and its comparison with LASSO and ridge models.

pkgs <- list("glmnet", "doParallel", "foreach", "pROC")
lapply(pkgs, require, character.only = T)
registerDoParallel(cores = 4)

df2 <- df1[df1$CARDHLDR == 1, ] set.seed(2017) n <- nrow(df2) sample <- sample(seq(n), size = n * 0.5, replace = FALSE) train <- df2[sample, -1] test <- df2[-sample, -1] mdlY <- as.factor(as.matrix(train["DEFAULT"])) mdlX <- as.matrix(train[setdiff(colnames(df1), c("CARDHLDR", "DEFAULT"))]) newY <- as.factor(as.matrix(test["DEFAULT"])) newX <- as.matrix(test[setdiff(colnames(df1), c("CARDHLDR", "DEFAULT"))])  First of all, we estimates a LASSO model with Alpha = 1. The function cv.glmnet() is used to search for a regularization parameter, namely Lambda, that controls the penalty strength. As shown below, the model only identifies 2 attributes out of total 12. # LASSO WITH ALPHA = 1 cv1 <- cv.glmnet(mdlX, mdlY, family = "binomial", nfold = 10, type.measure = "deviance", paralle = TRUE, alpha = 1) md1 <- glmnet(mdlX, mdlY, family = "binomial", lambda = cv1$lambda.1se, alpha = 1)
coef(md1)
#(Intercept) -1.963030e+00
#AGE          .
#MAJORDRG     .
#MINORDRG     .
#OWNRENT      .
#INCOME      -5.845981e-05
#SELFEMPL     .
#INCPER       .
#EXP_INC      .
#SPENDING     .
#LOGSPEND    -4.015902e-02
roc(newY, as.numeric(predict(md1, newX, type = "response")))
#Area under the curve: 0.636


We next estimates a ridge model as below by setting Alpha = 0. Similarly, Lambda is searched by the cross-validation. Since the ridge penalty would only regularize the magnitude of each coefficient, we end up with a “full” model with all model attributes. The model performance is slightly better with 10 more variables, which is a debatable outcome.

# RIDGE WITH ALPHA = 0
cv2 <- cv.glmnet(mdlX, mdlY, family = "binomial", nfold = 10, type.measure = "deviance", paralle = TRUE, alpha = 0)
md2 <- glmnet(mdlX, mdlY, family = "binomial", lambda = cv2$lambda.1se, alpha = 0) coef(md2) #(Intercept) -2.221016e+00 #AGE -4.184422e-04 #ACADMOS -3.085096e-05 #ADEPCNT 1.485114e-04 #MAJORDRG 6.684849e-03 #MINORDRG 1.006660e-03 #OWNRENT -9.082750e-03 #INCOME -6.960253e-06 #SELFEMPL 3.610381e-03 #INCPER -3.881890e-07 #EXP_INC -1.416971e-02 #SPENDING -1.638184e-05 #LOGSPEND -6.213884e-03 roc(newY, as.numeric(predict(md2, newX, type = "response"))) #Area under the curve: 0.6435  At last, we use the Elastic Net by tuning the value of Alpha through a line search with the parallelism. In this particular case, Alpha = 0.3 is chosen through the cross-validation. As shown below, 6 variables are used in the model that even performs better than the ridge model with all 12 attributes. # ELASTIC NET WITH 0 < ALPHA < 1 a <- seq(0.1, 0.9, 0.05) search <- foreach(i = a, .combine = rbind) %dopar% { cv <- cv.glmnet(mdlX, mdlY, family = "binomial", nfold = 10, type.measure = "deviance", paralle = TRUE, alpha = i) data.frame(cvm = cv$cvm[cv$lambda == cv$lambda.1se], lambda.1se = cv$lambda.1se, alpha = i) } cv3 <- search[search$cvm == min(search$cvm), ] md3 <- glmnet(mdlX, mdlY, family = "binomial", lambda = cv3$lambda.1se, alpha = cv3\$alpha)
coef(md3)
#(Intercept) -1.434700e+00
#AGE         -8.426525e-04
#MAJORDRG     6.276924e-02
#MINORDRG     .
#OWNRENT     -2.780958e-02
#INCOME      -1.305118e-04
#SELFEMPL     .
#INCPER      -2.085349e-06
#EXP_INC      .
#SPENDING     .
#LOGSPEND    -9.992808e-02
roc(newY, as.numeric(predict(md3, newX, type = "response")))
#Area under the curve: 0.6449