Variable Selection with Elastic Net

September 3, 2017
By

[This article was first published on S+/R – Yet Another Blog in Statistical Computing, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

LASSO has been a popular algorithm for the variable selection and extremely effective with high-dimension data. However, it often tends to “over-regularize” a model that might be overly compact and therefore under-predictive.

The Elastic Net addresses the aforementioned “over-regularization” by balancing between LASSO and ridge penalties. In particular, a hyper-parameter, namely Alpha, would be used to regularize the model such that the model would become a LASSO in case of Alpha = 1 and a ridge in case of Alpha = 0. In practice, Alpha can be tuned easily by the cross-validation. Below is a demonstration of Elastic Net with R glmnet package and its comparison with LASSO and ridge models.

```pkgs <- list("glmnet", "doParallel", "foreach", "pROC")
lapply(pkgs, require, character.only = T)
registerDoParallel(cores = 4)

df2 <- df1[df1\$CARDHLDR == 1, ]
set.seed(2017)
n <- nrow(df2)
sample <- sample(seq(n), size = n * 0.5, replace = FALSE)
train <- df2[sample, -1]
test <- df2[-sample, -1]
mdlY <- as.factor(as.matrix(train["DEFAULT"]))
mdlX <- as.matrix(train[setdiff(colnames(df1), c("CARDHLDR", "DEFAULT"))])
newY <- as.factor(as.matrix(test["DEFAULT"]))
newX <- as.matrix(test[setdiff(colnames(df1), c("CARDHLDR", "DEFAULT"))])
```

First of all, we estimates a LASSO model with Alpha = 1. The function cv.glmnet() is used to search for a regularization parameter, namely Lambda, that controls the penalty strength. As shown below, the model only identifies 2 attributes out of total 12.

```# LASSO WITH ALPHA = 1
cv1 <- cv.glmnet(mdlX, mdlY, family = "binomial", nfold = 10, type.measure = "deviance", paralle = TRUE, alpha = 1)
md1 <- glmnet(mdlX, mdlY, family = "binomial", lambda = cv1\$lambda.1se, alpha = 1)
coef(md1)
#(Intercept) -1.963030e+00
#AGE          .
#MAJORDRG     .
#MINORDRG     .
#OWNRENT      .
#INCOME      -5.845981e-05
#SELFEMPL     .
#INCPER       .
#EXP_INC      .
#SPENDING     .
#LOGSPEND    -4.015902e-02
roc(newY, as.numeric(predict(md1, newX, type = "response")))
#Area under the curve: 0.636
```

We next estimates a ridge model as below by setting Alpha = 0. Similarly, Lambda is searched by the cross-validation. Since the ridge penalty would only regularize the magnitude of each coefficient, we end up with a “full” model with all model attributes. The model performance is slightly better with 10 more variables, which is a debatable outcome.

```# RIDGE WITH ALPHA = 0
cv2 <- cv.glmnet(mdlX, mdlY, family = "binomial", nfold = 10, type.measure = "deviance", paralle = TRUE, alpha = 0)
md2 <- glmnet(mdlX, mdlY, family = "binomial", lambda = cv2\$lambda.1se, alpha = 0)
coef(md2)
#(Intercept) -2.221016e+00
#AGE         -4.184422e-04
#MAJORDRG     6.684849e-03
#MINORDRG     1.006660e-03
#OWNRENT     -9.082750e-03
#INCOME      -6.960253e-06
#SELFEMPL     3.610381e-03
#INCPER      -3.881890e-07
#EXP_INC     -1.416971e-02
#SPENDING    -1.638184e-05
#LOGSPEND    -6.213884e-03
roc(newY, as.numeric(predict(md2, newX, type = "response")))
#Area under the curve: 0.6435
```

At last, we use the Elastic Net by tuning the value of Alpha through a line search with the parallelism. In this particular case, Alpha = 0.3 is chosen through the cross-validation. As shown below, 6 variables are used in the model that even performs better than the ridge model with all 12 attributes.

```# ELASTIC NET WITH 0 < ALPHA < 1
a <- seq(0.1, 0.9, 0.05)
search <- foreach(i = a, .combine = rbind) %dopar% {
cv <- cv.glmnet(mdlX, mdlY, family = "binomial", nfold = 10, type.measure = "deviance", paralle = TRUE, alpha = i)
data.frame(cvm = cv\$cvm[cv\$lambda == cv\$lambda.1se], lambda.1se = cv\$lambda.1se, alpha = i)
}
cv3 <- search[search\$cvm == min(search\$cvm), ]
md3 <- glmnet(mdlX, mdlY, family = "binomial", lambda = cv3\$lambda.1se, alpha = cv3\$alpha)
coef(md3)
#(Intercept) -1.434700e+00
#AGE         -8.426525e-04
#MAJORDRG     6.276924e-02
#MINORDRG     .
#OWNRENT     -2.780958e-02
#INCOME      -1.305118e-04
#SELFEMPL     .
#INCPER      -2.085349e-06
#EXP_INC      .
#SPENDING     .
#LOGSPEND    -9.992808e-02
roc(newY, as.numeric(predict(md3, newX, type = "response")))
#Area under the curve: 0.6449
```

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...