
This post shows how to use R packages to estimate an exclusive lasso and a group lasso. Both lasso variants rely on a predefined grouping of the variables, but they differ in how this grouping constrains variable selection.

# Lasso, Group Lasso, and Exclusive Lasso

While LASSO (least absolute shrinkage and selection operator) has many variants and extensions, our focus is on two of them: group lasso and exclusive lasso. Before we dive into the specifics, let's look at the similarities and differences between these two variants using the following figure.
In the above figure, 15 variables are categorized into 5 groups. Lasso selects important features irrespective of the grouping. It happens not to select any of Group 2's variables, but this is just an estimation result, not something the grouping enforces. Group lasso selects either all or none of the variables in each group, whereas exclusive lasso selects at least one variable from each group.

From the perspective of competition, group lasso makes the groups compete with one another, whereas exclusive lasso makes the variables within the same group compete with each other.

Now that the figure has shown the main characteristics of the two lasso models, let's turn to their mathematical expressions.

### Equations

These models can be written in several forms. The equations below for lasso, group lasso, and exclusive lasso follow Qiu et al. (2021).

\begin{align} \text{Lasso} &: \min \left\Vert y-X\beta \right\Vert^2 + \lambda \sum_{j=1}^{m} |\beta_j| \\ \text{Group Lasso} &: \min \left\Vert y-X\beta \right\Vert^2 + \lambda \sum_{g=1}^{G} \Vert \beta_g \Vert_2^1 \\ \text{Exclusive Lasso} &: \min \left\Vert y-X\beta \right\Vert^2 + \lambda \sum_{g=1}^{G} \Vert \beta_g \Vert_1^2 \end{align}
where $$m$$ is the total number of coefficients, the coefficients in $$\beta$$ are divided into $$G$$ groups, $$\beta_g$$ denotes the coefficient vector of the $$g$$-th group, and $$\lambda \ge 0$$ is the tuning parameter.
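To make the three penalty terms concrete, here is a small base-R sketch that evaluates each of them (without the factor $$\lambda$$) for a coefficient vector and a group index. The function name lasso_penalties is hypothetical and not part of glmnet, gglasso, or ExclusiveLasso; the packages scale the loss and penalty in their own ways, so these values are only illustrative.

```r
# Evaluate the three penalty terms (without lambda) for a coefficient
# vector 'b' and a group index 'g' of the same length.
# lasso_penalties() is a hypothetical helper written for illustration.
lasso_penalties <- function(b, g) {
  l1_by_group <- tapply(abs(b), g, sum)                    # ||b_g||_1 per group
  l2_by_group <- tapply(b, g, function(x) sqrt(sum(x^2)))  # ||b_g||_2 per group
  c(lasso     = sum(abs(b)),         # sum_j |b_j|
    group     = sum(l2_by_group),    # sum_g ||b_g||_2   (l_{2,1}-norm)
    exclusive = sum(l1_by_group^2))  # sum_g ||b_g||_1^2 (squared l_{1,2}-norm)
}

b <- c(3, -4, 1, -1)
g <- c(1, 1, 2, 2)
lasso_penalties(b, g)
# lasso = 9, group = 5 + sqrt(2) (about 6.414), exclusive = 7^2 + 2^2 = 53
```

The same vector is penalized very differently: the exclusive penalty grows with the square of each group's total size, which is what pushes variables within a group to compete.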

In the group lasso, the $$l_{2,1}$$-norm combines intra-group non-sparsity via the $$l_2$$-norm with inter-group sparsity via the $$l_1$$-norm (the plain sum over groups). Therefore, the variables of each group are either selected or discarded together. Refer to Yuan and Lin (2006) for more information on the group lasso.
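The all-or-none behaviour can be seen in the proximal step of the $$l_2$$-norm penalty, $$\arg\min_z \tfrac{1}{2}\Vert z - v \Vert_2^2 + t \Vert z \Vert_2 = \max(1 - t/\Vert v \Vert_2, 0)\, v$$, which many group lasso solvers apply group by group. The sketch below assumes this standard block soft-thresholding step; it is not gglasso's internal algorithm, and group_soft_threshold is a hypothetical name.

```r
# Proximal operator of t * ||z||_2 for one group (block soft-thresholding):
#   argmin_z 0.5 * ||z - v||_2^2 + t * ||z||_2
# group_soft_threshold() is a hypothetical helper for illustration.
group_soft_threshold <- function(v, t) {
  nrm <- sqrt(sum(v^2))
  if (nrm == 0) return(v)           # a zero group stays zero
  max(1 - t / nrm, 0) * v           # shrink the whole group, or zero it out
}

v <- c(3, 1, 0.5)                   # ||v||_2 = sqrt(10.25), about 3.20
group_soft_threshold(v, t = 1)      # every entry scaled by the same factor
group_soft_threshold(v, t = 10)     # the whole group is set to zero
```

Because every entry is multiplied by the same factor, the group either survives in full or vanishes in full; no single variable inside the group can be dropped on its own.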

In the exclusive lasso, the $$l_{1,2}$$-norm combines intra-group sparsity via the $$l_1$$-norm with inter-group non-sparsity via the $$l_2$$-norm (the penalty sums the squared group-wise $$l_1$$-norms). As a result, exclusive lasso selects at least one variable from each group. Refer to Zhou et al. (2010) for more information on the exclusive lasso.
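For comparison, here is a sketch of the single-group proximal step of the exclusive penalty, $$\arg\min_z \tfrac{1}{2}\Vert z - v \Vert_2^2 + t \Vert z \Vert_1^2$$. Setting the subgradient to zero shows that the solution soft-thresholds $$v$$ at the level $$\tau = 2t\Vert z \Vert_1$$, which depends on how many entries stay active. This is a derivation-based sketch, not necessarily what the ExclusiveLasso package does internally, and exclusive_prox is a hypothetical name. Because the first candidate threshold, $$2t \max_i |v_i| / (1+2t)$$, is always smaller than $$\max_i |v_i|$$, the largest entry of a nonzero group is never set to zero.

```r
# Proximal operator of the exclusive lasso penalty for ONE group:
#   argmin_z 0.5 * ||z - v||_2^2 + t * ||z||_1^2
# The solution soft-thresholds v at tau = 2 * t * ||z||_1. With the k largest
# |v_i| active, this gives tau = 2 * t * S_k / (1 + 2 * t * k), where S_k is
# the sum of those k values.
# exclusive_prox() is a hypothetical helper for illustration.
exclusive_prox <- function(v, t) {
  if (all(v == 0)) return(v)
  a <- sort(abs(v), decreasing = TRUE)          # |v| in decreasing order
  k <- seq_along(a)
  tau_k <- 2 * t * cumsum(a) / (1 + 2 * t * k)  # candidate thresholds
  K <- max(which(a > tau_k))                    # number of entries kept (>= 1)
  sign(v) * pmax(abs(v) - tau_k[K], 0)
}

v <- c(3, 1, 0.5)
exclusive_prox(v, t = 0.1)   # sparsity within the group: third entry dropped
exclusive_prox(v, t = 10)    # only the largest entry survives: c(1/7, 0, 0)
```

Unlike block soft-thresholding, increasing t here drops the smaller entries one by one but always keeps the largest one, only shrinking it toward zero.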

### R code

The following R code implements lasso, group lasso, and exclusive lasso for an artificial data set with a given group index. The required R packages are glmnet for lasso, gglasso for group lasso, and ExclusiveLasso for exclusive lasso.

```r
#========================================================#
# Quantitative ALM, Financial Econometrics & Derivatives 
# ML/DL using R, Python, Tensorflow by Sang-Heon Lee 
#
# https://kiandlee.blogspot.com
#--------------------------------------------------------#
# Group Lasso and Exclusive Lasso
#========================================================#

library(glmnet)
library(gglasso)
library(ExclusiveLasso)

graphics.off()  # close all open graphics devices
rm(list = ls()) # remove all objects from the workspace

set.seed(1234)

#--------------------------------------------
# X and y variable
#--------------------------------------------

N = 500 # number of observations
p = 20  # number of variables

# randomly generated X
X = matrix(rnorm(N*p), ncol=p)

# standardization : mean = 0, std = 1
X = scale(X)

# artificial coefficients
beta = c(0.15,-0.33,0.25,-0.25,0.05,0,0,0,0.5,0.2,
        -0.25, 0.12,-0.125,0,0,0,0,0,0,0)

# Y variable (standardizing y is optional and left commented out)
y = X%*%beta + rnorm(N, sd=0.5)
#y = scale(y)

# group index for X variables
v.group <- c(1,1,1,1,1,2,2,2,2,2,
             3,3,3,3,3,4,4,4,4,4)

#--------------------------------------------
# Model with a given lambda
#--------------------------------------------

# lasso
la <- glmnet(X, y, lambda = 0.1,
             family="gaussian", alpha=1,
             intercept = FALSE)

# group lasso
gr <- gglasso(X, y, lambda = 0.2,
              group = v.group, loss="ls",
              intercept = FALSE)

# exclusive lasso
ex <- exclusive_lasso(X, y, lambda = 0.2,
                      groups = v.group, family="gaussian",
                      intercept = FALSE)

# Results
df.comp <- data.frame(
    group = v.group, beta = beta,
    Lasso     = la$beta[,1], Group = gr$beta[,1],
    Exclusive = ex$coef[,1])
df.comp

#------------------------------------------------
# Run cross-validation & select lambda
#------------------------------------------------
# lambda.min : minimal MSE
# lambda.1se : the largest λ at which the MSE is
#              within one standard error of the minimal MSE.

# lasso
la_cv <- cv.glmnet(x=X, y=y, family='gaussian', alpha=1,
                   intercept = FALSE, nfolds=5)
dev.new(); plot(la_cv)  # dev.new() opens a plot window on any platform (x11() may not)
paste(la_cv$lambda.min, la_cv$lambda.1se)

# group lasso
gr_cv <- cv.gglasso(x=X, y=y, group=v.group, loss="ls",
                    pred.loss="L2", intercept = FALSE, nfolds=5)
dev.new(); plot(gr_cv)
paste(gr_cv$lambda.min, gr_cv$lambda.1se)

# exclusive lasso
ex_cv <- cv.exclusive_lasso(X, y, groups = v.group,
                            intercept = FALSE, nfolds=5)
dev.new(); plot(ex_cv)
paste(ex_cv$lambda.min, ex_cv$lambda.1se)

#--------------------------------------------
# Model with selected lambda
#--------------------------------------------

# lasso
la <- glmnet(X, y, lambda = la_cv$lambda.1se,
             family="gaussian", alpha=1,
             intercept = FALSE)

# group lasso (lambda.1se plus 0.1)
gr <- gglasso(X, y, lambda = gr_cv$lambda.1se+0.1,
              group = v.group, loss="ls",
              intercept = FALSE)

# exclusive lasso
ex <- exclusive_lasso(X, y, lambda = ex_cv$lambda.1se,
                      groups = v.group, family="gaussian",
                      intercept = FALSE)

# Results
df.comp.lambda.1se <- data.frame(
    group = v.group, beta = beta,
    Lasso     = la$beta[,1], Group = gr$beta[,1],
    Exclusive = ex$coef[,1])
df.comp.lambda.1se
```

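The first output from the above R code is the table of coefficients of all three models with the initial $$\lambda$$ values (0.1 for lasso, 0.2 for group lasso and exclusive lasso). Each model's selection pattern is easy to see: group lasso keeps all coefficients of groups 1 and 2 and sets every coefficient in groups 3 and 4 to zero, whereas exclusive lasso keeps at least one nonzero coefficient in every group, even in group 4, whose true coefficients are all zero.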

```
> df.comp
    group   beta       Lasso        Group    Exclusive
V1      1  0.150  0.01931728  0.016938769  0.013555753
V2      1 -0.330 -0.18832916 -0.047695924 -0.184065967
V3      1  0.250  0.17261562  0.042254702  0.169516525
V4      1 -0.250 -0.16322025 -0.043994211 -0.153730137
V5      1  0.050  0.00000000  0.009673207  0.000000000
V6      2  0.000  0.00000000  0.001067915  0.000000000
V7      2  0.000  0.00000000  0.001355834  0.000000000
V8      2  0.000  0.00000000  0.014211932  0.000000000
V9      2  0.500  0.38757370  0.101900169  0.385382905
V10     2  0.200  0.11146785  0.044591933  0.110731304
V11     3 -0.250 -0.15010738  0.000000000 -0.186626541
V12     3  0.120  0.00000000  0.000000000  0.003117881
V13     3 -0.125 -0.08305582  0.000000000 -0.120458426
V14     3  0.000  0.00000000  0.000000000  0.000000000
V15     3  0.000  0.00000000  0.000000000  0.000000000
V16     4  0.000  0.00000000  0.000000000  0.000000000
V17     4  0.000  0.00000000  0.000000000  0.000000000
V18     4  0.000  0.00000000  0.000000000  0.010918904
V19     4  0.000  0.00000000  0.000000000  0.015330520
V20     4  0.000  0.00000000  0.000000000  0.013591628
```
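As a quick check of these patterns, the sketch below counts the nonzero coefficients per group in each model column of the printed df.comp table. The helper nonzero_per_group is hypothetical, and the values are copied from the printed (rounded) output rather than recomputed, so the counts describe that printout.

```r
# Count nonzero coefficients in each group; a hypothetical helper.
nonzero_per_group <- function(coef, group) {
  tapply(coef != 0, group, sum)
}

v.group <- rep(1:4, each = 5)   # same grouping as in the R code above

# Coefficient columns copied from the printed df.comp output
coefs <- list(
  Lasso = c(0.01931728, -0.18832916, 0.17261562, -0.16322025, 0,
            0, 0, 0, 0.38757370, 0.11146785,
            -0.15010738, 0, -0.08305582, 0, 0,
            0, 0, 0, 0, 0),
  Group = c(0.016938769, -0.047695924, 0.042254702, -0.043994211, 0.009673207,
            0.001067915, 0.001355834, 0.014211932, 0.101900169, 0.044591933,
            rep(0, 10)),
  Exclusive = c(0.013555753, -0.184065967, 0.169516525, -0.153730137, 0,
                0, 0, 0, 0.385382905, 0.110731304,
                -0.186626541, 0.003117881, -0.120458426, 0, 0,
                0, 0, 0.010918904, 0.015330520, 0.013591628)
)

# Rows: groups 1 to 4; columns: models
counts <- sapply(coefs, nonzero_per_group, group = v.group)
counts
```

Group lasso gives all-or-none counts (5, 5, 0, 0), exclusive lasso has at least one nonzero coefficient in every group (4, 2, 3, 3), and lasso happens to drop group 4 entirely (4, 2, 2, 0).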

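The second output shows the coefficients of all models when each model uses the $$\lambda$$ selected by 5-fold cross-validation (lambda.1se; for group lasso, lambda.1se + 0.1). Group lasso now keeps groups 1 to 3 and still drops group 4 entirely, while exclusive lasso again keeps some nonzero coefficients in group 4.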

```
> df.comp.lambda.1se
    group   beta       Lasso         Group   Exclusive
V1      1  0.150  0.07776181  4.779605e-02  0.08297141
V2      1 -0.330 -0.24670209 -1.257863e-01 -0.25235238
V3      1  0.250  0.22825029  1.130749e-01  0.23282822
V4      1 -0.250 -0.21384666 -1.168170e-01 -0.21582154
V5      1  0.050  0.03733144  2.717197e-02  0.04139150
V6      2  0.000  0.00000000  2.184575e-03  0.00000000
V7      2  0.000  0.00000000  3.353260e-03  0.00000000
V8      2  0.000  0.01031027  2.950791e-02  0.02043597
V9      2  0.500  0.43538564  2.200230e-01  0.44419164
V10     2  0.200  0.16649806  9.620757e-02  0.17844727
V11     3 -0.250 -0.20316169 -1.886308e-02 -0.22419384
V12     3  0.120  0.03113405  4.430739e-03  0.05392923
V13     3 -0.125 -0.13474237 -1.468172e-02 -0.15506193
V14     3  0.000  0.00000000 -3.646683e-05  0.00000000
V15     3  0.000  0.00000000 -1.311539e-03  0.00000000
V16     4  0.000  0.00000000  0.000000e+00  0.00000000
V17     4  0.000  0.00000000  0.000000e+00 -0.01087451
V18     4  0.000  0.00000000  0.000000e+00  0.00000000
V19     4  0.000  0.00000000  0.000000e+00  0.01946948
V20     4  0.000  0.00000000  0.000000e+00  0.01318269
```

### An Interesting Property of Exclusive Lasso

As stated earlier, the exclusive lasso selects at least one variable from each group. Let's check whether this claim holds with the next R code by setting $$\lambda$$ to a large value (100), which is large enough to make lasso and group lasso discard every variable.

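```r
# X, y, beta, and v.group come from the previous code

# lasso
la <- glmnet(X, y, lambda = 100,
             family="gaussian", alpha=1,
             intercept = FALSE)

# group lasso
gr <- gglasso(X, y, lambda = 100,
              group = v.group, loss="ls",
              intercept = FALSE)

# exclusive lasso
ex <- exclusive_lasso(X, y, lambda = 100,
                      groups = v.group, family="gaussian",
                      intercept = FALSE)

# Results
df.comp.higher.lambda <- data.frame(
    group = v.group, beta = beta,
    Lasso     = la$beta[,1], Group = gr$beta[,1],
    Exclusive = ex$coef[,1])
df.comp.higher.lambda
```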

The following result supports this claim. With the large $$\lambda$$, lasso and group lasso discard all variables, while exclusive lasso still keeps exactly one variable from each group (V2, V9, V11, and V18), with its coefficient shrunk close to zero. I added horizontal dashed lines between the groups for readability.

```
> df.comp.higher.lambda
    group   beta Lasso Group     Exclusive
V1      1  0.150     0     0  0.0000000000
V2      1 -0.330     0     0 -0.0031930017
V3      1  0.250     0     0  0.0000000000
V4      1 -0.250     0     0  0.0000000000
V5      1  0.050     0     0  0.0000000000
-------------------------------------------
V6      2  0.000     0     0  0.0000000000
V7      2  0.000     0     0  0.0000000000
V8      2  0.000     0     0  0.0000000000
V9      2  0.500     0     0  0.0051059151
V10     2  0.200     0     0  0.0000000000
-------------------------------------------
V11     3 -0.250     0     0 -0.0026019669
V12     3  0.120     0     0  0.0000000000
V13     3 -0.125     0     0  0.0000000000
V14     3  0.000     0     0  0.0000000000
V15     3  0.000     0     0  0.0000000000
-------------------------------------------
V16     4  0.000     0     0  0.0000000000
V17     4  0.000     0     0  0.0000000000
V18     4  0.000     0     0  0.0006854416
V19     4  0.000     0     0  0.0000000000
V20     4  0.000     0     0  0.0000000000
```
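A short optimality argument suggests why this happens. It is sketched here for the objectives written in the Equations section; the packages' own scaling of the loss and penalty does not change the conclusion. Write $$r = y - X\beta$$ for the residual, $$x_j$$ for the $$j$$-th column of $$X$$, and $$X_g$$ for the columns of group $$g$$. If every coefficient of group $$g$$ is zero at the solution, the first-order (subgradient) conditions for that group read

\begin{align} \text{Lasso} &: \; 2\,|x_j^\top r| \le \lambda \quad \text{for each } j \text{ in group } g \\ \text{Group Lasso} &: \; 2\,\Vert X_g^\top r \Vert_2 \le \lambda \\ \text{Exclusive Lasso} &: \; X_g^\top r = 0 \end{align}

The lasso and group lasso conditions are inequalities that any sufficiently large $$\lambda$$ satisfies, so all coefficients can be set to zero. The exclusive lasso condition does not involve $$\lambda$$ at all: since $$\Vert \beta_g \Vert_1^2$$ is of second order near $$\beta_g = 0$$, its gradient there is zero, so a whole group can be zero only if the residual is exactly orthogonal to every column of that group. With continuous data this essentially never happens, which is why at least one coefficient per group stays nonzero (only shrunk toward zero) even when $$\lambda = 100$$.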

This property may be useful when we want to select one security from each sector to form a portfolio diversified across many investment sectors. Of course, further analysis is needed to select an arbitrary, predetermined number of securities from each sector.

### Concluding Remarks

This post shows how to estimate the group lasso and the exclusive lasso in R. In particular, I think that the exclusive lasso delivers an interesting result, which I will investigate further in follow-up research such as sector-based asset allocation (sectoral diversification).

### References

Yuan, M. and Y. Lin (2006), Model Selection and Estimation in Regression with Grouped Variables, Journal of the Royal Statistical Society, Series B 68, pp. 49–67.

Zhou, Y., R. Jin, and S. Hoi (2010), Exclusive Lasso for Multi-task Feature Selection. In International Conference on Artificial Intelligence and Statistics, pp. 988–995.

Qiu, L., Y. Qu, C. Shang, L. Yang, F. Chao, and Q. Shen (2021), Exclusive Lasso-Based k-Nearest Neighbors Classification. Neural Computing and Applications, pp. 1–15.
$$\blacksquare$$