I’m excited to announce that my first package has been accepted to CRAN! The package
pcLasso implements principal components lasso, a new method for sparse regression which I’ve developed with Rob Tibshirani and Jerry Friedman. In this post, I will give a brief overview of the method and some starter code. (For an in-depth description and elaboration of the method, please see our arXiv preprint. For more details on how to use the package, please see the package’s vignette.)
Let’s say we are in the standard supervised learning setting, with design matrix $\mathbf{X} \in \mathbb{R}^{n \times p}$ and response $\mathbf{y} \in \mathbb{R}^n$. Let the singular value decomposition (SVD) of $\mathbf{X}$ be $\mathbf{X} = \mathbf{U}\mathbf{D}\mathbf{V}^T$, and let the diagonal entries of $\mathbf{D}$ be $d_1 \geq d_2 \geq \dots \geq d_m$. Principal components lasso solves the optimization problem

$$\underset{\beta}{\text{minimize}} \quad \frac{1}{2}\|\mathbf{y} - \mathbf{X}\beta\|_2^2 + \lambda\|\beta\|_1 + \frac{\theta}{2}\beta^T \mathbf{V} \mathbf{D}_{d_1^2 - d_j^2} \mathbf{V}^T \beta,$$

where $\lambda$ and $\theta$ are non-negative hyperparameters, and $\mathbf{D}_{d_1^2 - d_j^2}$ is the diagonal matrix with entries $d_1^2 - d_j^2$. The predictions this model gives for new data $x$ are $\hat{y} = x^T \hat{\beta}$.
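To make the notation concrete, here is a small base R sketch (a toy example of my own, not code from the pcLasso package) that builds the matrix $\mathbf{V} \mathbf{D}_{d_1^2 - d_j^2} \mathbf{V}^T$ sandwiched in the quadratic penalty from the SVD of a fake design matrix, and evaluates the objective at an arbitrary coefficient vector:

```r
set.seed(42)
n <- 50; p <- 5
X <- matrix(rnorm(n * p), nrow = n)
y <- rnorm(n)

# SVD of X: X = U D V^T, with singular values d_1 >= d_2 >= ... >= d_p
sv <- svd(X)
d <- sv$d
V <- sv$v

# Diagonal matrix with entries d_1^2 - d_j^2 (note the first entry is 0)
Dmat <- diag(d[1]^2 - d^2)

# The matrix sandwiched in the quadratic penalty
P <- V %*% Dmat %*% t(V)

# pcLasso objective for a given beta, lambda and theta
pclasso_obj <- function(beta, lambda, theta) {
  0.5 * sum((y - X %*% beta)^2) +
    lambda * sum(abs(beta)) +
    (theta / 2) * drop(t(beta) %*% P %*% beta)
}

beta <- rnorm(p)
pclasso_obj(beta, lambda = 0.1, theta = 10)
```

Since the first diagonal entry of $\mathbf{D}_{d_1^2 - d_j^2}$ is zero, the component of $\beta$ along the first PC direction incurs no quadratic penalty at all.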
This optimization problem seems a little complicated, so let me try to motivate it. Notice that if we replace $\mathbf{D}_{d_1^2 - d_j^2}$ with the identity matrix, then since $\mathbf{V}$ is orthogonal the optimization problem reduces to

$$\underset{\beta}{\text{minimize}} \quad \frac{1}{2}\|\mathbf{y} - \mathbf{X}\beta\|_2^2 + \lambda\|\beta\|_1 + \frac{\theta}{2}\|\beta\|_2^2,$$

which we recognize as the optimization problem that the elastic net solves. So we are doing something similar to the elastic net.
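This reduction is easy to check numerically. The following base R snippet (a toy example of my own, independent of the package) verifies that with the identity in place of the diagonal matrix, the quadratic penalty collapses to the squared $\ell_2$ norm:

```r
set.seed(1)
p <- 5
X <- matrix(rnorm(20 * p), nrow = 20)
V <- svd(X)$v   # V is orthogonal: V %*% t(V) = I
beta <- rnorm(p)

# With the identity in place of D_{d_1^2 - d_j^2}, the penalty is
# beta^T V I V^T beta = beta^T beta = ||beta||_2^2
penalty_identity <- drop(t(beta) %*% V %*% diag(p) %*% t(V) %*% beta)
all.equal(penalty_identity, sum(beta^2))
# [1] TRUE
```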
To be more specific: we can think of $\beta_1, \dots, \beta_p$ as the coordinates of the coefficient vector in the standard basis $e_1, \dots, e_p$. Then $\mathbf{V}^T \beta$ gives the coordinates of this same coefficient vector in the basis of the principal component (PC) directions of the design matrix $\mathbf{X}$. Since we have the matrix $\mathbf{D}_{d_1^2 - d_j^2}$, with entries increasing down its diagonal, instead of the identity matrix sandwiched between $\mathbf{V}$ and $\mathbf{V}^T$ in the quadratic penalty, we are doing shrinkage in the principal components space in a way that (i) leaves the component along the first PC direction unpenalized by the quadratic term, and (ii) shrinks components along later (lower-variance) PC directions more strongly toward 0.
This method extends easily to groups (whether overlapping or non-overlapping). Assume that our features come in $K$ groups. For each $k = 1, \dots, K$, let $\mathbf{X}_k$ represent the reduced design matrix corresponding to the features in group $k$, and let its SVD be $\mathbf{X}_k = \mathbf{U}_k \mathbf{D}_k \mathbf{V}_k^T$. Let the diagonal entries of $\mathbf{D}_k$ be $d_{k1} \geq d_{k2} \geq \dots$, and let $\mathbf{D}_{d_{k1}^2 - d_{kj}^2}$ be the diagonal matrix with diagonal entries $d_{k1}^2 - d_{kj}^2$. Let $\beta_k$ be the reduced coefficient vector corresponding to the features in group $k$. Then pcLasso solves the optimization problem

$$\underset{\beta}{\text{minimize}} \quad \frac{1}{2}\|\mathbf{y} - \mathbf{X}\beta\|_2^2 + \lambda\|\beta\|_1 + \frac{\theta}{2}\sum_{k=1}^K \beta_k^T \mathbf{V}_k \mathbf{D}_{d_{k1}^2 - d_{kj}^2} \mathbf{V}_k^T \beta_k.$$
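As a sketch of what the groupwise quadratic penalty computes (again base R of my own, not the package's internals), here is the sum $\sum_k \beta_k^T \mathbf{V}_k \mathbf{D}_{d_{k1}^2 - d_{kj}^2} \mathbf{V}_k^T \beta_k$ for a toy design matrix with two non-overlapping groups of five features each:

```r
set.seed(2)
n <- 40; p <- 10
X <- matrix(rnorm(n * p), nrow = n)
beta <- rnorm(p)
groups <- list(1:5, 6:10)

# Sum over groups of beta_k^T V_k D_{d_k1^2 - d_kj^2} V_k^T beta_k
group_penalty <- function(X, beta, groups) {
  total <- 0
  for (idx in groups) {
    sv <- svd(X[, idx, drop = FALSE])     # SVD of the reduced design matrix
    dk <- sv$d
    Pk <- sv$v %*% diag(dk[1]^2 - dk^2, nrow = length(dk)) %*% t(sv$v)
    total <- total + drop(t(beta[idx]) %*% Pk %*% beta[idx])
  }
  total
}

group_penalty(X, beta, groups)
```

With a single group containing all the features, this reduces to the ungrouped penalty from before.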
Now for some basic code. Let’s make some fake data:
set.seed(1)
n <- 100; p <- 10
X <- matrix(rnorm(n * p), nrow = n)
y <- rnorm(n)
Like the `glmnet` function in the `glmnet` package, the `pcLasso` function fits the model for a sequence of $\lambda$ values which do not have to be user-specified. The user, however, does have to specify the $\theta$ parameter:
library(pcLasso)
fit <- pcLasso(X, y, theta = 10)
We can use the generic `predict` function to obtain the predictions this fit makes on new data. For example, the following code extracts the predictions that pcLasso makes at the 5th $\lambda$ value for the first 3 rows of our training data:
predict(fit, X[1:3, ])[, 5]
# [1]  0.002523773  0.004959471 -0.014095065
The code above assumes that all our features belong to one big group. If our features come in groups, pcLasso can take advantage of that structure via the `groups` option. `groups` should be a list of length $K$, with `groups[[k]]` being a vector of the column indices which belong to group $k$. For example, if features 1–5 belong to one group and features 6–10 belong to another group:
groups <- list(1:5, 6:10)
groups
# [[1]]
# [1] 1 2 3 4 5
#
# [[2]]
# [1]  6  7  8  9 10

fit <- pcLasso(X, y, theta = 10, groups = groups)
`cv.pcLasso` fits pcLasso and picks the optimal $\lambda$ value via cross-validation. The output of the `cv.pcLasso` function can also be used to predict on new data:

fit <- cv.pcLasso(X, y, theta = 10)
predict(fit, X[1:3, ], s = "lambda.min")
# [1] -0.01031697 -0.01031697 -0.01031697
The vignette contains significantly more detail on how to use this package. If you spot bugs, have questions, or have features that you would like to see implemented, get in touch with us!