A deep dive into glmnet: standardize
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I’m writing a series of posts on various function options of the glmnet
function (from the package of the same name), hoping to give more detail and insight beyond R’s documentation.
In this post, we will focus on the standardize option.
For reference, here is the full signature of the glmnet
function:
glmnet(x, y, family=c("gaussian","binomial","poisson","multinomial","cox","mgaussian"), weights, offset=NULL, alpha = 1, nlambda = 100, lambda.min.ratio = ifelse(nobsUnless otherwise stated,
will denote the number of observations,
will denote the number of features, and
fit
will denote the output/result of theglmnet
call. The data matrix is denoted byand the response is denoted by
.
standardize
When
standardize = TRUE
(default), columns of the data matrixx
are standardized, i.e. each column ofx
has mean 0 and standard deviation 1. More specifically, we have that for each,
, and
.
Why might we want to do this? Standardizing our features before model fitting is common practice in statistical learning. This is because if our features are on vastly different scales, the features with larger scales will tend to dominate the action. (One instance where we might not want to standardize our features is if they are already all measured along the same scale, e.g. meters or kilograms.)
Notice that the standardization here is slightly different from that offered by the
scale
function:scale(x, center = TRUE, scale = TRUE)
gives the standardization
, and
.
We verify this with a small data example. Generate data according to the following code:
n <- 100; p <- 5; true_p <- 2 set.seed(950) X <- matrix(rnorm(n * p), nrow = n) beta <- matrix(c(rep(1, true_p), rep(0, p - true_p)), ncol = 1) y <- X %*% beta + 3 * rnorm(n)Create a version of the data matrix which has standardized columns:
X_centered <- apply(X, 2, function(x) x - mean(x)) Xs <- apply(X_centered, 2, function(x) x / sqrt(sum(x^2) / n))Next, we run
glmnet
onXs
andy
with both possible options forstandardize
:library(glmnet) fit <- glmnet(Xs, y, standardize = TRUE) fit2 <- glmnet(Xs, y, standardize = FALSE)We can check that we get the same fit in both cases (modulo numerical precision):
sum(fit$lambda != fit2$lambda) # 0 max(abs(fit$beta - fit2$beta)) # 6.661338e-16The documentation notes that the coefficients returned are on the original scale. Let’s confirm that with our small data set. Run
glmnet
with the original data matrix andstandardize = TRUE
:fit3 <- glmnet(X, y, standardize = TRUE)For each column
, our standardized variables are
, where
and
are the mean and standard deviation of column
respectively. If
and
represent the model coefficients of
fit2
andfit3
respectively, then we should have
i.e. we should have
and
for
. The code below checks that this is indeed the case (modulo numerical precision):
# get column means and SDs X_mean <- colMeans(X) X_sd <- apply(X_centered, 2, function(x) sqrt(sum(x^2) / n)) # check difference for intercepts fit2_int <- coefficients(fit2)[1,] fit3_int <- coefficients(fit3)[1,] temp <- fit2_int - colSums(diag(X_mean / X_sd) %*% fit2$beta) max(abs(temp - fit3_int)) # 1.110223e-16 # check difference for feature coefficients temp <- diag(1 / X_sd) %*% fit2$beta max(abs(temp - fit3$beta)) # 1.110223e-15The discussion above has been for the standardization of
x
. What about standardization fory
? The documentation notes that whenfamily = "gaussian"
,y
is automatically standardized, and the coefficients are unstandardized at the end of the procedure.More concretely, let the mean and standard deviation of
be denoted by
and
respectively. If running
glmnet
on standardizedy
gives interceptand coefficients
, then
glmnet
on unstandardizedy
will give interceptand coefficients
.
Again, this can be verified empirically:
# get mean and SD of y y_mean <- mean(y) y_sd <- sqrt(sum((y - y_mean)^2) / n) # fit model with standardized y fit4 <- glmnet(X, (y - y_mean) / y_sd, standardize = TRUE) # check difference for intercepts fit4_int <- coefficients(fit4)[1,] temp <- fit4_int * y_sd + y_mean max(abs(temp - fit3_int)) # 1.110223e-16 # check difference for feature coefficients max(abs(y_sd * fit4$beta - fit3$beta)) # 8.881784e-16To leave a comment for the author, please follow the link and comment on their blog: R – Statistical Odds & Ends.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.