**Memo's Island**, and kindly contributed to R-bloggers)

**Summary**

The common case in data science or machine learning applications, different features or predictors manifest them in different scales. This could bring difficulty in interpreting the resulting coefficients of linear regression, such as one feature having very large or small values compare to other predictors and being in different units first of all. The common approach to overcome this to use z-score‘s for each features, centring and scaling.

This approach would allow us to interpret the effects. A possible question however is how could we map regression coefficients obtained with the scaled data back to original coefficients. In the context of ridge regression, this question is posed by *Mark Seeto* in the R mailing list and provided a solution for two predictor case with an R code. In this post we formalize his approach. Note that, in the case of how to scale, Professor Gelman suggests diving them by two standard deviations. In this post we won’t cover that approach and use usual approach.

**Algebraic Solution: No error term**

An arbitrary linear regression for $n$ variable reads as follows

$$y=(Sigma_{i=1}^n beta_{i} x_{i}) + beta_{0}$$

here, $y$ is being response variable, $x_{i}$ are the predictors, $n=1,..,n$. Let’s use primes for the scaled regression equation for $n$ variable.

$$y’=(Sigma_{i=1}^n beta_{i}’ x_{i}’) + beta_{0}’$$

We would like to express $beta_{i}$ by only using $beta_{i}’$ and two statistic from the data, namely mean and standard deviations, $mu_{x_{i}}$, $mu_{y}$, $sigma_{x_{i}}$ and $sigma_{y}$.

The following transformation can be shown by using the z-scores and some algebra,

$$beta_{0}=beta_{0}’ sigma_{y} + mu_{y} – Sigma_{i=1}^{n} frac{sigma_{y}}{sigma_{x_{i}}}beta_{i}’ mu_{x_{i}}$$

$$beta_{i} = beta_{i}’ frac{sigma_{y}}{sigma_{x_{i}}}$$

**Ridge regression in R**

There are many packages and tools in R to perform ridge regression. One of the prominent one is glmnet. Following Mark Seeto’s example, here we extent that in to many variate case with a helper function scale.back.lm from R1magic package. Function provides a transform utility for $n$-variate case. Here we demo this using 6 predictors, also available as gist,

rm(list=ls())

library(glmnet)

library(R1magic) # https://github.com/msuzen/R1magic

set.seed(4242)

n <- 100 # observations

X <- model.matrix(~., data.frame(x1 = rnorm(n, 1, 1),

x2 = rnorm(n, 2, 2),

x3 = rnorm(n, 3,2),

x4 = rnorm(n, 4,2),

x5 = rnorm(n, 5,1),

x6 = rnorm(n, 6,1)

))[,-1] # glmnet adds the intercept

Y <- matrix(rnorm(n, 1, 2),n,1)

# Now apply scaling

X.s <- scale(X)

Y.s <- scale(Y)

# Ridge regression & coefficients with scaled data

glm.fit.s <- glmnet(X.s, Y.s, alpha=0)

betas.scaled <- as.matrix(as.vector(coef(glm.fit.s)[,80]), 1, 7)

# trasform the coefficients

betas.transformed <- scale.back.lm(X, Y, betas.scaled)

# Now verify the correctness of scaled coefficients:

# ridge regression & coefficients

glm.fit <- glmnet(X, Y, alpha=0)

betas.fit <- as.matrix(as.vector(coef(glm.fit)[,80]), 1, 7)

# Verify correctness: Difference is smaller than 1e-12

sum(betas.fit-betas.transformed) < 1e-12 # TRUE

**Conclusion**

Multiple regression is used by many practitioners. In this post we have shown how to scale continuous predictors and transform back the regression coefficients to original scale. Scaled coefficients would help us to better interpret the results. The question of when to standardize the data is a different issue.

**leave a comment**for the author, please follow the link and comment on their blog:

**Memo's Island**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...