# How important is that variable?

**Data Literacy - The blog of Andrés Gutiérrez**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

When modeling any phenomena by including explanatory variables that highly relates the variable of interest, one question arises: which of the auxiliary variables have a higher influence on the response? I am not writing about significance testing or something like this. I am just thinking like a researcher who wants to know the ranking of variables that influence the response and their related weight.

There are a variety of methods that try to answer that question. The one inducing this thread is very simple:** isolate units from variables**. Assume a linear model with the following structure (for the sake of simplicity, assume only two explanatory variables):

$$y = \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

If you assume this model as true and $\beta_i > 0$, then the influence of variable $x_i$, over response $y$ could be found when isolating measure units from variables. Then, one could fit a model over the standardized variables (explanatory and response) and then directly comparing the regression coefficients. Another way to do this is by means of the following expression:

$$I(i) = \frac{\beta_i}{sd(\beta_i)} = \beta_i\frac{ sd(x_i)}{sd(y)}$$

For example, let’s consider the following model $y = -500 x_1 + 50 x_2 + \varepsilon$, then the relative importance of the first and second variable is around 500/(500+50) = 0.9, and 50/(500+50) = 0.1, respectively. The following code shows how to perform this simple analysis in R.

n <- 10000 x1 <- runif(n) x2 <- runif(n) y <- -500 * x1 + 50 * x2 + rnorm(n) model <- lm(y ~ 0 + x1 + x2) # 1a. Standardized betas summary(model)$coe[,2] sd.betas <- summary(model)$coe[,2] betas <- model$coefficients imp <- abs(betas)/sd.betas imp <- imp/sum(imp) imp # 1b. Standardized betas imp1 <- abs(model$coefficients[1] * sd(x1)/sd(y)) imp2 <- abs(model$coefficients[2] * sd(x2)/sd(y)) imp1 / (imp1 + imp2) imp2 / (imp1 + imp2) # 2. Standardized variables model2 <- lm(I(scale(y)) ~ 0 + I(scale(x1)) + I(scale(x2))) summary(model2) abs(model2$coefficients)/sum(abs(model2$coefficients))

**leave a comment**for the author, please follow the link and comment on their blog:

**Data Literacy - The blog of Andrés Gutiérrez**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.