Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The post How to Standardize Data in R? appeared first on Data Science Tutorials

How to Standardize Data in R?, A dataset must be scaled so that the mean value is 0 and the standard deviation is 1, which is known as standardization.

The z-score standardization, which scales numbers using the following formula, is the most used method for doing this.

Two-Way ANOVA Example in R-Quick Guide – Data Science Tutorials

`(xi – xbar) / s`

where:

xi: The ith value in the dataset

xbar: The sample mean

s: The sample standard deviation

The examples below demonstrate how to scale one or more variables in a data frame using the z-score standardization in R by using the scale() function and the dplyr package.

## Standardize just one variable

In a data frame containing three variables, the following code demonstrates how to scale just one of the variables.

`library(dplyr)`

Now make this example reproducible

`set.seed(123)`

Now let’s create an original data frame

```df <- data.frame(var1= runif(10, 0, 50),
var2= runif(10, 2, 20),
var3= runif(10, 5, 30))```

Now we can view the original data frame

```df
var1      var2      var3
1  14.378876 19.223000 27.238483
2  39.415257 10.160015 22.320085
3  20.448846 14.196271 21.012670
4  44.150870 12.307401 29.856744
5  47.023364  3.852644 21.392645
6   2.277825 18.196849 22.713262
7  26.405274  6.429579 18.601651
8  44.620952  2.757072 19.853551
9  27.571751  7.902573 12.228993
10 22.830737 19.181066  8.677841```

scale var1 to have mean = 0 and standard deviation = 1

```df2 <- df %>% mutate_at(c('var1'), ~(scale(.) %>% as.vector))
df2
var1      var2      var3
1  -0.98619132 19.223000 27.238483
2   0.71268801 10.160015 22.320085
3  -0.57430484 14.196271 21.012670
4   1.03402981 12.307401 29.856744
5   1.22894699  3.852644 21.392645
6  -1.80732540 18.196849 22.713262
7  -0.17012290  6.429579 18.601651
8   1.06592790  2.757072 19.853551
9  -0.09096999  7.902573 12.228993
10 -0.41267825 19.181066  8.677841```

You’ll notice that the other two variables didn’t change; only the first variable was scaled.

The new scaled variable has a mean value of 0, and a standard deviation of 1, as we can immediately confirm.

Bind together two data frames by their rows or columns in R (datasciencetut.com)

compute the scaled variable’s mean.

```mean(df2\$var1)
 2.638406e-17 basically zero```

calculate the scaled variable’s standard deviation.

```sd(df2\$var1)
 1```

## Standardize Multiple Variables

Multiple variables in a data frame can be scaled simultaneously using the code provided below:

scale var1 and var2 to have mean = 0 and standard deviation = 1

```df3 <- df %>% mutate_at(c('var1', 'var2'), ~(scale(.) %>% as.vector))
df3
var1       var2      var3
1  -0.98619132  1.2570692 27.238483
2   0.71268801 -0.2031057 22.320085
3  -0.57430484  0.4471923 21.012670
4   1.03402981  0.1428686 29.856744
5   1.22894699 -1.2193121 21.392645
6  -1.80732540  1.0917418 22.713262
7  -0.17012290 -0.8041315 18.601651
8   1.06592790 -1.3958243 19.853551
9  -0.09096999 -0.5668114 12.228993
10 -0.41267825  1.2503130  8.677841```

## Standardize All Variables

Using the mutate_all function, the following code demonstrates how to scale each variable in a data frame.

scale all variables to have mean = 0 and standard deviation = 1

How to Rank by Group in R? – Data Science Tutorials

```df4 <- df %>% mutate_all(~(scale(.) %>% as.vector))
df4
var1       var2        var3
1  -0.98619132  1.2570692  1.09158171
2   0.71268801 -0.2031057  0.30768348
3  -0.57430484  0.4471923  0.09930665
4   1.03402981  0.1428686  1.50888235
5   1.22894699 -1.2193121  0.15986731
6  -1.80732540  1.0917418  0.37034828
7  -0.17012290 -0.8041315 -0.28496363
8   1.06592790 -1.3958243 -0.08543481
9  -0.09096999 -0.5668114 -1.30064291
10 -0.41267825  1.2503130 -1.86662844```

The post How to Standardize Data in R? appeared first on Data Science Tutorials