How to Standardize Data in R?

This article was first published on Data Science Tutorials and kindly contributed to R-bloggers.

Standardization means rescaling a dataset so that its mean is 0 and its standard deviation is 1.

The most commonly used method is z-score standardization, which rescales each value using the following formula:

(xi – xbar) / s

where:

xi: The ith value in the dataset

xbar: The sample mean

s: The sample standard deviation
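
As a quick check of the formula, the sketch below applies it by hand to a small made-up vector and compares the result with scale(). The vector x and the names z_manual and z_scale are illustrative assumptions, not part of the original example.

```r
# Hypothetical example values, made up for illustration only
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Apply the z-score formula by hand:
# subtract the sample mean, then divide by the sample standard deviation
z_manual <- (x - mean(x)) / sd(x)

# scale() centers and scales by default; it returns a one-column matrix,
# so as.vector() converts the result back to a plain numeric vector
z_scale <- as.vector(scale(x))

# Compare the two results within numerical tolerance
all.equal(z_manual, z_scale)
```

Both approaches divide by the sample standard deviation (the n - 1 version), which is why the results agree.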

The examples below show how to apply z-score standardization to one or more variables in a data frame using the scale() function together with the dplyr package.

Standardize Just One Variable

In a data frame containing three variables, the following code demonstrates how to scale just one of the variables.

library(dplyr)

First, set a seed so that the example is reproducible

set.seed(123)

Next, create the original data frame

df <- data.frame(var1= runif(10, 0, 50),
                 var2= runif(10, 2, 20),
                 var3= runif(10, 5, 30))

Now we can view the original data frame

df
        var1      var2      var3
1  14.378876 19.223000 27.238483
2  39.415257 10.160015 22.320085
3  20.448846 14.196271 21.012670
4  44.150870 12.307401 29.856744
5  47.023364  3.852644 21.392645
6   2.277825 18.196849 22.713262
7  26.405274  6.429579 18.601651
8  44.620952  2.757072 19.853551
9  27.571751  7.902573 12.228993
10 22.830737 19.181066  8.677841

Scale var1 to have mean = 0 and standard deviation = 1

df2 <- df %>% mutate_at(c('var1'), ~(scale(.) %>% as.vector))
df2
          var1      var2      var3
1  -0.98619132 19.223000 27.238483
2   0.71268801 10.160015 22.320085
3  -0.57430484 14.196271 21.012670
4   1.03402981 12.307401 29.856744
5   1.22894699  3.852644 21.392645
6  -1.80732540 18.196849 22.713262
7  -0.17012290  6.429579 18.601651
8   1.06592790  2.757072 19.853551
9  -0.09096999  7.902573 12.228993
10 -0.41267825 19.181066  8.677841

You’ll notice that the other two variables didn’t change; only the first variable was scaled.
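
In current dplyr releases, mutate_at() is marked as superseded in favor of across(). The sketch below is one way to write the same step with across(), assuming dplyr 1.0.0 or later is installed; the name df_across is a hypothetical choice for this illustration.

```r
library(dplyr)

# Recreate the data frame from the example above
set.seed(123)
df <- data.frame(var1 = runif(10, 0, 50),
                 var2 = runif(10, 2, 20),
                 var3 = runif(10, 5, 30))

# Same idea as mutate_at(c('var1'), ...), written with across()
# (df_across is a hypothetical name used only in this sketch)
df_across <- df %>% mutate(across(var1, ~ as.vector(scale(.x))))

# scale() returns a one-column matrix; as.vector() keeps the
# column as an ordinary numeric vector inside the data frame
is.matrix(scale(df$var1))
is.matrix(df_across$var1)
```

Without as.vector(), the scaled column would be stored as a matrix inside the data frame, which can cause surprises in later steps.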

We can quickly confirm that the newly scaled variable has a mean of 0 and a standard deviation of 1.

Compute the scaled variable’s mean

mean(df2$var1)
[1] 2.638406e-17  # basically zero

Calculate the scaled variable’s standard deviation

sd(df2$var1)
[1] 1
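
The mean comes out as a tiny nonzero number rather than exactly 0 because of floating-point rounding. A minimal sketch of a tolerance-based check using all.equal() follows; the name z is a hypothetical choice, and var1 is regenerated here so the block runs on its own.

```r
# Recreate var1 from the example above (the first runif() call after the seed)
set.seed(123)
var1 <- runif(10, 0, 50)

# Standardize it (z is a hypothetical name used only in this sketch)
z <- as.vector(scale(var1))

# An exact comparison such as mean(z) == 0 is fragile with floating-point
# numbers, so compare within a small tolerance instead
mean_is_zero <- isTRUE(all.equal(mean(z), 0))
sd_is_one    <- isTRUE(all.equal(sd(z), 1))

mean_is_zero
sd_is_one
```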

Standardize Multiple Variables

Multiple variables in a data frame can be scaled simultaneously using the code provided below:

Scale var1 and var2 to have mean = 0 and standard deviation = 1

df3 <- df %>% mutate_at(c('var1', 'var2'), ~(scale(.) %>% as.vector))
df3
          var1       var2      var3
1  -0.98619132  1.2570692 27.238483
2   0.71268801 -0.2031057 22.320085
3  -0.57430484  0.4471923 21.012670
4   1.03402981  0.1428686 29.856744
5   1.22894699 -1.2193121 21.392645
6  -1.80732540  1.0917418 22.713262
7  -0.17012290 -0.8041315 18.601651
8   1.06592790 -1.3958243 19.853551
9  -0.09096999 -0.5668114 12.228993
10 -0.41267825  1.2503130  8.677841
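
If dplyr is not available, the same result can be approximated in base R. The sketch below is an alternative to the mutate_at() call above, not the original author's method; the names cols and df3_base are hypothetical.

```r
# Recreate the data frame from the example above
set.seed(123)
df <- data.frame(var1 = runif(10, 0, 50),
                 var2 = runif(10, 2, 20),
                 var3 = runif(10, 5, 30))

# Columns to standardize (hypothetical helper variable)
cols <- c("var1", "var2")

# Copy the data frame, then overwrite only the chosen columns
# with their standardized versions
df3_base <- df
df3_base[cols] <- lapply(df[cols], function(x) as.vector(scale(x)))

# Check the mean and standard deviation of each scaled column
sapply(df3_base[cols], mean)
sapply(df3_base[cols], sd)
```

Keeping the column names in one vector makes it easy to change which variables are standardized without touching the rest of the code.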

Standardize All Variables

Using the mutate_all function, the following code demonstrates how to scale each variable in a data frame.

Scale all variables to have mean = 0 and standard deviation = 1

df4 <- df %>% mutate_all(~(scale(.) %>% as.vector))
df4
          var1       var2        var3
1  -0.98619132  1.2570692  1.09158171
2   0.71268801 -0.2031057  0.30768348
3  -0.57430484  0.4471923  0.09930665
4   1.03402981  0.1428686  1.50888235
5   1.22894699 -1.2193121  0.15986731
6  -1.80732540  1.0917418  0.37034828
7  -0.17012290 -0.8041315 -0.28496363
8   1.06592790 -1.3958243 -0.08543481
9  -0.09096999 -0.5668114 -1.30064291
10 -0.41267825  1.2503130 -1.86662844
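
When every column is numeric, scale() can also be applied to the whole data frame at once. The sketch below assumes only base R and shows that the means and standard deviations used for scaling are stored as attributes of the result, which makes it possible to undo the transformation. The names scaled, df4_base, centers, sds, and restored are hypothetical.

```r
# Recreate the data frame from the example above
set.seed(123)
df <- data.frame(var1 = runif(10, 0, 50),
                 var2 = runif(10, 2, 20),
                 var3 = runif(10, 5, 30))

# scale() on an all-numeric data frame returns a matrix
scaled <- scale(df)
df4_base <- as.data.frame(scaled)

# The column means and standard deviations used for scaling
# are kept as attributes of the returned matrix
centers <- attr(scaled, "scaled:center")
sds     <- attr(scaled, "scaled:scale")

# Undo the standardization: multiply each column by its sd,
# then add back its mean
restored <- sweep(sweep(scaled, 2, sds, "*"), 2, centers, "+")
```

Storing centers and sds is also handy if new observations later need to be put on the same scale as the original data.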
