How to Use the scale() Function in R
Scaling is a technique for comparing data that isn’t measured on the same scale. Standardizing a dataset using its mean and standard deviation is one common form of scaling.
When working with vectors or columns in a data frame, scaling is frequently employed.
In R, you can use the scale() function to scale the values in a vector, matrix, or data frame.
If the vectors or columns you are comparing are measured on very different scales and you do not standardize them, the variable with the largest values can dominate the results and make them misleading.
scale() is a base R function that centers and/or scales the columns of a numeric matrix.
If center is TRUE, scale() subtracts each column’s mean from that column. If center is a numeric vector with one value per column, scale() subtracts the matching “center” value from each column instead.
The following is the fundamental syntax for this function:
scale(x, center = TRUE, scale = TRUE)
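x: The object to scale (a numeric vector, matrix, or data frame)
center: Whether to subtract the column mean when scaling. TRUE is the default value; a numeric vector of values to subtract can be supplied instead.
scale: Whether to divide by the column standard deviation when scaling. TRUE is the default value; a numeric vector of divisors can be supplied instead.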
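As a minimal sketch of the numeric form of these arguments, the code below passes one center value and one divisor per column. The matrix m and its values are made up for illustration and are not part of the original example.

```r
# A small numeric matrix with two columns (illustrative values only)
m <- cbind(a = c(1, 2, 3), b = c(10, 20, 30))

# Subtract 2 from column a and 20 from column b,
# then divide column a by 1 and column b by 10
m_custom <- scale(m, center = c(2, 20), scale = c(1, 10))

m_custom[, "a"]  # -1 0 1
m_custom[, "b"]  # -1 0 1
```

Both columns end up with the same values because each was shifted and divided by its own user-supplied constants rather than by its mean and standard deviation.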
This function uses the following formula to calculate scaled values.
x_scaled = (x − x̄) / s
x: Original x-value
x̄: Sample mean
s: Sample SD
This is also known as data standardization, and it basically involves converting each original value into a z-score.
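To make the formula concrete, here is a small sketch that computes z-scores by hand and compares them with the result of scale(). The vector v and the names z_manual and z_scale are illustrative assumptions, not part of the original example.

```r
# Illustrative values only
v <- c(2, 4, 6, 8)

# Apply the formula directly: subtract the sample mean, divide by the sample SD
z_manual <- (v - mean(v)) / sd(v)

# scale() returns a one-column matrix with attributes,
# so convert it to a plain numeric vector before comparing
z_scale <- as.numeric(scale(v))

all.equal(z_manual, z_scale)  # TRUE
```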
If scale is a numeric vector, the scale() function divides each column by the corresponding value in that vector.
If scale is TRUE, each column is divided by its standard deviation when center is TRUE, or by its root mean square when center is FALSE.
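The root-mean-square case is easy to overlook, so here is a hedged sketch of it. It assumes the definition given in the R help page for scale(), sqrt(sum(x^2) / (n - 1)); the vector v and the name rms are illustrative.

```r
# Illustrative values only
v <- c(1, 2, 3, 4)

# Root mean square as documented for scale(): sqrt(sum(x^2) / (n - 1))
rms <- sqrt(sum(v^2) / (length(v) - 1))

# With center = FALSE, the default scale = TRUE divides by the root mean square
scaled_no_center <- scale(v, center = FALSE)

all.equal(as.numeric(scaled_no_center), v / rms)  # TRUE
```

Because nothing is subtracted here, the result keeps the sign of the original values and is not a z-score.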
The examples below demonstrate how to utilize this function in practice.
Example 1: Scale the Values in a Vector
Suppose we have the following vector of values in R.
x <- c(11, 12, 13, 24, 25, 16, 17, 18, 19)
# View the mean and standard deviation of the values
mean(x)
[1] 17.22222
sd(x)
[1] 4.944132
The scale() function is used to scale the values in the vector in the following code.
# Scale the values of x
x_scaled <- scale(x)
# View the scaled values
x_scaled
            [,1]
[1,] -1.25850641
[2,] -1.05624645
[3,] -0.85398649
[4,]  1.37087305
[5,]  1.57313301
[6,] -0.24720662
[7,] -0.04494666
[8,]  0.15731330
[9,]  0.35957326
attr(,"scaled:center")
[1] 17.22222
attr(,"scaled:scale")
[1] 4.944132
Because centering subtracts the mean, every value below the mean becomes negative. When comparing vectors, standardizing removes the effect of different units and scales; it does not change the shape of the distribution.
This type of standardization is useful when comparing data that come from different measures.
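The result of scale() carries the centering and scaling constants as attributes, which makes it possible to get back to the original units. The sketch below shows one way to do that; the names ctr, scl, and x_restored are illustrative.

```r
x <- c(11, 12, 13, 24, 25, 16, 17, 18, 19)
x_scaled <- scale(x)

# The constants used by scale() are stored as attributes on the result
ctr <- attr(x_scaled, "scaled:center")  # the mean that was subtracted
scl <- attr(x_scaled, "scaled:scale")   # the standard deviation that was divided by

# Undo the transformation: multiply by the SD, then add the mean back
x_restored <- as.numeric(x_scaled) * scl + ctr

all.equal(x_restored, x)  # TRUE
```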
It’s worth noting that if we supply scale = FALSE, the function only subtracts the mean and does not divide by the standard deviation:
# Center the x values without dividing by the standard deviation
x_scaled <- scale(x, scale = FALSE)
x_scaled
           [,1]
[1,] -6.2222222
[2,] -5.2222222
[3,] -4.2222222
[4,]  6.7777778
[5,]  7.7777778
[6,] -1.2222222
[7,] -0.2222222
[8,]  0.7777778
[9,]  1.7777778
attr(,"scaled:center")
[1] 17.22222
Example 2: Scale the Column Values in a Data Frame
The scale() function is most often used to scale the values in several columns of a data frame so that each column has a mean of 0 and a standard deviation of 1.
As an example, consider the following data frame in R:
data <- data.frame(x = c(11, 12, 23, 24, 25, 66, 77, 18, 9),
                   y = c(60, 80, 90, 10, 5, 6, 700, 180, 190))
# View the data frame
data
   x   y
1 11  60
2 12  80
3 23  90
4 24  10
5 25   5
6 66   6
7 77 700
8 18 180
9  9 190
The y variable’s range of values is significantly larger than the x variable’s range of values.
The scale() function can be used to scale the values in both columns so that the scaled values of x and y have the same mean and standard deviation.
# Scale the values in each column
df_scaled <- scale(data)
After scaling, the x and y columns both have a mean of 0 and a standard deviation of 1.
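One way to confirm this is to compute the column means and standard deviations of the scaled result, as in the sketch below. Note that scale() returns a matrix rather than a data frame; the name df_scaled_df is an illustrative choice for the converted object.

```r
data <- data.frame(x = c(11, 12, 23, 24, 25, 66, 77, 18, 9),
                   y = c(60, 80, 90, 10, 5, 6, 700, 180, 190))
df_scaled <- scale(data)

# Column means are 0 up to floating-point rounding error
colMeans(df_scaled)

# Column standard deviations are 1
apply(df_scaled, 2, sd)

# scale() returns a matrix; convert it back if a data frame is needed
is.matrix(df_scaled)
df_scaled_df <- as.data.frame(df_scaled)
```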
With the default settings, the scale() function calculates each column’s mean and standard deviation, then “scales” each element by subtracting the mean and dividing by the standard deviation.
The scale() function is most useful when you have several variables measured on different scales. One variable, for example, might be on the order of 100, whereas another is on the order of 1000.
scale() does nothing more than standardize the data. The values it generates are known by a variety of names, one of which is z-scores.