R Tutorial Series: Centering Variables and Generating Z-Scores with the scale() Function

March 1, 2012

(This article was first published on R Tutorial Series, and kindly contributed to R-bloggers)

Centering variables and creating z-scores are two common data analysis activities. Both are relatively simple to calculate by hand, but R's scale() function performs either operation in a single call.

Tutorial Files

Before we begin, you may want to download the dataset (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory.

The scale() Function

The scale() function makes use of the following arguments.

  • x: a numeric matrix-like object (a numeric vector is treated as a single column)
  • center: if TRUE, each column's mean (computed ignoring NAs) is subtracted from the values in that column; if FALSE, centering is not performed
  • scale: if TRUE, each column is divided by its standard deviation when center is also TRUE, or by its root mean square when center is FALSE; if FALSE, scaling is not performed
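To make the argument combinations concrete, here is a minimal sketch on a small matrix. The matrix m and its values are hypothetical and are not part of the tutorial dataset. The root mean square formula in the comment follows the definition given on the scale() manual page.

```r
# Hypothetical 3 x 2 matrix; the values are illustrative only.
m <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2)

# center = TRUE, scale = FALSE: subtract each column's mean.
centered <- scale(m, center = TRUE, scale = FALSE)

# center = FALSE, scale = TRUE: divide each column by its root mean square,
# which scale() computes as sqrt(sum(x^2) / (n - 1)).
rms_scaled <- scale(m, center = FALSE, scale = TRUE)
rms <- apply(m, 2, function(col) sqrt(sum(col^2) / (length(col) - 1)))

# scale() records the values it used as attributes on the result.
attr(centered, "scaled:center")    # the column means that were subtracted
attr(rms_scaled, "scaled:scale")   # the column divisors; compare with rms
```

Note that scale() always returns a matrix, even when x is a plain vector.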

Centering Variables

Normally, to center a variable, you would subtract the mean of all data points from each individual data point. With scale(), this can be accomplished in one simple call.

  > # center variable A using the scale() function
  > scale(A, center = TRUE, scale = FALSE)

You can verify these results by making the calculation by hand, as demonstrated in the following screenshot.

[Screenshot: Centering a variable with the scale() function and by hand]
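The same check can be sketched in code. The vector A below is a hypothetical stand-in for the tutorial's variable A, not the actual dataset, so the specific numbers are illustrative only.

```r
# Hypothetical stand-in for variable A; not the tutorial's dataset.
A <- c(2, 4, 6, 8, 10)

# Center with scale(); the result is a one-column matrix.
centered_scale <- scale(A, center = TRUE, scale = FALSE)

# Center by hand: subtract the mean from every data point.
centered_hand <- A - mean(A)

# Compare the two after dropping the matrix structure and attributes.
all.equal(as.numeric(centered_scale), centered_hand)
```

The as.numeric() call is needed because scale() returns a matrix with extra attributes, while the hand calculation returns a plain vector.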

Generating Z-Scores

Normally, to create z-scores (standardized scores) from a variable, you would subtract the mean of all data points from each individual data point, then divide each difference by the standard deviation of all data points. Again, this can be accomplished in one call using scale().

  > # generate z-scores for variable A using the scale() function
  > scale(A, center = TRUE, scale = TRUE)

Again, the following screenshot demonstrates equivalence between the function results and hand calculation.

[Screenshot: Generating z-scores from a variable by hand and using the scale() function]
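As a rough sketch of that equivalence, the hand calculation can be written out with mean() and sd(). The vector A is again a hypothetical stand-in, not the tutorial's data. Note that sd() uses the sample (n - 1) denominator, which is also what scale() uses when center is TRUE.

```r
# Hypothetical stand-in for variable A; not the tutorial's dataset.
A <- c(2, 4, 6, 8, 10)

# Generate z-scores with scale(); the result is a one-column matrix.
z_scale <- scale(A, center = TRUE, scale = TRUE)

# Generate z-scores by hand: subtract the mean, then divide by the
# sample standard deviation.
z_hand <- (A - mean(A)) / sd(A)

# Compare the two after dropping the matrix structure and attributes.
all.equal(as.numeric(z_scale), z_hand)

# Sanity check: z-scores should have mean 0 and standard deviation 1,
# up to floating-point error.
mean(z_hand)
sd(z_hand)
```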

Complete scale() Example

To see a complete example of how scale() can be used to center variables and generate z-scores in R, please download the scale() example (.txt) file.

References

The official scale function manual page is available from: http://stat.ethz.ch/R-manual/R-patched/library/base/html/scale.html
