How to compute the z-score with R

[This article was first published on R Programming – DataScience+, and kindly contributed to R-bloggers.]
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Sometimes it is necessary to standardize data, either because of its distribution or simply because we need a fair comparison of a value (e.g., body weight) with a reference population (e.g., school, city, state, country). The z-score is simple to calculate, but there is little information on the web about its purpose and meaning.

In this post, I will explain what the z-score means, how it is calculated (with an example), and how to create a new z-score variable in R. As usual, I will use data from the National Health and Nutrition Examination Survey (NHANES).

What is the z-score

In short, the z-score measures how far a specific value (an individual) lies below or above the mean of a given dataset, expressed in units of standard deviation. In the example below, I calculate the z-score of body mass index (BMI) in a dataset from NHANES.
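In symbols, the z-score of a value x is (x - mean) / sd. As a minimal sketch, assuming a plain numeric vector with no missing values, this can be written as a small R function. The name `zscore` and the toy vector are illustrative only; they are not part of the original post.

```r
# Hypothetical helper: z-score of every element of a numeric vector
zscore <- function(x) {
  (x - mean(x)) / sd(x)  # distance from the mean, in standard deviations
}

x <- c(2, 4, 6)  # toy values: mean 4, standard deviation 2
zscore(x)
## [1] -1  0  1
```

Values below the mean get negative z-scores, values above it get positive ones, and a value equal to the mean gets 0.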

Get the data and packages

Loading packages and creating the dataset:

library(tidyverse)
library(RNHANES)

# Demographics (DEMO_E) and body measures (BMX_E) from NHANES 2007-2008
dat = nhanes_load_data("DEMO_E", "2007-2008") %>%
  select(SEQN, RIAGENDR) %>%
  left_join(nhanes_load_data("BMX_E", "2007-2008"), by = "SEQN") %>%
  select(SEQN, RIAGENDR, BMXBMI) %>%
  filter(!is.na(BMXBMI)) %>%  # keep only participants with a BMI value
  transmute(SEQN, Gender = RIAGENDR, BMI = BMXBMI)
head(dat, 10)  # print the first 10 rows
    SEQN Gender   BMI
1  41475      2 58.04
2  41476      2 15.18
3  41477      1 30.05
4  41479      1 27.56
5  41480      1 17.93
6  41481      1 23.34
7  41482      1 33.64
8  41483      1 44.06
9  41485      2 25.99
10 41486      2 31.21
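The `!is.na(BMXBMI)` condition in the `filter()` step matters because `mean()` and `sd()` return `NA` as soon as the input contains a missing value, unless `na.rm = TRUE` is set. A small sketch with made-up values (not NHANES data):

```r
bmi <- c(22.5, NA, 30.1, 27.4)  # made-up values with one missing entry

mean(bmi)                 # a single NA propagates to the result
## [1] NA
mean(bmi, na.rm = TRUE)   # drop the NA before averaging
## [1] 26.66667
sd(bmi, na.rm = TRUE)     # same idea for the standard deviation
```

Removing incomplete rows up front, as the pipeline does, keeps the later `mean(BMI)` and `sd(BMI)` calls free of `na.rm` arguments.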

How to calculate the z-score for BMI

To calculate the z-score of BMI, we need the mean and the standard deviation of BMI.

Mean of BMI:

mean(dat$BMI)
## [1] 25.70571

Standard deviation of BMI:

sd(dat$BMI)
## [1] 7.608628
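Note that `sd()` returns the sample standard deviation, which divides the sum of squared deviations by n - 1 rather than n. A quick check on a toy vector (not NHANES data), written out by hand:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # toy values with mean 5

# Sample standard deviation (denominator n - 1), as computed by sd()
manual_sd <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))
all.equal(sd(x), manual_sd)
## [1] TRUE

# Population version (denominator n), shown only for comparison
sqrt(sum((x - mean(x))^2) / length(x))
## [1] 2
```

For a dataset of thousands of participants the two denominators give nearly identical results.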

Suppose we want to calculate the z-score of the first and third participants in the dataset `dat`. For the first participant, take the actual BMI (58.04), subtract the mean (25.70571), and divide the difference by the standard deviation (7.608628). The result is 4.249687, which indicates that this participant's BMI is 4.249687 standard deviations above the mean of the dataset. For the third participant (BMI 30.05), the same steps give 0.570969, a little over half a standard deviation above the mean.

(58.04 - 25.70571)/7.608628 = 4.249687
(30.05 - 25.70571)/7.608628 = 0.570969
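The same arithmetic can be reproduced in R. This sketch plugs in the mean and standard deviation printed above as rounded constants, so the last digits may differ very slightly from a calculation at full precision.

```r
bmi_mean <- 25.70571  # mean(dat$BMI), as printed above
bmi_sd   <- 7.608628  # sd(dat$BMI), as printed above

(58.04 - bmi_mean) / bmi_sd  # first participant
## [1] 4.249687
(30.05 - bmi_mean) / bmi_sd  # third participant
```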

How to calculate the z-score in R

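# z-score: subtract the mean BMI, then divide by the standard deviation of BMI
dat %>% 
  mutate(zscore = (BMI - mean(BMI))/sd(BMI)) %>% 
  head(10)  # mean and sd are computed on all rows before printing the first 10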
     SEQN Gender   BMI       zscore
1   41475      2 58.04  4.249687006
2   41476      2 15.18 -1.383391690
3   41477      1 30.05  0.570968558
4   41479      1 27.56  0.243708503
5   41480      1 17.93 -1.021959902
6   41481      1 23.34 -0.310925004
7   41482      1 33.64  1.042801328
8   41483      1 44.06  2.412299228
9   41485      2 25.99  0.037363810
10  41486      2 31.21  0.723427057
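Base R's `scale()` performs the same centering and scaling (it divides by the sample standard deviation, like `sd()`), but it returns a one-column matrix, so it is usually wrapped in `as.numeric()` when stored as a column. A base-R sketch on the ten BMI values printed above; because their mean and standard deviation differ from those of the full dataset, these z-scores will not match the table.

```r
bmi <- c(58.04, 15.18, 30.05, 27.56, 17.93, 23.34, 33.64, 44.06, 25.99, 31.21)

z_manual <- (bmi - mean(bmi)) / sd(bmi)
z_scale  <- as.numeric(scale(bmi))  # scale() returns a matrix; drop to a vector

all.equal(z_manual, z_scale)
## [1] TRUE
```

Inside the pipeline above, the equivalent would be `mutate(zscore = as.numeric(scale(BMI)))`.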

Now we have the z-score for each individual, and the values correspond to those we calculated by hand above.

If you calculate the mean and standard deviation of the z-scores above, you will find that the mean is 0 (up to floating-point rounding) and the standard deviation is 1.
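This property holds for any numeric vector with a non-zero standard deviation, not only for BMI. A quick check, sketched on the ten BMI values printed earlier rather than the full dataset:

```r
bmi <- c(58.04, 15.18, 30.05, 27.56, 17.93, 23.34, 33.64, 44.06, 25.99, 31.21)
z <- (bmi - mean(bmi)) / sd(bmi)

round(mean(z), 10)  # the raw mean can be a tiny non-zero number like 1e-16
## [1] 0
sd(z)
## [1] 1
```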

Feel free to comment!
