Sometimes it is necessary to standardize data, either because of its distribution or simply because we want a fair comparison of a value (e.g., body weight) with a reference population (e.g., school, city, state, country). The z-score is simple to calculate, but there is less information on the web about its purpose and meaning.

In this post, I will explain what the z-score means, how it is calculated with an example, and how to create a new z-score variable in R. As usual, I will use data from the National Health and Nutrition Examination Survey (NHANES).

### What is a z-score?

In short, the z-score shows how far a specific value (an individual) lies below or above the mean of a given dataset, measured in standard deviations. In the example below, I am going to calculate the z-score of body mass index (BMI) in a dataset from NHANES.
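
As a compact statement of this definition, here is the usual formula. The symbols are my own notation, not from the original post: $x_i$ is the value for individual $i$, $\bar{x}$ is the sample mean, and $s$ is the sample standard deviation.

$$
z_i = \frac{x_i - \bar{x}}{s}
$$

A positive $z_i$ means the value is above the mean, a negative $z_i$ means it is below, and the magnitude counts standard deviations.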

### Get the data and packages

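```
library(tidyverse)
library(RNHANES)

# Assumption: the loading calls were lost from this listing; SEQN 41475 onward
# matches the 2007-2008 cycle (files DEMO_E and BMX_E).
dat <- nhanes_load_data("DEMO_E", "2007-2008") %>%
  select(SEQN, RIAGENDR) %>%
  left_join(nhanes_load_data("BMX_E", "2007-2008"), by = "SEQN") %>%
  select(SEQN, RIAGENDR, BMXBMI) %>%
  filter(!is.na(BMXBMI)) %>%  # keep participants with a measured BMI
  transmute(SEQN, Gender = RIAGENDR, BMI = BMXBMI)
head(dat, 10)
    SEQN Gender   BMI
1  41475      2 58.04
2  41476      2 15.18
3  41477      1 30.05
4  41479      1 27.56
5  41480      1 17.93
6  41481      1 23.34
7  41482      1 33.64
8  41483      1 44.06
9  41485      2 25.99
10 41486      2 31.21
```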

### How to calculate the z-score for BMI

To calculate the z-score of BMI, we need the mean and the standard deviation of BMI.

Mean of BMI:

```
mean(dat$BMI)
## [1] 25.70571
```

Standard deviation of BMI:

```
sd(dat$BMI)
## [1] 7.608628
```

Suppose we want to calculate the z-score of the first participant in the dataset `dat`. I take the actual BMI (58.04), subtract the mean (25.70571), and divide the difference by the standard deviation (7.608628). The result is 4.249687. This indicates that this participant's BMI is 4.249687 standard deviations above the average of the population.

```
(58.04 - 25.70571) / 7.608628
## [1] 4.249687
```
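
The same arithmetic can be wrapped in a small function and reused for other participants. This is only a sketch: `z_score()` is a hypothetical helper of mine, not something from the post or from a package, and the mean and standard deviation are typed in by hand from the values reported above.

```r
# Hypothetical helper (not from the original post): z-score of a single value
z_score <- function(x, mu, sigma) {
  (x - mu) / sigma
}

bmi_mean <- 25.70571  # mean of BMI reported above
bmi_sd   <- 7.608628  # standard deviation of BMI reported above

z_first <- z_score(58.04, bmi_mean, bmi_sd)  # first participant in the listing
z_third <- z_score(30.05, bmi_mean, bmi_sd)  # third participant in the listing

round(c(first = z_first, third = z_third), 2)
```

Rounded to two decimals, this gives 4.25 for the first participant and 0.57 for the third.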

### How to calculate the z-score in R

```
dat %>%
  mutate(zscore = (BMI - mean(BMI)) / sd(BMI))
    SEQN Gender   BMI       zscore
1   41475      2 58.04  4.249687006
2   41476      2 15.18 -1.383391690
3   41477      1 30.05  0.570968558
4   41479      1 27.56  0.243708503
5   41480      1 17.93 -1.021959902
6   41481      1 23.34 -0.310925004
7   41482      1 33.64  1.042801328
8   41483      1 44.06  2.412299228
9   41485      2 25.99  0.037363810
10  41486      2 31.21  0.723427057
```

Now we have the z-score for each individual, and the value for the first participant matches what we calculated by hand above.
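
As an alternative I am adding here (the post itself uses `mutate()`), base R's `scale()` performs the same centering and scaling. The sketch below uses only the ten BMI values printed in the listing, so its z-scores will not match the table, which is based on the full sample.

```r
# First ten BMI values from the printed listing (illustration only, not the full sample)
bmi <- c(58.04, 15.18, 30.05, 27.56, 17.93, 23.34, 33.64, 44.06, 25.99, 31.21)

# Manual z-score, as in the mutate() call above
z_manual <- (bmi - mean(bmi)) / sd(bmi)

# scale() centers on the mean and divides by the standard deviation;
# it returns a one-column matrix, so convert it back to a plain vector
z_scale <- as.numeric(scale(bmi))

all.equal(z_manual, z_scale)
```

Both approaches should agree up to floating-point error, so which one to use is mostly a matter of taste.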

If you calculate the mean and standard deviation of `zscore`, you will find that the mean is 0 and the standard deviation is 1.
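
A quick way to check this property is sketched below. It again uses only the ten BMI values printed in the listing rather than the full dataset; the property holds for any numeric vector whose values are not all identical.

```r
# First ten BMI values from the printed listing (illustration only)
bmi <- c(58.04, 15.18, 30.05, 27.56, 17.93, 23.34, 33.64, 44.06, 25.99, 31.21)
z <- (bmi - mean(bmi)) / sd(bmi)

# Floating-point arithmetic rarely gives exactly 0 and 1,
# so compare against a small tolerance instead of using ==
mean_is_zero <- abs(mean(z)) < 1e-12
sd_is_one    <- abs(sd(z) - 1) < 1e-12

c(mean_is_zero = mean_is_zero, sd_is_one = sd_is_one)
```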

Feel free to comment!
