[This article was first published on **R Programming – DataScience+**, and kindly contributed to R-bloggers].

Sometimes it is necessary to standardize data because of its distribution, or simply because we need a fair comparison of a value (e.g., body weight) with a reference population (e.g., school, city, state, country). Calculating a z-score is simple, but there is less information on the web about its purpose and meaning.

In this post, I will explain what the z-score means, how it is calculated with an example, and how to create a new z-score variable in R. As usual, I will use data from the National Health and Nutrition Examination Survey (NHANES).

### What is Z-score

In short, the z-score shows how far a specific value (individual) lies below or above the mean of a given dataset, measured in standard deviations. In the example below, I am going to calculate the z-score of body mass index (BMI) in a dataset from NHANES.

### Get the data and packages

Loading packages and creating the dataset:

    library(tidyverse)
    library(RNHANES)

    # Join the demographics and body-measures files on the participant ID (SEQN).
    # Both genders are kept, as in the output below; rows with missing BMI are dropped.
    dat = nhanes_load_data("DEMO_E", "2007-2008") %>%
      select(SEQN, RIAGENDR) %>%
      left_join(nhanes_load_data("BMX_E", "2007-2008"), by = "SEQN") %>%
      select(SEQN, RIAGENDR, BMXBMI) %>%
      filter(!is.na(BMXBMI)) %>%
      transmute(SEQN, Gender = RIAGENDR, BMI = BMXBMI)

    dat

    ##     SEQN Gender   BMI
    ## 1  41475      2 58.04
    ## 2  41476      2 15.18
    ## 3  41477      1 30.05
    ## 4  41479      1 27.56
    ## 5  41480      1 17.93
    ## 6  41481      1 23.34
    ## 7  41482      1 33.64
    ## 8  41483      1 44.06
    ## 9  41485      2 25.99
    ## 10 41486      2 31.21

### How to calculate the z-score for BMI

To calculate the z-score of BMI, we need two quantities: the mean of BMI and the standard deviation of BMI.
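Written as a formula, the calculation looks like this. The symbols are introduced here for illustration and do not appear in the original text: $x$ is an individual's BMI, $\bar{x}$ is the sample mean, and $s$ is the sample standard deviation.

$$
z = \frac{x - \bar{x}}{s}
$$

A positive $z$ means the value is above the mean, a negative $z$ means it is below, and the magnitude counts standard deviations.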

Mean of BMI:

    mean(dat$BMI)
    ## [1] 25.70571

Standard deviation of BMI:

    sd(dat$BMI)
    ## [1] 7.608628

Suppose we want to calculate the z-score of the first participant in the dataset `dat`. We take the actual BMI (58.04), subtract the mean (25.70571), and divide the difference by the standard deviation (7.608628). The result is 4.249687, which indicates that this participant's BMI is 4.249687 standard deviations above the mean of the dataset.

(58.04 - 25.70571)/7.608628 = 4.249687
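The same arithmetic can be checked in R without loading the data. This is a minimal sketch that plugs in the rounded mean and standard deviation reported above; `z_score` is a hypothetical helper introduced here for illustration, and the third participant (BMI 30.05 in the data listing) is added for comparison.

```r
# Rounded summary statistics reported in the text above
bmi_mean <- 25.70571  # mean(dat$BMI)
bmi_sd   <- 7.608628  # sd(dat$BMI)

# Hypothetical helper (not part of the original post):
# distance from the mean, expressed in standard deviations
z_score <- function(x, m, s) (x - m) / s

z_first <- z_score(58.04, bmi_mean, bmi_sd)  # first participant
z_third <- z_score(30.05, bmi_mean, bmi_sd)  # third participant

round(c(first = z_first, third = z_third), 2)
```

The first participant is far above the mean (about 4.25 standard deviations), while the third is only about 0.57 standard deviations above it.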

### How to calculate the z-score in R

    dat %>%
      mutate(zscore = (BMI - mean(BMI)) / sd(BMI))

    ##     SEQN Gender   BMI       zscore
    ## 1  41475      2 58.04  4.249687006
    ## 2  41476      2 15.18 -1.383391690
    ## 3  41477      1 30.05  0.570968558
    ## 4  41479      1 27.56  0.243708503
    ## 5  41480      1 17.93 -1.021959902
    ## 6  41481      1 23.34 -0.310925004
    ## 7  41482      1 33.64  1.042801328
    ## 8  41483      1 44.06  2.412299228
    ## 9  41485      2 25.99  0.037363810
    ## 10 41486      2 31.21  0.723427057

Now we see the z-score for each individual, and the value for the first participant matches what we calculated above.
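Base R also provides `scale()`, which centers a numeric vector on its mean and divides by its standard deviation by default. The sketch below is an alternative to the manual formula, not the approach used in this post. It runs on a toy vector holding only the ten BMI values displayed above, so its z-scores differ from those computed on the full NHANES sample.

```r
# Toy data: only the ten BMI values shown in the listing, not the full dataset
bmi <- c(58.04, 15.18, 30.05, 27.56, 17.93, 23.34, 33.64, 44.06, 25.99, 31.21)

# Manual z-score, same formula as in the mutate() call
z_manual <- (bmi - mean(bmi)) / sd(bmi)

# scale() returns a one-column matrix, so convert it back to a plain vector
z_scale <- as.numeric(scale(bmi))

# Both approaches agree up to floating-point tolerance
all.equal(z_manual, z_scale)
```

Inside a `mutate()` call, wrapping `scale()` in `as.numeric()` in the same way keeps the new column a plain numeric vector rather than a matrix.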

If you calculate the mean and standard deviation of the `zscore` variable above, you will find that the mean is 0 and the standard deviation is 1.
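This property holds for any set of z-scores computed from a sample's own mean and standard deviation: subtracting the mean centers the values on 0, and dividing by the standard deviation rescales their spread to 1. A minimal check, again on a toy vector of the ten displayed BMI values rather than the full dataset:

```r
# Toy data: only the ten BMI values shown in the listing, not the full dataset
bmi <- c(58.04, 15.18, 30.05, 27.56, 17.93, 23.34, 33.64, 44.06, 25.99, 31.21)

z <- (bmi - mean(bmi)) / sd(bmi)

mean(z)  # zero, up to floating-point rounding error
sd(z)    # one, up to floating-point rounding error
```

Because of floating-point arithmetic, `mean(z)` may print as a very small number such as a value on the order of 1e-17 rather than an exact 0.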

Feel free to comment!
