# tidyr::complete to show all possible combinations of variables

This article was first published on **R – Statistical Odds & Ends** and kindly contributed to R-bloggers.


This is an issue I often face, so I thought it best to write it down. When doing data analysis, we often want to know how many observations there are in each subgroup. These subgroups can be defined by multiple variables. In the code example below, I want to know how many vehicles there are for each (`cyl`, `gear`) combination:

```r
library(tidyverse)
data(mtcars)

mtcars %>%
    group_by(cyl, gear) %>%
    summarize(count = n())

# # A tibble: 8 x 3
# # Groups:   cyl [3]
#     cyl  gear count
#   <dbl> <dbl> <int>
# 1     4     3     1
# 2     4     4     8
# 3     4     5     2
# 4     6     3     2
# 5     6     4     4
# 6     6     5     1
# 7     8     3    12
# 8     8     5     2
```

If you look carefully, you will notice that there are no vehicles with `cyl == 8` and `gear == 4`. In general it’s probably better to include this combination as a row in the tibble, with a count of 0. This is especially important in data pipelines where downstream processes might expect there to be `length(unique(cyl)) * length(unique(gear))` rows in the dataset.
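One way to spot such gaps before they cause trouble downstream is to compare the summary against the full grid of combinations. The snippet below is a minimal sketch, not part of the original post: the object names (`counts`, `expected_rows`, `all_combos`, `missing_combos`) are made up for illustration, and it assumes dplyr 1.0 or later for the `.groups` argument.

```r
library(dplyr)
library(tidyr)

data(mtcars)

# Same summary as above, dropping the grouping in one step
counts <- mtcars %>%
    group_by(cyl, gear) %>%
    summarize(count = n(), .groups = "drop")

# Number of rows a "complete" summary would have
expected_rows <- n_distinct(mtcars$cyl) * n_distinct(mtcars$gear)
nrow(counts) == expected_rows  # FALSE: 8 rows present, 9 expected

# Build every (cyl, gear) combination and keep those absent from the summary
all_combos <- mtcars %>% tidyr::expand(cyl, gear)
missing_combos <- anti_join(all_combos, counts, by = c("cyl", "gear"))
missing_combos  # one row: cyl == 8, gear == 4
```

A check like this could sit at the start of a pipeline step that relies on the full grid being present.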

We can achieve this by ungrouping the dataset and applying `tidyr::complete()`. This ensures that every possible (`cyl`, `gear`) combination gets a row.

```r
mtcars %>%
    group_by(cyl, gear) %>%
    summarize(count = n()) %>%
    ungroup() %>%
    complete(cyl, gear)

# # A tibble: 9 x 3
#     cyl  gear count
#   <dbl> <dbl> <int>
# 1     4     3     1
# 2     4     4     8
# 3     4     5     2
# 4     6     3     2
# 5     6     4     4
# 6     6     5     1
# 7     8     3    12
# 8     8     4    NA
# 9     8     5     2
```

For rows that didn’t appear in the original summary table, `complete()` fills the remaining columns with `NA`. We can specify the value `complete()` should use for these cells with the `fill` option:

```r
mtcars %>%
    group_by(cyl, gear) %>%
    summarize(count = n()) %>%
    ungroup() %>%
    complete(cyl, gear, fill = list(count = 0))

# # A tibble: 9 x 3
#     cyl  gear count
#   <dbl> <dbl> <int>
# 1     4     3     1
# 2     4     4     8
# 3     4     5     2
# 4     6     3     2
# 5     6     4     4
# 6     6     5     1
# 7     8     3    12
# 8     8     4     0
# 9     8     5     2
```
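As a variation that is not from the original post, the `group_by()`, `summarize()`, `ungroup()` sequence can likely be shortened with `dplyr::count()`, which returns an ungrouped tibble when its input is ungrouped. The sketch below assumes a dplyr version that supports the `name` argument of `count()`. The result name `completed` is hypothetical, and `0L` is used so the fill value matches the integer type of the counts.

```r
library(dplyr)
library(tidyr)

data(mtcars)

completed <- mtcars %>%
    count(cyl, gear, name = "count") %>%           # one row per observed combination
    complete(cyl, gear, fill = list(count = 0L))   # add missing combinations with count 0

# Guard for downstream steps: every (cyl, gear) combination has a row
stopifnot(nrow(completed) == n_distinct(mtcars$cyl) * n_distinct(mtcars$gear))

completed
```

The `stopifnot()` line makes the row-count expectation explicit, so a later change to the pipeline that drops combinations fails loudly instead of silently.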
