[This article was first published on R – Statistical Odds & Ends, and kindly contributed to R-bloggers.]

This is an issue I often face, so I thought it best to write it down. When doing data analysis, we often want to know how many observations there are in each subgroup. These subgroups can be defined by multiple variables. In the code example below, I want to know how many vehicles there are for each (`cyl`, `gear`) combination:

```r
library(tidyverse)
data(mtcars)
mtcars %>%
  group_by(cyl, gear) %>%
  summarize(count = n())

# # A tibble: 8 x 3
# # Groups:   cyl 
#     cyl  gear count
#   <dbl> <dbl> <int>
# 1     4     3     1
# 2     4     4     8
# 3     4     5     2
# 4     6     3     2
# 5     6     4     4
# 6     6     5     1
# 7     8     3    12
# 8     8     5     2
```

If you look carefully, you will notice that there are no vehicles with `cyl == 8` and `gear == 4`. In general it’s probably better to include this combination as a row in the tibble, with count as 0. This is especially important in data pipelines where future processes might expect there to be `length(unique(cyl)) * length(unique(gear))` rows in the dataset.
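
To make the gap concrete, here is a small sketch (my own illustration, not part of the pipeline above) that compares the number of rows in the summary with the number of possible combinations, and lists the combinations that are absent. The names `counts`, `expected_rows` and `missing_combos` are made up for this example, and `tidyr::expand()` is used only to enumerate every (`cyl`, `gear`) pairing.

```r
library(dplyr)
library(tidyr)

# Same summary as above, ungrouped so it behaves like a plain table
counts <- mtcars %>%
  group_by(cyl, gear) %>%
  summarize(count = n()) %>%
  ungroup()

# Number of rows a downstream step might expect: one per (cyl, gear) pairing
expected_rows <- length(unique(mtcars$cyl)) * length(unique(mtcars$gear))

expected_rows  # 9
nrow(counts)   # 8

# Enumerate all pairings and keep those with no row in the summary
missing_combos <- expand(mtcars, cyl, gear) %>%
  anti_join(counts, by = c("cyl", "gear"))

missing_combos  # one row: cyl = 8, gear = 4
```

So the summary is one row short of the full set of nine combinations, and that row is the one we want to add with a count of 0.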

We can achieve this by ungrouping the dataset and applying `tidyr::complete()`. This ensures that every possible (`cyl`, `gear`) combination gets a row.

```r
mtcars %>%
  group_by(cyl, gear) %>%
  summarize(count = n()) %>%
  ungroup() %>%
  complete(cyl, gear)

# # A tibble: 9 x 3
#     cyl  gear count
#   <dbl> <dbl> <int>
# 1     4     3     1
# 2     4     4     8
# 3     4     5     2
# 4     6     3     2
# 5     6     4     4
# 6     6     5     1
# 7     8     3    12
# 8     8     4    NA
# 9     8     5     2
```

For rows that didn’t appear in the original summary table, `complete()` fills the remaining columns with `NA`. We can specify the value `complete()` should use for these cells with the `fill` argument, which takes a named list with one entry per column to fill:

```r
mtcars %>%
  group_by(cyl, gear) %>%
  summarize(count = n()) %>%
  ungroup() %>%
  complete(cyl, gear, fill = list(count = 0))

# # A tibble: 9 x 3
#     cyl  gear count
#   <dbl> <dbl> <int>
# 1     4     3     1
# 2     4     4     8
# 3     4     5     2
# 4     6     3     2
# 5     6     4     4
# 6     6     5     1
# 7     8     3    12
# 8     8     4     0
# 9     8     5     2
```
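
As a possible shorthand, `dplyr::count()` does the grouping, counting and ungrouping in one step, so the same table can be built with fewer lines. This is only a sketch, assuming a dplyr version where `count()` accepts the `name` argument. `full_counts` is a name made up here, and I fill with `0L` rather than `0` so that the column stays an integer.

```r
library(dplyr)
library(tidyr)

full_counts <- mtcars %>%
  # count() returns an ungrouped table here, so no ungroup() is needed
  count(cyl, gear, name = "count") %>%
  # add the missing (cyl, gear) pairings, with an integer zero as the count
  complete(cyl, gear, fill = list(count = 0L))

nrow(full_counts)  # 9
```

Whether this reads better than the explicit `group_by()` / `summarize()` version is a matter of taste. The explicit version is easier to extend when the summary needs more than a count.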
