How do you make a histogram with equally sized dots or squares for each observation, and colour them by another variable?

This article was first published on pacha.dev/blog and kindly contributed to R-bloggers.

Because of delays with my scholarship payment, if this post is useful to you I kindly ask for a minimal donation on Buy Me a Coffee, which will be used to continue my Open Source efforts. If you need an R package or Shiny dashboard for your team, you can email me or inquire on Fiverr. The full explanation is here: A Personal Message from an Open Source Contributor

You can send me questions for the blog using this form.

I got this question from a reader: How do you make a histogram with equally sized dots or squares for each observation, and colour them by another variable?

To answer this, I shall use the Palmer Penguins dataset, which contains observations of the species and body mass (among other variables) for a sample of penguins:

library(palmerpenguins)
library(dplyr)

glimpse(penguins)
Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               <fct> male, female, female, NA, female, male, female, male…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

To use squares, one possibility is to cut body mass into discrete intervals and count the penguins by species and interval:

library(tidyr)
library(ggplot2)
library(tintin)

# Create equal-width bins (cut() with a single number splits the range evenly)
n_bins <- 4  # number of bins

d <- penguins %>%
  drop_na(body_mass_g, species) %>%
  mutate(body_mass_d = cut(body_mass_g, breaks = n_bins, dig.lab = 6)) %>%
  group_by(species, body_mass_d) %>%
  count()

d
# A tibble: 9 × 3
# Groups:   species, body_mass_d [9]
  species   body_mass_d       n
  <fct>     <fct>         <int>
1 Adelie    (2696.4,3600]    71
2 Adelie    (3600,4500]      73
3 Adelie    (4500,5400]       7
4 Chinstrap (2696.4,3600]    26
5 Chinstrap (3600,4500]      40
6 Chinstrap (4500,5400]       2
7 Gentoo    (3600,4500]      17
8 Gentoo    (4500,5400]      72
9 Gentoo    (5400,6303.6]    34
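
Passing a single number as `breaks` asks `cut()` for that many equal-width intervals, which is why every interval in the table above spans 900 g. If quantile-based bins are preferred instead, explicit break points can be passed. The following is a minimal base R sketch on a small made-up vector (the values in `x` and the names `equal_width`, `q_breaks` and `by_quantile` are illustrative only, and it assumes the quantiles are distinct):

```r
# Made-up body masses in grams, for illustration only
x <- c(2700, 3000, 3500, 4000, 4200, 5000, 6300)

# Equal-width bins: a single number asks cut() for that many intervals
equal_width <- cut(x, breaks = 4, dig.lab = 6)
table(equal_width)

# Quantile-based bins: pass explicit break points instead
# (include.lowest = TRUE keeps the minimum value in the first bin)
q_breaks <- quantile(x, probs = seq(0, 1, length.out = 5))
by_quantile <- cut(x, breaks = q_breaks, include.lowest = TRUE, dig.lab = 6)
table(by_quantile)
```

Equal-width bins keep the x axis readable as a histogram, while quantile bins even out the counts per interval, so the choice depends on what the plot should emphasise.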

Now I can create a Tetris-style column plot where each full square represents 5 penguins and a shorter tile on top of each column covers any remainder:

square_size <- 5  # square = 5 observations

d_squares <- d %>%
  mutate(
    full_squares = n %/% square_size,          # number of full squares
    remainder = n %% square_size,              # remaining observations
    partial_height = remainder / square_size   # height of partial square
  )

# Create full squares
full_squares_df <- d_squares %>%
  filter(full_squares > 0) %>%
  uncount(full_squares) %>%
  group_by(species, body_mass_d) %>%
  mutate(square_id = row_number() - 1,
         y = square_id + 0.5,
         height = 1,
         square_type = "full") %>%
  ungroup()

# Create partial squares
partial_squares_df <- d_squares %>%
  filter(remainder > 0) %>%
  mutate(square_id = full_squares,
         y = full_squares + partial_height/2,
         height = partial_height,
         square_type = "partial")

# Combine both
d_squares <- bind_rows(full_squares_df, partial_squares_df)

# Create Tetris-style column plot with grouped squares
ggplot(d_squares, aes(x = body_mass_d, y = y, fill = species)) +
  geom_tile(aes(height = height), width = 0.9, color = "white", linewidth = 0.5) +
  scale_x_discrete(name = "Body mass intervals") +
  scale_y_continuous(name = paste0("Count (each full square = ", square_size, " penguins)"), 
                     expand = expansion(add = 0)) +
  scale_fill_tintin_d(option = "the black island", direction = -1) +
  labs(title = "Column plot with grouped squares") +
  facet_wrap(~species) +
  theme_minimal(base_size = 13) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
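
To see how the full and partial tiles fit together, the same bookkeeping can be traced by hand on the three Adelie counts from the table above. This is only a base R sketch of the arithmetic, not part of the plotting code; `partial_y` and `column_height` are names introduced here for illustration.

```r
square_size <- 5            # observations per full square
n <- c(71, 73, 7)           # Adelie counts from the table above

full_squares   <- n %/% square_size         # whole squares per column
remainder      <- n %% square_size          # observations left over
partial_height <- remainder / square_size   # height of the shorter top tile

# geom_tile() positions tiles by their centre, so the partial tile is
# centred half its own height above the stack of full squares
partial_y <- full_squares + partial_height / 2

# Total column height, converted back to observations, recovers n
column_height <- full_squares + partial_height
all.equal(column_height * square_size, n)
```

For example, the 71 Adelie penguins in the lightest interval become 14 full squares plus one tile of height 0.2.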

I hope this is useful 🙂
