Because of delays with my scholarship payment, if this post is useful to you I kindly ask for a minimal donation on Buy Me a Coffee, which will be used to continue my open source efforts. If you need an R package or Shiny dashboard for your team, you can email me or inquire on Fiverr. The full explanation is here: A Personal Message from an Open Source Contributor
You can send me questions for the blog using this form.
I got this question from a reader: How do you make a histogram with equally sized dots or squares for each observation, and colour them by another variable?
I shall use the Palmer Penguins dataset to answer this. It contains observations on the species, body mass and other measurements for a sample of penguins:
library(palmerpenguins)
library(dplyr)

glimpse(penguins)
Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               <fct> male, female, female, NA, female, male, female, male…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
To use squares, one possibility is to cut body mass into intervals, which gives a discrete variable, and then count the penguins by species and interval:
library(tidyr)
library(ggplot2)
library(tintin)

# Create equal-width bins (cut() with a single number splits the range evenly)
n_bins <- 4 # number of bins

d <- penguins %>%
  drop_na(body_mass_g, species) %>%
  mutate(body_mass_d = cut(body_mass_g, breaks = n_bins, dig.lab = 6)) %>%
  group_by(species, body_mass_d) %>%
  count()

d
# A tibble: 9 × 3
# Groups:   species, body_mass_d [9]
  species   body_mass_d       n
  <fct>     <fct>         <int>
1 Adelie    (2696.4,3600]    71
2 Adelie    (3600,4500]      73
3 Adelie    (4500,5400]       7
4 Chinstrap (2696.4,3600]    26
5 Chinstrap (3600,4500]      40
6 Chinstrap (4500,5400]       2
7 Gentoo    (3600,4500]      17
8 Gentoo    (4500,5400]      72
9 Gentoo    (5400,6303.6]    34
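As a side note, when `cut()` receives a single number of breaks it splits the range into intervals of equal width, so the counts per interval can be very uneven. If bins with roughly equal counts are preferred, one option is to pass quantile breakpoints instead. The sketch below compares both on a made-up vector; the values in `mass` are illustrative and are not taken from the penguins data.

```r
# Hypothetical body masses in grams (illustrative values only)
mass <- c(2700, 3100, 3300, 3500, 3700, 3900, 4200, 4800, 5500, 6300)

# Equal-width bins: the range is divided into 4 intervals of the same width
equal_width <- cut(mass, breaks = 4, dig.lab = 6)

# Quantile-based bins: breakpoints at the quartiles, so counts are roughly equal
q_breaks <- quantile(mass, probs = seq(0, 1, by = 0.25))
quantile_bins <- cut(mass, breaks = q_breaks, include.lowest = TRUE, dig.lab = 6)

table(equal_width)
table(quantile_bins)
```

With quantile breakpoints, `include.lowest = TRUE` matters: without it the minimum value falls outside the first interval and becomes `NA`.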
Now I can create a Tetris-style column plot where each square represents 5 penguins:
square_size <- 5 # square = 5 observations

d_squares <- d %>%
  mutate(
    full_squares = n %/% square_size, # number of full squares
    remainder = n %% square_size, # remaining observations
    partial_height = remainder / square_size # height of partial square
  )

# Create full squares
full_squares_df <- d_squares %>%
  filter(full_squares > 0) %>%
  uncount(full_squares) %>%
  group_by(species, body_mass_d) %>%
  mutate(
    square_id = row_number() - 1,
    y = square_id + 0.5,
    height = 1,
    square_type = "full"
  ) %>%
  ungroup()

# Create partial squares
partial_squares_df <- d_squares %>%
  filter(remainder > 0) %>%
  mutate(
    square_id = full_squares,
    y = full_squares + partial_height / 2,
    height = partial_height,
    square_type = "partial"
  )

# Combine both
d_squares <- bind_rows(full_squares_df, partial_squares_df)

# Create Tetris-style column plot with grouped squares
ggplot(d_squares, aes(x = body_mass_d, y = y, fill = species)) +
  geom_tile(
    aes(height = height),
    width = 0.9,
    color = "white",
    linewidth = 0.5
  ) +
  scale_x_discrete(name = "Body mass intervals") +
  scale_y_continuous(
    name = paste0("Count (each full square = ", square_size, " penguins)"),
    expand = expansion(add = 0)
  ) +
  scale_fill_tintin_d(option = "the black island", direction = -1) +
  labs(title = "Column plot with grouped squares") +
  facet_wrap(~species) +
  theme_minimal(base_size = 13) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
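To make the arithmetic behind the squares concrete, here is a small base R sketch that uses the three Adelie counts from the table above (71, 73 and 7). It only illustrates the integer division and the vertical tile positions and is not a replacement for the dplyr pipeline. `tile_centres()` is a hypothetical helper introduced for this example.

```r
square_size <- 5
n <- c(71, 73, 7) # Adelie counts per body mass interval, from the table above

full_squares   <- n %/% square_size       # whole squares per column
remainder      <- n %% square_size        # penguins left over
partial_height <- remainder / square_size # height of the last, shorter tile

# Hypothetical helper: vertical centres of the tiles in one column.
# Full tiles have height 1 and sit at 0.5, 1.5, ...; a partial tile goes on top.
tile_centres <- function(full, partial) {
  centres <- seq_len(full) - 0.5
  if (partial > 0) centres <- c(centres, full + partial / 2)
  centres
}

# Third interval: 7 penguins give one full tile and one tile of height 0.4
centres_third <- tile_centres(full_squares[3], partial_height[3])

# The top of each column equals the count divided by the square size
column_top <- full_squares + partial_height
```

Because the partial tile is centred at `full + partial / 2`, each column ends exactly at `n / square_size`, so the heights stay proportional to the counts even when a count is not a multiple of 5.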
I hope this is useful 🙂