
[This article was first published on https://pacha.dev/blog, and kindly contributed to R-bloggers].

    Mauricio “Pachá” Vargas Sepúlveda

    Blog with notes about R, Shiny, SQL, Python, Linux and C++. This blog is listed on R-Bloggers.


    How do you make a histogram with equally sized dots or squares for each observation, and colour them by another variable

    Answering reader’s questions.
    Author: Mauricio “Pachá” Vargas S.

    Published: August 29, 2025

    Because of delays with my scholarship payment, if this post is useful to you I kindly ask for a minimal donation on Buy Me a Coffee, which will be used to continue my Open Source efforts. If you need an R package or Shiny dashboard for your team, you can email me or inquire on Fiverr. The full explanation is here: A Personal Message from an Open Source Contributor

    You can send me questions for the blog using this form.

    I got this question from a reader: How do you make a histogram with equally sized dots or squares for each observation, and colour them by another variable?

    I shall use the Palmer Penguins dataset to answer this, which contains observations of the species and body mass for a sample of penguins:

    if (!require(palmerpenguins)) install.packages("palmerpenguins")
    if (!require(dplyr)) install.packages("dplyr")
    
    library(palmerpenguins)
    library(dplyr)
    
    glimpse(penguins)
    Rows: 344
    Columns: 8
    $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
    $ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
    $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
    $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
    $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
    $ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
    $ sex               <fct> male, female, female, NA, female, male, female, male…
    $ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

    To use squares, one possibility is to create a discrete body mass variable by intervals and count by species and interval:

    if (!require(tidyr)) install.packages("tidyr")
    Loading required package: tidyr
    if (!require(ggplot2)) install.packages("ggplot2")
    Loading required package: ggplot2
    if (!require(tintin)) install.packages("tintin")
    Loading required package: tintin
    library(tidyr)
    library(ggplot2)
    library(tintin)
    
    # Create equal-width bins (cut() with a single number splits the range evenly)
    n_bins <- 4  # number of equal-width bins
    
    d <- penguins %>%
      drop_na(body_mass_g, species) %>%
      mutate(body_mass_d = cut(body_mass_g, breaks = n_bins, dig.lab = 6)) %>%
      group_by(species, body_mass_d) %>%
      count()
    
    d
    # A tibble: 9 × 3
    # Groups:   species, body_mass_d [9]
      species   body_mass_d       n
      <fct>     <fct>         <int>
    1 Adelie    (2696.4,3600]    71
    2 Adelie    (3600,4500]      73
    3 Adelie    (4500,5400]       7
    4 Chinstrap (2696.4,3600]    26
    5 Chinstrap (3600,4500]      40
    6 Chinstrap (4500,5400]       2
    7 Gentoo    (3600,4500]      17
    8 Gentoo    (4500,5400]      72
    9 Gentoo    (5400,6303.6]    34
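
    The interval labels in the table can look odd at the extremes (2696.4 and 6303.6). When cut() receives a single number of breaks (four here), it splits the range of the data into equal-width intervals and then widens the two outer edges by 0.1% of the range, so that the minimum and maximum fall inside a bin. The base R sketch below reproduces those edges by hand. The `mass` vector is a hypothetical stand-in for the body mass column; only its minimum (2700) and maximum (6300) matter for the edges.

```r
# Hypothetical stand-in for penguins$body_mass_g: only the range matters here
mass <- c(2700, 3750, 4200, 5000, 6300)
n_bins <- 4

# Equal-width edges across the observed range
rng <- range(mass)
edges <- seq(rng[1], rng[2], length.out = n_bins + 1)

# cut() widens the outer edges by 0.1% of the range so the extremes are included
pad <- diff(rng) / 1000
edges[c(1, n_bins + 1)] <- c(rng[1] - pad, rng[2] + pad)
edges

# Binning with the hand-made edges assigns the same bins as cut(breaks = 4)
by_hand <- cut(mass, breaks = edges, dig.lab = 6)
by_cut <- cut(mass, breaks = n_bins, dig.lab = 6)
identical(as.integer(by_hand), as.integer(by_cut))
```

    If bins with roughly equal counts are preferred instead, one option is to pass quantile() cut points as `breaks`, which gives intervals of unequal width.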

    Now I can create a Tetris-style column plot where each square represents 5 penguins:

    square_size <- 5  # square = 5 observations
    
    d_squares <- d %>%
      mutate(
        full_squares = n %/% square_size,          # number of full squares
        remainder = n %% square_size,              # remaining observations
        partial_height = remainder / square_size   # height of partial square
      )
    
    # Create full squares
    full_squares_df <- d_squares %>%
      filter(full_squares > 0) %>%
      uncount(full_squares) %>%
      group_by(species, body_mass_d) %>%
      mutate(square_id = row_number() - 1,
             y = square_id + 0.5,
             height = 1,
             square_type = "full") %>%
      ungroup()
    
    # Create partial squares
    partial_squares_df <- d_squares %>%
      filter(remainder > 0) %>%
      mutate(square_id = full_squares,
             y = full_squares + partial_height/2,
             height = partial_height,
             square_type = "partial")
    
    # Combine both
    d_squares <- bind_rows(full_squares_df, partial_squares_df)
    
    # Create Tetris-style column plot with grouped squares
    ggplot(d_squares, aes(x = body_mass_d, y = y, fill = species)) +
      geom_tile(aes(height = height), width = 0.9, color = "white", linewidth = 0.5) +
      scale_x_discrete(name = "Body mass intervals") +
      scale_y_continuous(name = paste0("Count (each full square = ", square_size, " penguins)"), 
                         expand = expansion(add = 0)) +
      scale_fill_tintin_d(option = "the black island", direction = -1) +
      labs(title = "Column plot with grouped squares") +
      facet_wrap(~species) +
      theme_minimal(base_size = 13) +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))
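
    To make the square arithmetic concrete, here is a small base R sketch that uses the three Adelie counts from the table above (71, 73 and 7). It only repeats the integer division and remainder steps and checks that the top of each column lands at n / square_size; the names `column_top` and `y_full_first_bin` are mine and used only for illustration.

```r
square_size <- 5
n <- c(71, 73, 7)  # Adelie counts per body mass interval, from the table above

full_squares <- n %/% square_size          # whole squares per column
remainder <- n %% square_size              # penguins left over
partial_height <- remainder / square_size  # height of the last, shorter tile

# Tile centres: full squares sit at 0.5, 1.5, ...; the partial tile goes on top
y_full_first_bin <- seq_len(full_squares[1]) - 0.5
y_partial <- full_squares + partial_height / 2

# The top edge of each column equals n / square_size
column_top <- full_squares + partial_height
data.frame(n, full_squares, remainder, partial_height, column_top)
```

    For the first interval, 71 penguins become 14 full squares plus one tile of height 0.2, so the column ends at 14.2 on the y axis. Setting `square_size <- 1` in the code above would give one square per penguin and no partial tiles.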

    I hope this is useful 🙂
