A Footnote in History

[This article was first published on Category R on Quantum Jitter, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Producing a journal-quality table requires fine-grained and reproducible control over presentation. Surgical targeting of footnotes, capable of adapting to changes in the underlying data, is one example.

This post briefly explores the shifts in the nature of employment whilst at the same time visiting the grammar of tables gt: The natural companion to the grammar of graphics ggplot2.

library(tidyverse)
library(readxl)
library(gt)
library(kableExtra)

Tables are rendered across all projects on this site; as a minimum at the end to summarise an auto-generated overview of the R packages and functions used. For these tables, kableExtra has been the go-to solution which requires simply piping the data frame into kbl().

In Digging Deep, the DT package is used to produce a reactable table; one with sortable and searchable columns. DT is intended as an R interface to the DataTables library, but reactivity is not yet supported in gt.

As a guiding principle, RStudio – soon to be Posit – packages are my first port of call. This provides a confidence in cross-package consistency, longevity and an investment in development and support. Hence gt will be the go-to package for the static table further down.

As the intent is to present a summary in the style of the Financial Times, we’ll need a suitable custom colour palette.

theme_set(theme_bw())

cols <- c(
  "#FFF1E5", "#F2DFCE",
  "#333333", "#800D33",
  "#C00000", "#00994D"
) |>
  fct_inorder()

tibble(x = 1:6, y = 1) |>
  ggplot(aes(x, y, fill = cols)) +
  geom_col() +
  geom_label(aes(label = cols), 
             nudge_y = -0.1, fill = "white") +
  annotate(
    "label",
    x = 3.5, y = 0.5,
    label = "Financial Times",
    fill = "white",
    alpha = 0.8,
    size = 6
  ) +
  scale_fill_manual(values = as.character(cols)) +
  theme_void() +
  theme(legend.position = "none")

The labour market data are sourced from the Office for National Statistics.

read_data <- function(x) {
  read_xlsx(
    x,
    skip = 12,
    col_names = c(
      "occupation",
      "persons"
    ),
    col_types = c(
      "text",
      "numeric",
      "skip",
      "skip",
      "skip",
      "skip",
      "skip"
    )
  )
} |> 
  mutate(year = x |> str_remove(".xlsx") |> as.integer())

pop_df <- list("2004.xlsx", "2021.xlsx") |> 
  map_dfr(read_data)

There’s a hierarchy to the data, so I’ll extract the lowest level and then slice off the top and bottom occupations based on their percentage change over time.

change_df <- pop_df |> 
  filter(str_starts(occupation, "\\d{4} ")) |> 
  pivot_wider(names_from = year, values_from = persons) |> 
  separate(occupation, into = c("soc", "occupation"), sep = 5) |> 
  mutate(change = `2021` / `2004` - 1) |> 
  arrange(desc(change)) |> 
  mutate(group = if_else(row_number() <= 10, "Risers", "Fallers")) |> 
  slice(c(1:10, (n()-10):n())) |> 
  relocate(group)

The handling of footnotes is a particularly nice feature in gt: The package automatically assigns, and maintains the order of, the superscripted numbers (could also be symbols) to ensure they flow naturally. And targeting offers a high degree of control and reproducibility.

For example, two entries in the table below use the abbreviation n.e.c.. The footnote may be targeted at rows which contain that string rather than having to manually identify the rows. And once added, any subsequent footnotes would be renumbered to maintain the flow. So, if I were to change the source datasets to different years or countries, all references to n.e.c. would be automagically found and appropriately footnoted.

gt_tbl <- change_df |>
  gt(rowname_col = c("occupation"), groupname_col = "group") |>
  tab_header(title = "UK Employment by Occupation") |> 
  fmt_number(
    columns = starts_with("2"),
    decimals = 0
  ) |>
  fmt_percent(
    columns = starts_with("c"),
    decimals = 0,
    force_sign = TRUE
  ) |>
  sub_missing() |>
  tab_spanner(
    label = "Year",
    columns = starts_with("2")
  ) |> 
  tab_style(
    style = cell_text(transform = "capitalize"),
    locations = cells_column_labels(!starts_with("s"))
  ) |> 
  tab_style(
    style = cell_text(transform = "uppercase"),
    locations = cells_column_labels("soc")
  ) |> 
  tab_footnote(
    footnote = "Not elsewhere classified",
    locations = cells_stub(rows = contains("n.e.c."))
  ) |>
  tab_footnote(
    footnote = "Count of all persons",
    locations = cells_column_spanners()
  ) |>
  tab_footnote(
    footnote = "Standard Occupational Classification 2020",
    locations = cells_column_labels(columns = "soc")
  ) |>
  tab_footnote(
    footnote = "Top & bottom 10 occupations ordered by percent change",
    locations = cells_row_groups(groups = c("Risers", "Fallers"))
  ) |>
  tab_footnote(
    footnote = "Figures suppressed as statistically unreliable",
    locations = cells_body(
      columns = c(change, `2021`),
      rows = is.na(change)
    )
  ) |>
  tab_source_note(source_note = "Source: Office for National Statistics (ONS)")

gt_tbl |> 
  opt_stylize(style = 6, color = "gray", add_row_striping = TRUE) |> 
  gtsave("styled.png")

The above table uses one of the in-built style theme options. It looks clean and polished. But sometimes the table to be published needs a high degree of customisation to match, for example, a specific branding. gt offers this as we’ll demonstrate by attempting to replicate the style employed by the market data in the Financial Times.

gt_ft <- gt_tbl |> 
  tab_options(
    table.border.top.color = "#FFF1E5",
    table.border.bottom.color = "#FFF1E5",
    table.background.color = "#FFF1E5",
    table.font.size = 8,
    table.font.color = "#262A33",
    row.striping.include_table_body = TRUE,
    row.striping.include_stub = TRUE,
    row.striping.background_color = "#F2DFCE",
    heading.background.color = "#FFF1E5",
    row_group.background.color = "#FFF1E5"
  ) |> 
  opt_vertical_padding(scale = 1.3) |> 
  tab_header(title = html("UK Employment by Occupation  ", 
                          local_image("logo.png", height = 15))) |> 
  tab_style(
    style = list(
      cell_text(font = "Financier Display", size = px(15), align = "left"),
      cell_borders(sides = "bottom", weight = px(3), color = "#262A33")
      ),
    locations = cells_title()
  ) |>
  tab_style(
    style = cell_text(size = 14),
    locations = cells_row_groups()
  ) |> 
  tab_style(
    style = cell_text(color = "#800D33", weight = "bold"),
    locations = cells_stub()
  ) |> 
  tab_style(
    style = cell_text(weight = "bold"),
    locations = list(cells_column_labels(), 
                     cells_column_spanners(), 
                     cells_row_groups(), 
                     cells_title())
  ) |> 
  tab_style(
    style = cell_borders(style = "hidden"),
    locations = list(cells_body(),
                     cells_row_groups(),
                     cells_stub())
  ) |>
  tab_style(
    style = cell_text(color = "#00994D", weight = "bold"),
    locations = cells_body(
      columns = change,
      rows = change >= 0
    )
  ) |> 
  tab_style(
    style = cell_text(color = "#C00000", weight = "bold"),
    locations = cells_body(
      columns = change,
      rows = change < 0
    )
  ) |> 
  tab_style(
    style = cell_text(color = "grey40", size = px(6)),
    locations = list(cells_footnotes(), cells_source_notes())
  )

gt_ft |> gtsave("ft.png", zoom = 5)

R Toolbox

Summarising below the packages and functions used in this post enables me to separately create a toolbox visualisation summarising the usage of packages and functions across all posts.

Package Function
base as.character[1]; as.integer[1]; c[8]; conflicts[1]; cumsum[1]; function[2]; is.na[1]; list[5]; search[1]; sum[1]
dplyr filter[6]; arrange[3]; desc[3]; group_by[1]; if_else[4]; mutate[7]; n[1]; relocate[1]; row_number[1]; slice[1]; summarise[1]
forcats fct_inorder[1]
ggplot2 aes[2]; annotate[1]; geom_col[1]; geom_label[1]; ggplot[1]; scale_fill_manual[1]; theme[1]; theme_bw[1]; theme_set[1]; theme_void[1]
gt cell_borders[2]; cell_text[9]; cells_body[4]; cells_column_labels[4]; cells_column_spanners[2]; cells_footnotes[1]; cells_row_groups[4]; cells_source_notes[1]; cells_stub[3]; cells_title[2]; fmt_number[1]; fmt_percent[1]; gt[1]; gtsave[2]; html[1]; local_image[1]; opt_stylize[1]; opt_vertical_padding[1]; px[3]; sub_missing[1]; tab_footnote[5]; tab_header[2]; tab_options[1]; tab_source_note[1]; tab_spanner[1]; tab_style[10]
kableExtra kbl[1]
purrr map[1]; map_dfr[1]; map2_dfr[1]; possibly[1]; set_names[1]
readr read_lines[1]
readxl read_xlsx[1]
stringr str_c[5]; str_count[1]; str_detect[2]; str_remove[3]; str_remove_all[1]; str_starts[2]
tibble as_tibble[1]; tibble[3]; enframe[1]
tidyr pivot_wider[1]; separate[1]; unnest[1]

Attribution

Contains public sector information licensed under the Open Government Licence v3.0.

To leave a comment for the author, please follow the link and comment on their blog: Category R on Quantum Jitter.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)