Bird’s Eye View: using R to generate inventory maps for lab reagents

[This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This is a rather niche post, but the method can likely be adapted for other use cases.

In the lab we have many different cell lines stored in liquid nitrogen. The arrangement is:

  • the vials are in specific positions in a box (10 x 10)
  • there are 13 boxes to a cane
  • we have 5 canes

Ideally, to retrieve the correct vial from the cell store requires a map.

We have a FileMaker Pro database of all of our cell lines which indicates their location and positions in the liquid nitrogen tank. We can view the positions in a box of one cell line at a time, but there isn’t a way (that I could find) to generate a map of each box, which can contain multiple cell lines.

The problem

We need to generate an inventory map for each box in our cell store. Can we do this in R?

The solution

It is possible to export the database contents from FileMaker Pro in a format that R can read. For some reason the csv, tsv and xml options generated files that were difficult to parse so I went with xlsx, which can be read into R using readxl::read_excel()

The next challenge was to make sense of the data and generate data frames that corresponded to each box. Once that was done, it was a case of generating the maps using {ggplot2}. See the code below for a walkthrough.

Example map made in R

The final step was to collate the outputs into a single PDF. If you are a wizard at R markdown, you will hate my “outside of R” solution. A one-liner using gs in the terminal gave me a printable inventory of all our cell lines!

gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=merged.pdf cane*.pdf

The code

library(ggplot2)
library(stringr)

## function ----

# this function makes a dataframe of 100 rows, each with a position and a label - it represents a box with 10 rows and 10 columns
boxmaker <- function(df) {
  box <- data.frame(pos = rep(1:100),
                    label = "")
  # the column 'Box Position' in df has a string which is numbers separated either by "\r" or " " or "," they may also be a range, e.g. "2-9"
  # for each row, make this string into a vector of numbers and keep the 'Name' value tied with each
  for(i in 1:nrow(df)) {
    box_positions <- unlist(strsplit(as.character(df$`Box Position`[i]), "[\r, ]"))
    
    # for each of these positions, set the label in the box data frame to the string in the 'Name' column of df
    for (j in 1:length(box_positions)) {
      if (grepl("-", box_positions[j])) {
        range <- as.numeric(unlist(strsplit(box_positions[j], "-")))
        box$label[range[1]:range[2]] <- df$Name[i]
      } else {
        box$label[as.numeric(box_positions[j])] <- df$Name[i]
      }
    }
  }
  return(box)
}

## script ----

# read the data in
df <- readxl::read_excel("Data/untitled.xlsx")
# remove any rows with NA in Cane, Box or Box Position
df <- df[!is.na(df$Cane) & !is.na(df$Box) & !is.na(df$`Box Position`), ]
# also remove any where box number is greater than 13
df <- df[df$Box <= 13, ]


# determine how many Cane types there are and get a list of them
cane_types <- unique(df$Cane)

# for each Cane, determine which Box numbers we have
box_numbers <- lapply(cane_types, function(cane) unique(df$Box[df$Cane == cane]))

# list of white plus a bunch of colours
# white is used for empty spaces, because an empty space is "" and will be the first level of the factor
cell_colours <- c("#ffffff","#8dd3c7", "#ffffb3", "#bebada", "#fb8072", "#80b1d3",
                  "#fdb462", "#b3de69", "#fccde5", "#d9d9d9", "#bc80bd", "#ccebc5",
                  "#ffed6f", "#1f78b4", "#33a02c", "#e31a1c", "#ff7f00", "#6a3d9a",
                  "#b15928", "#a6cee3", "#1f78b4", "#b2df8a", "#33a02c", "#fb9a99",
                  "#e3e3e3", "#8dd3c7", "#ffffb3", "#bebada", "#fb8072", "#80b1d3",
                  "#fdb462", "#b3de69", "#fccde5", "#d9d9d9", "#bc80bd", "#ccebc5",
                  "#ffed6f", "#1f78b4", "#33a02c", "#e31a1c", "#ff7f00", "#6a3d9a",
                  "#b15928", "#a6cee3", "#1f78b4", "#b2df8a", "#33a02c", "#fb9a99")

# iterate through each of these canes and boxes, and plot the data
for (i in 1:length(cane_types)) {
  
  for (j in 1:length(box_numbers[[i]])) {
    # subset the data to get the info for this box
    sub_df <- df[df$Cane == cane_types[i] & df$Box == box_numbers[[i]][j], ]
    # generate the box data frame
    box_df <- boxmaker(sub_df)
    # get list of unique labels in alphabetical order
    unique_labels <- sort(unique(box_df$label))
    # print statement for debugging
    cat("Cane ",cane_types[i], " Box ", box_numbers[[i]][j], " has ", length(unique_labels), " unique labels\n")
    
    # create a new plot where a 10 x 10 table is displayed with 'label' column of box_df used to label each cell
    p <- ggplot(box_df, aes(x = (pos - 1) %% 10, y = 10 - (pos - 1) %/% 10)) +
      geom_tile(aes(fill = str_wrap(label,30)), colour = "black") +
      geom_text(aes(label = pos), size = 2) +
      scale_fill_manual(values = cell_colours) +
      theme_minimal() +
      theme(axis.text = element_blank(),
            axis.title = element_blank(),
            panel.grid.major = element_blank(),
            panel.grid.minor = element_blank(),
            panel.border = element_blank(),
            panel.background = element_blank(),
            plot.margin = unit(c(3, 3, 3, 3), "cm"),
            legend.position = "bottom",
            legend.text = element_text(size = 5)) +
      ggtitle(paste0("Cane ",cane_types[i], "\nBox ", box_numbers[[i]][j])) +
      labs(fill = element_blank()) +
      guides(colour = guide_legend(nrow = 6, ncol = 4)) +
      coord_fixed()
    # add today's date to ggplot
    p <- p + annotate("text", x = 0, y = 0, label = Sys.Date(), hjust = 0, vjust = 0, size = 2)
    # save the plot, the name below has the cane type and box number padded to two digits
    ggsave(paste0("Output/Plots/cane_",cane_types[i], "_box_", sprintf("%02d", box_numbers[[i]][j]),".pdf"), p, width = 210, height = 297, bg = "white", units = "mm")
  }
}

The tricky bits with the code were:

  • dealing with the database data – the program updates the “position numbers” but users can too – so there is a variety of integers to parse. For example, this field may say “5, 6 7, 8-12 2”.
  • the ggplot object took some work:
    • the positions 1-100 need to be offset by 1 because graphing 0-99 is easier! So aes(x = (pos - 1) %% 10, y = 10 - (pos - 1) %/% 10)) is needed to array the labels correctly and so that the box is in the correct orientation.
    • fill = str_wrap(label,30) using {stringr} was needed to wrap the text nicely in the legend because some cell lines have very long names.
    • the challenge to output an A4 pdf of the map and legend with labels and date required quite a bit of experimentation. The theme() arguments plot.margin and legend.text were important, as was guides(colour = guide_legend(nrow = 6, ncol = 4)) to get everything nicely fitting on a page.

Otherwise this was a pretty quick way to visualise an inventory map for lab reagents.

The post title comes from “Bird’s Eye View” by Brendan Benson from his “One Mississippi” album released in 1996.

To leave a comment for the author, please follow the link and comment on their blog: Rstats – quantixed.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)