Bird’s Eye View: using R to generate inventory maps for lab reagents
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This is a rather niche post, but the method can likely be adapted for other use cases.
In the lab we have many different cell lines stored in liquid nitrogen. The arrangement is:
- the vials are in specific positions in a box (10 x 10)
- there are 13 boxes to a cane
- we have 5 canes
Ideally, to retrieve the correct vial from the cell store requires a map.
We have a FileMaker Pro database of all of our cell lines which indicates their location and positions in the liquid nitrogen tank. We can view the positions in a box of one cell line at a time, but there isn’t a way (that I could find) to generate a map of each box, which can contain multiple cell lines.
The problem
We need to generate an inventory map for each box in our cell store. Can we do this in R?
The solution
It is possible to export the database contents from FileMaker Pro in a format that R can read. For some reason the csv, tsv and xml options generated files that were difficult to parse so I went with xlsx, which can be read into R using readxl::read_excel()
The next challenge was to make sense of the data and generate data frames that corresponded to each box. Once that was done, it was a case of generating the maps using {ggplot2}
. See the code below for a walkthrough.
The final step was to collate the outputs into a single PDF. If you are a wizard at R markdown, you will hate my “outside of R” solution. A one-liner using gs
in the terminal gave me a printable inventory of all our cell lines!
gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=merged.pdf cane*.pdf
The code
library(ggplot2) library(stringr) ## function ---- # this function makes a dataframe of 100 rows, each with a position and a label - it represents a box with 10 rows and 10 columns boxmaker <- function(df) { box <- data.frame(pos = rep(1:100), label = "") # the column 'Box Position' in df has a string which is numbers separated either by "\r" or " " or "," they may also be a range, e.g. "2-9" # for each row, make this string into a vector of numbers and keep the 'Name' value tied with each for(i in 1:nrow(df)) { box_positions <- unlist(strsplit(as.character(df$`Box Position`[i]), "[\r, ]")) # for each of these positions, set the label in the box data frame to the string in the 'Name' column of df for (j in 1:length(box_positions)) { if (grepl("-", box_positions[j])) { range <- as.numeric(unlist(strsplit(box_positions[j], "-"))) box$label[range[1]:range[2]] <- df$Name[i] } else { box$label[as.numeric(box_positions[j])] <- df$Name[i] } } } return(box) } ## script ---- # read the data in df <- readxl::read_excel("Data/untitled.xlsx") # remove any rows with NA in Cane, Box or Box Position df <- df[!is.na(df$Cane) & !is.na(df$Box) & !is.na(df$`Box Position`), ] # also remove any where box number is greater than 13 df <- df[df$Box <= 13, ] # determine how many Cane types there are and get a list of them cane_types <- unique(df$Cane) # for each Cane, determine which Box numbers we have box_numbers <- lapply(cane_types, function(cane) unique(df$Box[df$Cane == cane])) # list of white plus a bunch of colours # white is used for empty spaces, because an empty space is "" and will be the first level of the factor cell_colours <- c("#ffffff","#8dd3c7", "#ffffb3", "#bebada", "#fb8072", "#80b1d3", "#fdb462", "#b3de69", "#fccde5", "#d9d9d9", "#bc80bd", "#ccebc5", "#ffed6f", "#1f78b4", "#33a02c", "#e31a1c", "#ff7f00", "#6a3d9a", "#b15928", "#a6cee3", "#1f78b4", "#b2df8a", "#33a02c", "#fb9a99", "#e3e3e3", "#8dd3c7", "#ffffb3", "#bebada", "#fb8072", "#80b1d3", "#fdb462", "#b3de69", "#fccde5", "#d9d9d9", "#bc80bd", "#ccebc5", "#ffed6f", "#1f78b4", "#33a02c", "#e31a1c", "#ff7f00", "#6a3d9a", "#b15928", "#a6cee3", "#1f78b4", "#b2df8a", "#33a02c", "#fb9a99") # iterate through each of these canes and boxes, and plot the data for (i in 1:length(cane_types)) { for (j in 1:length(box_numbers[[i]])) { # subset the data to get the info for this box sub_df <- df[df$Cane == cane_types[i] & df$Box == box_numbers[[i]][j], ] # generate the box data frame box_df <- boxmaker(sub_df) # get list of unique labels in alphabetical order unique_labels <- sort(unique(box_df$label)) # print statement for debugging cat("Cane ",cane_types[i], " Box ", box_numbers[[i]][j], " has ", length(unique_labels), " unique labels\n") # create a new plot where a 10 x 10 table is displayed with 'label' column of box_df used to label each cell p <- ggplot(box_df, aes(x = (pos - 1) %% 10, y = 10 - (pos - 1) %/% 10)) + geom_tile(aes(fill = str_wrap(label,30)), colour = "black") + geom_text(aes(label = pos), size = 2) + scale_fill_manual(values = cell_colours) + theme_minimal() + theme(axis.text = element_blank(), axis.title = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.border = element_blank(), panel.background = element_blank(), plot.margin = unit(c(3, 3, 3, 3), "cm"), legend.position = "bottom", legend.text = element_text(size = 5)) + ggtitle(paste0("Cane ",cane_types[i], "\nBox ", box_numbers[[i]][j])) + labs(fill = element_blank()) + guides(colour = guide_legend(nrow = 6, ncol = 4)) + coord_fixed() # add today's date to ggplot p <- p + annotate("text", x = 0, y = 0, label = Sys.Date(), hjust = 0, vjust = 0, size = 2) # save the plot, the name below has the cane type and box number padded to two digits ggsave(paste0("Output/Plots/cane_",cane_types[i], "_box_", sprintf("%02d", box_numbers[[i]][j]),".pdf"), p, width = 210, height = 297, bg = "white", units = "mm") } }
The tricky bits with the code were:
- dealing with the database data – the program updates the “position numbers” but users can too – so there is a variety of integers to parse. For example, this field may say “5, 6 7, 8-12 2”.
- the ggplot object took some work:
- the positions 1-100 need to be offset by 1 because graphing 0-99 is easier! So
aes(x = (pos - 1) %% 10, y = 10 - (pos - 1) %/% 10))
is needed to array the labels correctly and so that the box is in the correct orientation. fill = str_wrap(label,30)
using{stringr}
was needed to wrap the text nicely in the legend because some cell lines have very long names.- the challenge to output an A4 pdf of the map and legend with labels and date required quite a bit of experimentation. The
theme()
argumentsplot.margin
andlegend.text
were important, as wasguides(colour = guide_legend(nrow = 6, ncol = 4))
to get everything nicely fitting on a page.
- the positions 1-100 need to be offset by 1 because graphing 0-99 is easier! So
Otherwise this was a pretty quick way to visualise an inventory map for lab reagents.
—
The post title comes from “Bird’s Eye View” by Brendan Benson from his “One Mississippi” album released in 1996.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.