Let It Flow: recreating a FACS plot with ggplot
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
It’s plot recreation time! In this post, we’ll look at how we can recreate a plot in R. I thought it might be useful to provide the solution but also to detail the process I went through to get there.
We have a FACS plot taken from a BD FACS Aria machine:

Briefly, it’s a histogram of the fluorescence levels of 10000 cells from three different sources. The fluorescence is excited by a 488 nm laser and is collected at 515-545 nm, i.e. it’s green fluorescence.
Why do we need to recreate the plot?
This plot was in the Supplementary Information of our preprint (Fig S2).
https://doi.org/10.1101/2024.05.31.596797
The journal requires all numeric data in the plots to be made available. Since the plot was generated in software that I didn’t have access to, it needed to be recreated from the raw data and then the numeric data could be made available.
What do we need to do to recreate it?
Just looking at the plot, what difficulties do we anticipate?
- It’s a histogram on a log scale – so bin sizes might be problematic
- We need to do line representation and fill – I don’t know how to do this
- We have the raw data, but we will need to verify the count – is all of it plotted or only some?
The process
You can skip to the final result if you just want to know how to generate a plot like this.
If you want to code along, the data is available here.
First we’ll use the Bioconductor package {flowCore}
to load in the raw .fcs
files.
# if (!require("BiocManager", quietly = TRUE)) # install.packages("BiocManager") # BiocManager::install("flowCore") # BiocManager::install("flowWorkspace") # BiocManager::install("ggcyto") library(flowCore) library(flowWorkspace) library(ggplot2) library(ggcyto) library(data.table) library(dplyr) library(cowplot) # list the fcs files and read them into a flowSet fcs_files <- list.files(path = "Data",pattern = ".fcs", full.names = TRUE) fs <- read.flowSet(files = unlist(fcs_files)) # wrangle the flowSet into a data frame tflow <- fsApply(fs, exprs, simplify = F) tflow <- lapply(tflow, as.data.frame) tflow <- rbindlist(tflow, idcol = "file") tflow <- as.data.frame(tflow) # filter for "clon 5" "pool" and "negative" the three cell sources we're interested in tflow <- tflow %>% filter(grepl("clon 5", file) | grepl("pool", file) | grepl("negative", file))
We can see that this data frame is 30000 rows and that each file had 10000 rows (if we look a bit closer), which means that all the data were plotted originally. We just have to plot the data ourselves. One concern checked off the list.
In terms of data, to make the plot, we only need the B488-530_30-A
column of numerical values and the file
column which indicates the cell source.
Now let’s have a first pass at a plot.
# ggplot of "B488-530_30-A" as histogram on a logicle scale ggplot(tflow, aes(x = `B488-530/30-A`, colour = file)) + geom_density() + scale_x_logicle() + labs(x = "B488-530_30-A", y = "Count") + theme_cowplot(9) ggsave("Output/Plots/plot1.png", width = 8, height = 4, dpi = 300, bg = "white")

The shape of the data looks approximately correct. I used geom_density()
to get a line profile but it’s too smoothed to generate the plot we need. I used the “logicle” scale from {ggcyto}
but we only need a regular log scale.
Let’s switch to a histogram:
# ggplot of "B488-530_30-A" as histogram on a log scale ggplot(tflow, aes(x = `B488-530/30-A`, colour = file)) + geom_histogram() + scale_x_log10() + labs(x = "B488-530_30-A", y = "Count") + theme_cowplot(9) ggsave("Output/Plots/plot2.png", width = 8, height = 4, dpi = 300, bg = "white")

The first thing is that ggplot understands that the scaling of the x axis means the histogram bins need to be scaled. However this command generates some error messages due to the transformation; likely due to 0 values in the data.
The idea of ggplot is to let the package do all the work, but sometimes it’s better to refine what goes in. So we’ll transform the values manually. We can then plot the transformed values and sort out the x-axis labelling leter. While we are at it, let’s use the colours from the FACS Aria plot as well.
# log transform B488-530/30-A to get a plot like FACSAria tflow$trans <- log10(tflow$`B488-530/30-A` + 1) # colours from the FACSAria output my_colours <- c(rgb(2, 85, 170, maxColorValue = 255), rgb(255, 51, 52, maxColorValue = 255), rgb(131, 130, 1, maxColorValue = 255)) # plot and refine ggplot(tflow, aes(x = trans, colour = file)) + geom_histogram(fill = "white") + scale_colour_manual(values = my_colours) + labs(x = "B488-530_30-A", y = "Count") + theme_cowplot(9) ggsave("Output/Plots/plot3.png", width = 8, height = 4, dpi = 300, bg = "white") ggplot(tflow, aes(x = trans, colour = file)) + geom_histogram(binwidth = 0.1, fill = "white") + scale_colour_manual(values = my_colours) + labs(x = "B488-530_30-A", y = "Count") + theme_cowplot(9) ggsave("Output/Plots/plot4.png", width = 8, height = 4, dpi = 300, bg = "white") ggplot(tflow, aes(x = trans, colour = file)) + geom_histogram(binwidth = 0.01, fill = "white") + scale_colour_manual(values = my_colours) + labs(x = "B488-530_30-A", y = "Count") + theme_cowplot(9) ggsave("Output/Plots/plot5.png", width = 8, height = 4, dpi = 300, bg = "white") ggplot(tflow, aes(x = trans, colour = file)) + geom_histogram(binwidth = 0.01, position = "identity", fill = "white") + scale_colour_manual(values = my_colours) + labs(x = "B488-530_30-A", y = "Count") + theme_cowplot(9) ggsave("Output/Plots/plot6.png", width = 8, height = 4, dpi = 300, bg = "white") ggplot(tflow, aes(x = trans, colour = file)) + geom_histogram(binwidth = 0.02, position = "identity", fill = "white") + scale_colour_manual(values = my_colours) + labs(x = "B488-530_30-A", y = "Count") + theme_cowplot(9) ggsave("Output/Plots/plot7.png", width = 8, height = 4, dpi = 300, bg = "white")





By varying the binwidth (bearing in mind that this is in transformed units), we can get closer to the look of the original plot. Using position = "identity"
means we lose the default “stacked histogram” behaviour and the counts are close to the original which tells us that the binwidth is refined. The axis is still borked, but we’ll fix that later.
We need to get rid of the borders of each bin in the histogram, but we need the line on the top. We can achieve this effect by using geom_step
which gives a cityscape plot, and combine that with geom_histogram
to do the fill only.
# try to lose the histogram look ggplot(tflow, aes(x = trans, colour = file, fill = file)) + geom_histogram(binwidth = 0.02, position = "identity", colour = NA, alpha = 0.3) + scale_fill_manual(values = my_colours) + labs(x = "B488-530_30-A", y = "Count") + theme_cowplot(9) ggsave("Output/Plots/plot8.png", width = 8, height = 4, dpi = 300, bg = "white") ggplot(tflow, aes(x = trans, colour = file, fill = file)) + geom_step(stat = "bin", binwidth = 0.02, direction = "mid") + geom_histogram(binwidth = 0.02, position = "identity", colour = NA, alpha = 0.3) + scale_fill_manual(values = my_colours) + scale_colour_manual(values = my_colours) + labs(x = "B488-530_30-A", y = "Count") + theme_cowplot(9) ggsave("Output/Plots/plot9.png", width = 8, height = 4, dpi = 300, bg = "white")


This is close… but it’s still got the “blocky” look to it. So this is the end of the line for this approach. We need to change tack.
By looking at the tidyverse manual for geom_step “connecting observations“, I can see that geom_line is part of the same family. So let’s do this:
# use geom_line ggplot(tflow, aes(x = trans, colour = file, fill = file)) + geom_line(stat = "bin", binwidth = 0.02, direction = "mid") + geom_histogram(binwidth = 0.02, position = "identity", colour = NA, alpha = 0.3) + labs(x = "B488-530_30-A", y = "Count") + scale_colour_manual(values = my_colours) + scale_fill_manual(values = my_colours) + theme_cowplot(9) ggsave("Output/Plots/plot10.png", width = 8, height = 4, dpi = 300, bg = "white") # use geom_area ggplot(tflow, aes(x = trans, colour = file, fill = file)) + geom_line(stat = "bin", binwidth = 0.02, direction = "mid") + geom_area(stat = "bin", binwidth = 0.02, position = "identity", colour = NA, alpha = 0.3) + labs(x = "B488-530_30-A", y = "Count") + scale_colour_manual(values = my_colours) + scale_fill_manual(values = my_colours) + theme_cowplot(9) ggsave("Output/Plots/plot11.png", width = 8, height = 4, dpi = 300, bg = "white") # tweak binwidth to match original ggplot(tflow, aes(x = trans, colour = file, fill = file)) + geom_line(stat = "bin", binwidth = 0.018, direction = "mid") + geom_area(stat = "bin", binwidth = 0.018, position = "identity", colour = NA, alpha = 0.3) + labs(x = "B488-530_30-A", y = "Count") + scale_colour_manual(values = my_colours) + scale_fill_manual(values = my_colours) + theme_cowplot(9) ggsave("Output/Plots/plot12.png", width = 8, height = 4, dpi = 300, bg = "white") # now fix the axis ggplot(tflow, aes(x = trans, colour = file, fill = file)) + geom_line(stat = "bin", binwidth = 0.018, direction = "mid") + geom_area(stat = "bin", binwidth = 0.018, position = "identity", colour = NA, alpha = 0.3) + labs(x = "B488-530_30-A", y = "Count") + scale_x_continuous(limits = c(0.6,5), labels = scales::label_math(10^.x)) + scale_colour_manual(values = my_colours) + scale_fill_manual(values = my_colours) + theme_cowplot(9) + annotation_logticks(sides = "b") ggsave("Output/Plots/plot13.png", width = 8, height = 4, dpi = 300, bg = "white")




Replacing geom_step
with geom_line
worked right away. Replacing geom_histogram
with geom_area
means we get the nice fill effect and finally get rid of the blocky look. We also tweaked the binwidth to match the original and finally fixed the axis.
To make it more presentable, we’ll change the colours (since they didn’t make much sense in the original plot). We’ll move the legend inside the plot just to make scaling the final thing easier to insert to into the figure.
# the original colours weren't great, so my_colours <- c("#00a651", "#3f3f3f", "#2276b9") # plot the log transformed data as a histogram with binwidth of 0.018 ggplot(tflow, aes(x = trans, colour = file, fill = file)) + geom_line(stat = "bin", binwidth = 0.018, direction = "mid") + geom_area(stat = "bin", binwidth = 0.018, position = "identity", colour = NA, alpha = 0.3) + # geom_histogram(binwidth = 0.018, position = "identity", colour = NA, alpha = 0.3) + labs(x = "B488-530_30-A", y = "Count") + scale_x_continuous(limits = c(0.6,5), labels = scales::label_math(10^.x)) + scale_colour_manual(values = my_colours) + scale_fill_manual(values = my_colours) + theme_cowplot(9) + # place legend inside the plot, top left theme(legend.position = c(0.1, 0.9), legend.background = element_rect(fill = "transparent"), legend.key = element_rect(fill = "transparent", colour = "transparent")) + annotation_logticks(sides = "b") # save the plot ggsave("Output/Plots/plot14.png", width = 8, height = 4, dpi = 300, bg = "white")

From here we can export at the correct size and slot it into the figure.
ggsave("Output/Plots/plot15.png", last_plot() + theme(legend.position = "none"), width = 58, height = 58, dpi = 300, bg = "white", units = "mm")

Finally, if you remember, the journal wanted the numerical data. We can save it like this:
## Export results ---- # if Output/Data directory does not exist, create it if (!dir.exists("Output/Data")) { dir.create("Output/Data", recursive = TRUE) } # extract Protein and IntDen columns to a new dataframe and save as csv results <- tflow %>% select(file, `B488-530/30-A`, trans) %>% mutate(file = gsub(".fcs", "", file)) %>% mutate(file = gsub("Specimen_001_", "", file)) write.csv(results, "Output/Data/S2B.csv", row.names = FALSE)
How did we do?
At the risk of “marking my own homework”, let’s see how we did.
- We verified the raw data-to-plot pipeline and produced the numerical data required by the journal
- We produced a plot which is in the same style as the original:
- The y-axis scale and the shape of the lines is similar
- We achieved the same look with geom_line and geom_area
- x-axis scaling is correct
- The goal was not to reproduce the plot exactly:
- We improved the colour scheme
- The x-axis ticks are facing the wrong way (I believe this is because external ticks are clipped by
{cowplot}
) - No y-axis minor ticks
- No box or mirrored axes
- Legend will need modifying later in the figure


—
The post title comes from “Let It Flow” by Spiritualized, taken from their Pure Phase album.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.