Let It Flow: recreating a FACS plot with ggplot

[This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

It’s plot recreation time! In this post, we’ll look at how we can recreate a plot in R. I thought it might be useful to provide the solution but also to detail the process I went through to get there.

We have a FACS plot taken from a BD FACS Aria machine:

Briefly, it’s a histogram of the fluorescence levels of 10000 cells from three different sources. The fluorescence is excited by a 488 nm laser and is collected at 515-545 nm, i.e. it’s green fluorescence.

Why do we need to recreate the plot?

This plot was in the Supplementary Information of our preprint (Fig S2).

https://doi.org/10.1101/2024.05.31.596797

The journal requires all numeric data in the plots to be made available. Since the plot was generated in software that I didn’t have access to, it needed to be recreated from the raw data and then the numeric data could be made available.

What do we need to do to recreate it?

Just looking at the plot, what difficulties do we anticipate?

  • It’s a histogram on a log scale – so bin sizes might be problematic
  • We need to do line representation and fill – I don’t know how to do this
  • We have the raw data, but we will need to verify the count – is all of it plotted or only some?

The process

You can skip to the final result if you just want to know how to generate a plot like this.

If you want to code along, the data is available here.

First we’ll use the Bioconductor package {flowCore} to load in the raw .fcs files.

# if (!require("BiocManager", quietly = TRUE))
#   install.packages("BiocManager")
# BiocManager::install("flowCore")
# BiocManager::install("flowWorkspace")
# BiocManager::install("ggcyto")

library(flowCore)
library(flowWorkspace)
library(ggplot2)
library(ggcyto)
library(data.table)
library(dplyr)
library(cowplot)

# list the fcs files and read them into a flowSet
fcs_files <- list.files(path = "Data",pattern = ".fcs", full.names = TRUE)
fs <- read.flowSet(files = unlist(fcs_files))
# wrangle the flowSet into a data frame
tflow <- fsApply(fs, exprs, simplify = F)  
tflow <- lapply(tflow, as.data.frame)  
tflow <- rbindlist(tflow, idcol = "file")
tflow <- as.data.frame(tflow)

# filter for "clon 5" "pool" and "negative" the three cell sources we're interested in
tflow <- tflow %>%
  filter(grepl("clon 5", file) | grepl("pool", file) | grepl("negative", file))

We can see that this data frame is 30000 rows and that each file had 10000 rows (if we look a bit closer), which means that all the data were plotted originally. We just have to plot the data ourselves. One concern checked off the list.

In terms of data, to make the plot, we only need the B488-530_30-A column of numerical values and the file column which indicates the cell source.

Now let’s have a first pass at a plot.

# ggplot of "B488-530_30-A" as histogram on a logicle scale
ggplot(tflow, aes(x = `B488-530/30-A`, colour = file)) +
  geom_density() +
  scale_x_logicle() +
  labs(x = "B488-530_30-A", y = "Count") +
  theme_cowplot(9)
ggsave("Output/Plots/plot1.png", width = 8, height = 4, dpi = 300, bg = "white")

The shape of the data looks approximately correct. I used geom_density() to get a line profile but it’s too smoothed to generate the plot we need. I used the “logicle” scale from {ggcyto} but we only need a regular log scale.

Let’s switch to a histogram:

# ggplot of "B488-530_30-A" as histogram on a log scale
ggplot(tflow, aes(x = `B488-530/30-A`, colour = file)) +
  geom_histogram() +
  scale_x_log10() +
  labs(x = "B488-530_30-A", y = "Count") +
  theme_cowplot(9)
ggsave("Output/Plots/plot2.png", width = 8, height = 4, dpi = 300, bg = "white")

The first thing is that ggplot understands that the scaling of the x axis means the histogram bins need to be scaled. However this command generates some error messages due to the transformation; likely due to 0 values in the data.

The idea of ggplot is to let the package do all the work, but sometimes it’s better to refine what goes in. So we’ll transform the values manually. We can then plot the transformed values and sort out the x-axis labelling leter. While we are at it, let’s use the colours from the FACS Aria plot as well.

# log transform B488-530/30-A to get a plot like FACSAria
tflow$trans <- log10(tflow$`B488-530/30-A` + 1)
# colours from the FACSAria output
my_colours <- c(rgb(2, 85, 170, maxColorValue = 255),
                rgb(255, 51, 52, maxColorValue = 255),
                rgb(131, 130, 1, maxColorValue = 255))
# plot and refine
ggplot(tflow, aes(x = trans, colour = file)) +
  geom_histogram(fill = "white") +
  scale_colour_manual(values = my_colours) +
  labs(x = "B488-530_30-A", y = "Count") +
  theme_cowplot(9)
ggsave("Output/Plots/plot3.png", width = 8, height = 4, dpi = 300, bg = "white")
ggplot(tflow, aes(x = trans, colour = file)) +
  geom_histogram(binwidth = 0.1, fill = "white") +
  scale_colour_manual(values = my_colours) +
  labs(x = "B488-530_30-A", y = "Count") +
  theme_cowplot(9)
ggsave("Output/Plots/plot4.png", width = 8, height = 4, dpi = 300, bg = "white")
ggplot(tflow, aes(x = trans, colour = file)) +
  geom_histogram(binwidth = 0.01, fill = "white") +
  scale_colour_manual(values = my_colours) +
  labs(x = "B488-530_30-A", y = "Count") +
  theme_cowplot(9)
ggsave("Output/Plots/plot5.png", width = 8, height = 4, dpi = 300, bg = "white")
ggplot(tflow, aes(x = trans, colour = file)) +
  geom_histogram(binwidth = 0.01, position = "identity", fill = "white") +
  scale_colour_manual(values = my_colours) +
  labs(x = "B488-530_30-A", y = "Count") +
  theme_cowplot(9)
ggsave("Output/Plots/plot6.png", width = 8, height = 4, dpi = 300, bg = "white")
ggplot(tflow, aes(x = trans, colour = file)) +
  geom_histogram(binwidth = 0.02, position = "identity", fill = "white") +
  scale_colour_manual(values = my_colours) +
  labs(x = "B488-530_30-A", y = "Count") +
  theme_cowplot(9)
ggsave("Output/Plots/plot7.png", width = 8, height = 4, dpi = 300, bg = "white")

By varying the binwidth (bearing in mind that this is in transformed units), we can get closer to the look of the original plot. Using position = "identity" means we lose the default “stacked histogram” behaviour and the counts are close to the original which tells us that the binwidth is refined. The axis is still borked, but we’ll fix that later.

We need to get rid of the borders of each bin in the histogram, but we need the line on the top. We can achieve this effect by using geom_step which gives a cityscape plot, and combine that with geom_histogram to do the fill only.

# try to lose the histogram look
ggplot(tflow, aes(x = trans, colour = file, fill = file)) +
  geom_histogram(binwidth = 0.02, position = "identity", colour = NA, alpha = 0.3) +
  scale_fill_manual(values = my_colours) +
  labs(x = "B488-530_30-A", y = "Count") +
  theme_cowplot(9)
ggsave("Output/Plots/plot8.png", width = 8, height = 4, dpi = 300, bg = "white")
ggplot(tflow, aes(x = trans, colour = file, fill = file)) +
  geom_step(stat = "bin", binwidth = 0.02, direction = "mid") +
  geom_histogram(binwidth = 0.02, position = "identity", colour = NA, alpha = 0.3) +
  scale_fill_manual(values = my_colours) +
  scale_colour_manual(values = my_colours) +
  labs(x = "B488-530_30-A", y = "Count") +
  theme_cowplot(9)
ggsave("Output/Plots/plot9.png", width = 8, height = 4, dpi = 300, bg = "white")

This is close… but it’s still got the “blocky” look to it. So this is the end of the line for this approach. We need to change tack.

By looking at the tidyverse manual for geom_step “connecting observations“, I can see that geom_line is part of the same family. So let’s do this:

# use geom_line 
ggplot(tflow, aes(x = trans, colour = file, fill = file)) +
  geom_line(stat = "bin", binwidth = 0.02, direction = "mid") +
  geom_histogram(binwidth = 0.02, position = "identity", colour = NA, alpha = 0.3) +
  labs(x = "B488-530_30-A", y = "Count") +
  scale_colour_manual(values = my_colours) +
  scale_fill_manual(values = my_colours) +
  theme_cowplot(9)
ggsave("Output/Plots/plot10.png", width = 8, height = 4, dpi = 300, bg = "white")
# use geom_area
ggplot(tflow, aes(x = trans, colour = file, fill = file)) +
  geom_line(stat = "bin", binwidth = 0.02, direction = "mid") +
  geom_area(stat = "bin", binwidth = 0.02, position = "identity", colour = NA, alpha = 0.3) +
  labs(x = "B488-530_30-A", y = "Count") +
  scale_colour_manual(values = my_colours) +
  scale_fill_manual(values = my_colours) +
  theme_cowplot(9)
ggsave("Output/Plots/plot11.png", width = 8, height = 4, dpi = 300, bg = "white")
# tweak binwidth to match original
ggplot(tflow, aes(x = trans, colour = file, fill = file)) +
  geom_line(stat = "bin", binwidth = 0.018, direction = "mid") +
  geom_area(stat = "bin", binwidth = 0.018, position = "identity", colour = NA, alpha = 0.3) +
  labs(x = "B488-530_30-A", y = "Count") +
  scale_colour_manual(values = my_colours) +
  scale_fill_manual(values = my_colours) +
  theme_cowplot(9)
ggsave("Output/Plots/plot12.png", width = 8, height = 4, dpi = 300, bg = "white")
# now fix the axis
ggplot(tflow, aes(x = trans, colour = file, fill = file)) +
  geom_line(stat = "bin", binwidth = 0.018, direction = "mid") +
  geom_area(stat = "bin", binwidth = 0.018, position = "identity", colour = NA, alpha = 0.3) +
  labs(x = "B488-530_30-A", y = "Count") +
  scale_x_continuous(limits = c(0.6,5), labels = scales::label_math(10^.x)) +
  scale_colour_manual(values = my_colours) +
  scale_fill_manual(values = my_colours) +
  theme_cowplot(9) +
  annotation_logticks(sides = "b")
ggsave("Output/Plots/plot13.png", width = 8, height = 4, dpi = 300, bg = "white")

Replacing geom_step with geom_line worked right away. Replacing geom_histogram with geom_area means we get the nice fill effect and finally get rid of the blocky look. We also tweaked the binwidth to match the original and finally fixed the axis.

To make it more presentable, we’ll change the colours (since they didn’t make much sense in the original plot). We’ll move the legend inside the plot just to make scaling the final thing easier to insert to into the figure.

# the original colours weren't great, so
my_colours <- c("#00a651", "#3f3f3f", "#2276b9")
# plot the log transformed data as a histogram with binwidth of 0.018
ggplot(tflow, aes(x = trans, colour = file, fill = file)) +
  geom_line(stat = "bin", binwidth = 0.018, direction = "mid") +
  geom_area(stat = "bin", binwidth = 0.018, position = "identity", colour = NA, alpha = 0.3) +
  # geom_histogram(binwidth = 0.018, position = "identity", colour = NA, alpha = 0.3) +
  labs(x = "B488-530_30-A", y = "Count") +
  scale_x_continuous(limits = c(0.6,5), labels = scales::label_math(10^.x)) +
  scale_colour_manual(values = my_colours) +
  scale_fill_manual(values = my_colours) +
  theme_cowplot(9) +
  # place legend inside the plot, top left
  theme(legend.position = c(0.1, 0.9),
        legend.background = element_rect(fill = "transparent"),
        legend.key = element_rect(fill = "transparent", colour = "transparent")) +
  annotation_logticks(sides = "b")
# save the plot
ggsave("Output/Plots/plot14.png", width = 8, height = 4, dpi = 300, bg = "white")

From here we can export at the correct size and slot it into the figure.

ggsave("Output/Plots/plot15.png", last_plot() + theme(legend.position = "none"), width = 58, height = 58, dpi = 300, bg = "white", units = "mm")

Finally, if you remember, the journal wanted the numerical data. We can save it like this:

## Export results ----
# if Output/Data directory does not exist, create it
if (!dir.exists("Output/Data")) {
  dir.create("Output/Data", recursive = TRUE)
}
# extract Protein and IntDen columns to a new dataframe and save as csv
results <- tflow %>%
  select(file, `B488-530/30-A`, trans) %>%
  mutate(file = gsub(".fcs", "", file)) %>%
  mutate(file = gsub("Specimen_001_", "", file))

write.csv(results, "Output/Data/S2B.csv", row.names = FALSE)

How did we do?

At the risk of “marking my own homework”, let’s see how we did.

  • We verified the raw data-to-plot pipeline and produced the numerical data required by the journal
  • We produced a plot which is in the same style as the original:
    • The y-axis scale and the shape of the lines is similar
    • We achieved the same look with geom_line and geom_area
    • x-axis scaling is correct
  • The goal was not to reproduce the plot exactly:
    • We improved the colour scheme
    • The x-axis ticks are facing the wrong way (I believe this is because external ticks are clipped by {cowplot})
    • No y-axis minor ticks
    • No box or mirrored axes
    • Legend will need modifying later in the figure

The post title comes from “Let It Flow” by Spiritualized, taken from their Pure Phase album.

To leave a comment for the author, please follow the link and comment on their blog: Rstats – quantixed.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)