I’ve been asked a few times how to make a so-called volcano plot from gene expression results. A volcano plot typically plots some measure of effect on the x-axis (typically the fold change) and the statistical significance on the y-axis (typically the -log10 of the p-value). Genes that are highly dysregulated are farther to the left and right sides, while highly significant changes appear higher on the plot.
I’ve analyzed some data from GEO (GSE52202) using RNA-seq to study gene expression in motor neurons differentiated from induced pluripotent stem cells (iPSCs) derived from ALS patients carrying the C9ORF72 repeat expansion. I aligned the data, counted with featureCounts, and analyzed with DESeq2. I uploaded the results to this GitHub Gist.
Here’s how you can use R to create a simple volcano plot. First, you’ll need to install the devtools package so that you can install my Tmisc package directly from GitHub (I haven’t submitted it to CRAN). There’s a function in Tmisc called read.gist(), which reads data directly from Github Gists by specifying the GitHub Gist ID (be careful with this…).
After reading in the data from GitHub the next section creates a basic volcano plot. A few more lines color the points based on their fold change and statistical significance. Finally, if you have the calibrate package installed, the last line labels a few genes of interest.