In an earlier post I discussed how to produce Manhattan plots in R using ggplot2. Several readers had questions about the code samples in that post and asked if I could post the complete code. I didn't think that would be very helpful unless I could also post the accompanying data, which I couldn't do. Now I have a version of the program that uses simulated data to demonstrate how to produce Manhattan plots. I simulated a genome of 3 billion base pairs with 30 chromosomes and 1,000 SNP per chromosome. Each SNP has an effect drawn from a t-distribution with 100 degrees of freedom.
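The simulation design described above can be sketched in a few lines of base R. This is a minimal sketch rather than the original program: the seed, the even spacing of SNP within each chromosome, and the names `markers`, `chrom_bp`, and `genome_pos` are all assumptions made for illustration.

```r
# Minimal sketch of the simulated genome (not the original program).
set.seed(42)                      # assumed seed, only for reproducibility

n_chrom   <- 30                   # chromosomes in the simulated genome
n_snp     <- 1000                 # SNP per chromosome
genome_bp <- 3e9                  # total genome length in base pairs
chrom_bp  <- genome_bp / n_chrom  # equal-length chromosomes

markers <- data.frame(
  chrom = rep(seq_len(n_chrom), each = n_snp),
  # Assumption: SNP are evenly spaced along each chromosome.
  pos   = rep(seq(from = chrom_bp / n_snp, to = chrom_bp, length.out = n_snp),
              times = n_chrom)
)

# Each SNP effect is an independent draw from a t-distribution with 100 df.
markers$effect <- rt(nrow(markers), df = 100)

# Genome-wide coordinate: offset each position by the chromosomes before it.
markers$genome_pos <- (markers$chrom - 1) * chrom_bp + markers$pos

str(markers)
```

The genome-wide coordinate is what lets all 30 chromosomes sit side by side on a single x-axis in the plot.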
    #!/bin/bash
    exec /home/jcole/bin/R --vanilla -q --slave -e "source(file=pipe(\"tail -n +4 $0\"))" --args $@
    #debug: exec R --vanilla --verbose -e "source(file=pipe(\"tail -n +4 $0\"))" --args $@
    ### The above line starts R and then reads in this script, starting at line 4
    ### (taken from the R Wiki at http://rwiki.sciviews.org/doku.php?id=tips:scriptingr).

    # simulated_marker_effects.r
    #
    # Use ggplot2 to produce Manhattan plots of simulated marker effects.
    #
    # Author: John B. Cole (firstname.lastname@example.org)
    # Changes: 07/20/2011 Original program

    library("ggplot2")
    library("RColorBrewer")

    # Create 30 chromosomes with 1,000 SNP per chromosome. This assumes that all chromosomes are the same
    # length and have the same number of SNP.
    chromosomes

If you run this code, you should see something like the following plot:

Many of you reading this probably know R much better than I do, so I welcome constructive comments on how the code can be improved. ggplot2 is an excellent library, but it can take a while to render as your datasets grow, so if you've got hundreds of thousands or millions of markers to plot, you've been warned.

I'd also like to note that Stephen Turner has an article over at Getting Genetics Done on creating Manhattan plots using base R graphics that's worth checking out, too.
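For readers who want something small to experiment with, here is a hedged sketch of a Manhattan-style plot of simulated marker effects in ggplot2. It is not the original script: the seed, the even SNP spacing, the choice to plot absolute effects, the two fixed colours (used here in place of an RColorBrewer palette), the output filename, and the names `snps`, `shade`, and `mids` are all assumptions made for illustration.

```r
library(ggplot2)

set.seed(1)                       # assumed seed, only for reproducibility
n_chrom  <- 30                    # chromosomes
n_snp    <- 1000                  # SNP per chromosome
chrom_bp <- 3e9 / n_chrom         # equal-length chromosomes

# Simulate markers: evenly spaced positions (an assumption) and t-distributed effects.
snps <- data.frame(chrom = rep(seq_len(n_chrom), each = n_snp))
snps$genome_pos <- (snps$chrom - 1) * chrom_bp +
  rep(seq_len(n_snp) * chrom_bp / n_snp, times = n_chrom)
snps$effect <- rt(nrow(snps), df = 100)

# Alternate two colours so neighbouring chromosomes are easy to tell apart.
snps$shade <- factor(snps$chrom %% 2)

# Put one axis label at the midpoint of each chromosome.
mids <- (seq_len(n_chrom) - 0.5) * chrom_bp

p <- ggplot(snps, aes(x = genome_pos, y = abs(effect), colour = shade)) +
  geom_point(size = 0.6) +
  scale_colour_manual(values = c("grey30", "steelblue")) +
  scale_x_continuous(breaks = mids, labels = seq_len(n_chrom)) +
  labs(x = "Chromosome", y = "Absolute marker effect") +
  theme_bw() +
  theme(legend.position = "none")

# Write the plot to a file; the filename is hypothetical.
ggsave("manhattan_sketch.png", plot = p, width = 10, height = 4, dpi = 150)
```

With 30,000 points this renders quickly, which makes it a convenient starting point before moving on to much larger marker sets.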