Gviz is a really great package for visualising genomics data in R. Recently I have been looking at stranded RNA-seq data, which provides the ability to differentiate sense and antisense expression from a genomic locus thanks to the way in which you generate the libraries (I won’t go into all that here). Most aligners are strand aware so retain this useful information, but there aren’t many (any?) well defined approaches for detecting antisense expression nor for visualising it (in R). So, here I set out to use Gviz for the visualisation of stranded RNA-seq data.
A quick google got me to this post on the BioC mailing list from the Gviz author in which he provides a function to separate the reads in a BAM based on strand. This is not the same thing as working with stranded data – stranded data still has reads aligning to both strands, it’s the orientation of the pairs that determines which strand of the genome was being expressed. But, the post was a useful starter.
To get Gviz to plot stranded data we have to define a new import function to pass to the DataTrack constructor as the default pays no heed to strand. The function requires a path to a BAM file (with index in same directory) and a GRanges object that provides the location in the BAM file we are interested in. We use Rsamtools to read and parse the BAM file for the reads, setting specific flags that assess the orientation of each read and separate them accordingly. For the library I have, the forward/top/5′-3′ strand has reads in the orientation F2R1 and the reverse/bottom/3′-5′ strand is F1R2. For the former, this means the first read of the pair is always on the bottom/reverse of the double stranded template that was sequenced (not the genome!). So to get all reads from RNA produced by the forward/top/5′-3′ (F2R1 orientation) we scan the BAM file twice with the following flags:
# For the F2:
scanBamFlag(isUnmappedQuery = FALSE, isProperPair = TRUE, isFirstMateRead = FALSE, isMinusStrand = FALSE)
# For the R1:
scanBamFlag(isUnmappedQuery = FALSE, isProperPair = TRUE, isFirstMateRead = TRUE, isMinusStrand = TRUE)
We then combine these such that all pairs of reads originating from the forward/top/5′-3′ strand are together in one GRanges. This is repeated for the reverse/bottom/3′-5′ strand with reads in the F1R2 orientation. We then calculate coverage over the region of interest and return a GRanges object with this data that is used by the dataTrack in Gviz. The full code for this is in this GitHub gist.
We use this as a custom importFunction in a DataTrack constructor, as follows:
mart = useMart("ensembl",dataset="rnorvegicus_gene_ensembl")
myChr = 2
myStart = 230947473
myEnd = 230958234
# First create track with gene model:
biomTrack = BiomartGeneRegionTrack(genome="rn5", biomart=mart, chromosome=myChr, start=myStart, end=myEnd,showId=F, geneSymbols=F, rotate.title=TRUE, col.line=NULL, col="orange", fill="orange",filters=list(biotype="protein_coding"),collapseTranscripts=FALSE)
# now create data track using new import function:
#bamFile = "path/to/bam/file"
dataTrack = DataTrack(bamFile, genome="rn5", chromosome=myChr, importFunction=strandedBamImport, stream=TRUE, legend=TRUE, col=c("cornflowerblue","purple"), groups=c("Forward","Reverse"))
# now plot tracks:
plotTracks(list(dataTrack,biomTrack), from = myStart, to = myEnd, type="hist", col.histogram=NA, cex.title=1, cex.axis=1, title.width=1.2)