Visualising stranded RNA-seq data with Gviz/Bioconductor

Posted on November 6, 2014 by sidderb in R bloggers | 0 Comments

[This article was first published on Bioinpharmatics » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Gviz is a really great package for visualising genomics data in R. Recently I have been looking at stranded RNA-seq data, which provides the ability to differentiate sense and antisense expression from a genomic locus thanks to the way in which you generate the libraries (I won’t go into all that here). Most aligners are strand aware so retain this useful information, but there aren’t many (any?) well defined approaches for detecting antisense expression nor for visualising it (in R). So, here I set out to use Gviz for the visualisation of stranded RNA-seq data.

A quick google got me to this post on the BioC mailing list from the Gviz author in which he provides a function to separate the reads in a BAM based on strand. This is not the same thing as working with stranded data – stranded data still has reads aligning to both strands, it’s the orientation of the pairs that determines which strand of the genome was being expressed. But, the post was a useful starter.

To get Gviz to plot stranded data we have to define a new import function to pass to the DataTrack constructor as the default pays no heed to strand. The function requires a path to a BAM file (with index in same directory) and a GRanges object that provides the location in the BAM file we are interested in. We use Rsamtools to read and parse the BAM file for the reads, setting specific flags that assess the orientation of each read and separate them accordingly. For the library I have, the forward/top/5′-3′ strand has reads in the orientation F2R1 and the reverse/bottom/3′-5′ strand is F1R2. For the former, this means the first read of the pair is always on the bottom/reverse of the double stranded template that was sequenced (not the genome!). So to get all reads from RNA produced by the forward/top/5′-3′ (F2R1 orientation) we scan the BAM file twice with the following flags:

# For the F2: scanBamFlag(isUnmappedQuery = FALSE, isProperPair = TRUE, isFirstMateRead = FALSE, isMinusStrand = FALSE) # For the R1: scanBamFlag(isUnmappedQuery = FALSE, isProperPair = TRUE, isFirstMateRead = TRUE, isMinusStrand = TRUE)

We then combine these such that all pairs of reads originating from the forward/top/5′-3′ strand are together in one GRanges. This is repeated for the reverse/bottom/3′-5′ strand with reads in the F1R2 orientation. We then calculate coverage over the region of interest and return a GRanges object with this data that is used by the dataTrack in Gviz. The full code for this is in this GitHub gist.

We use this as a custom importFunction in a DataTrack constructor, as follows:

library(Gviz) options(ucscChromosomeNames=FALSE) library(biomaRt) mart = useMart("ensembl",dataset="rnorvegicus_gene_ensembl") myChr = 2 myStart = 230947473 myEnd = 230958234 # First create track with gene model: biomTrack = BiomartGeneRegionTrack(genome="rn5", biomart=mart, chromosome=myChr, start=myStart, end=myEnd,showId=F, geneSymbols=F, rotate.title=TRUE, col.line=NULL, col="orange", fill="orange",filters=list(biotype="protein_coding"),collapseTranscripts=FALSE) # now create data track using new import function: #bamFile = "path/to/bam/file" dataTrack = DataTrack(bamFile, genome="rn5", chromosome=myChr, importFunction=strandedBamImport, stream=TRUE, legend=TRUE, col=c("cornflowerblue","purple"), groups=c("Forward","Reverse")) # now plot tracks: plotTracks(list(dataTrack,biomTrack), from = myStart, to = myEnd, type="hist", col.histogram=NA, cex.title=1, cex.axis=1, title.width=1.2)

Strand specific coverage can be plotted in a Gviz data track using using a custom import function. The top track contains the per-base coverage with reads from the forward strand in blue and reads from the reverse strand in purple. The bottom track shows the genomic context of the reads with exons in thick lines, introns as thin lines as the direction of transcription indicated by the arrows on the think lines. Here the two genes are on opposite strands and are transcribed toward each other, making for a nice example.

To leave a comment for the author, please follow the link and comment on their blog: Bioinpharmatics » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Visualising stranded RNA-seq data with Gviz/Bioconductor

Related

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)