align genomic features with phylogenetic tree

[This article was first published on R on Guangchuang YU, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A question on biostars asking how to generate the following figure:

This can be quite easy to implement in ggtree, I can write a geom layer to layout the alignment. As ggbio already provides many geom for genomic data and I don’t want to re-invent the wheel, I decided to try ggtree+ggbio. This is also the beauty of R that packages complete each others.

Here is a sample data presented in ggbio vignette:

library(ggbio)
library(biovizBase) 
library(Homo.sapiens)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

data(genesymbol, package = "biovizBase")
wh <- genesymbol[c("BRCA1", "NBR1")]
wh <- range(wh, ignore.strand = TRUE)

gr.txdb <- crunch(txdb, which = wh)
## change column to  model 
colnames(values(gr.txdb))[4] <- "model"
grl <- split(gr.txdb, gr.txdb$tx_id)
set.seed(2016-10-25)
names(grl) <- sample(LETTERS, size = length(grl), replace = TRUE)

> print(grl)
GRangesList object of length 32:
$J 
GRanges object with 7 ranges and 4 metadata columns:
      seqnames               ranges strand |    tx_id    tx_name
         <Rle>            <IRanges>  <Rle> | <factor>   <factor>
  [1]    chr17 [41277600, 41277787]      + |    61241 uc002idf.3
  [2]    chr17 [41283225, 41283287]      + |    61241 uc002idf.3
  [3]    chr17 [41284973, 41285154]      + |    61241 uc002idf.3
  [4]    chr17 [41290674, 41292342]      + |    61241 uc002idf.3
  [5]    chr17 [41277788, 41283224]      * |    61241 uc002idf.3
  [6]    chr17 [41283288, 41284972]      * |    61241 uc002idf.3
  [7]    chr17 [41285155, 41290673]      * |    61241 uc002idf.3
       gene_id    model
      <factor> <factor>
  [1]    10230     exon
  [2]    10230     exon
  [3]    10230     exon
  [4]    10230     exon
  [5]    10230      gap
  [6]    10230      gap
  [7]    10230      gap

$U 
GRanges object with 3 ranges and 4 metadata columns:
      seqnames               ranges strand | tx_id    tx_name
  [1]    chr17 [41277600, 41277787]      + | 61242 uc010czb.2
  [2]    chr17 [41290674, 41292342]      + | 61242 uc010czb.2
  [3]    chr17 [41277788, 41290673]      * | 61242 uc010czb.2
      gene_id model
  [1]   10230  exon
  [2]   10230  exon
  [3]   10230   gap

$D 
GRanges object with 9 ranges and 4 metadata columns:
      seqnames               ranges strand | tx_id    tx_name
  [1]    chr17 [41277600, 41277787]      + | 61243 uc002idg.3
  [2]    chr17 [41283225, 41283287]      + | 61243 uc002idg.3
  [3]    chr17 [41290674, 41290939]      + | 61243 uc002idg.3
  [4]    chr17 [41291833, 41292300]      + | 61243 uc002idg.3
  [5]    chr17 [41296745, 41297125]      + | 61243 uc002idg.3
  [6]    chr17 [41277788, 41283224]      * | 61243 uc002idg.3
  [7]    chr17 [41283288, 41290673]      * | 61243 uc002idg.3
  [8]    chr17 [41290940, 41291832]      * | 61243 uc002idg.3
  [9]    chr17 [41292301, 41296744]      * | 61243 uc002idg.3
      gene_id model
  [1]   10230  exon
  [2]   10230  exon
  [3]   10230  exon
  [4]   10230  exon
  [5]   10230  exon
  [6]   10230   gap
  [7]   10230   gap
  [8]   10230   gap
  [9]   10230   gap

...
<29 more elements>
-------
seqinfo: 1 sequence from hg19 genome

We can visualize the alignment simply using:

ggplot() + geom_alignment(grl, alpha=.5)

The input data for geom_alignment is a GRangesList object, while facet_plot defined in ggtree expect the input data as a data.frame. I extend the facet_plot to work with geom_alignment. In doing this, I find a bug of the geom_alignment function and send a patch to Michael. My patch was incorporated in ggbio 1.23.2.

With the updates of both ggtree and ggbio, we can use facet_plot to align alignment with phylogenetic tree.

Suppose we have the following tree:

library(ggtree)
n <- names(grl) %>% unique %>% length
set.seed(2016-10-25)
tr = rtree(n)
set.seed(2016-10-25)
tr$tip.label = sample(unique(names(grl)), n)

p <- ggtree(tr) + geom_tiplab()

It is quite easy to use facet_plot to visualize the above alignment with this tree:

facet_plot(p, 'alignment', grl, geom_alignment, inherit.aes=FALSE, mapping=aes())

Beware of mapping=aes() is required as ggbio can’t accept mapping=NULL.

PS: I only test geom_alignment which works with GRanges and GRangesList. Other geoms defined in ggbio may not be supported. If you find facet_plot fail to work with other geoms, please open an issue in github. Feature requests are welcome.

Citation

G Yu, DK Smith, H Zhu, Y Guan, TTY Lam*. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution. doi:10.1111/2041-210X.12628.

To leave a comment for the author, please follow the link and comment on their blog: R on Guangchuang YU.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)