align genomic features with phylogenetic tree

November 15, 2016
By

(This article was first published on R on Guangchuang YU, and kindly contributed to R-bloggers)

A question on biostars asking how to generate the following figure:

This can be quite easy to implement in ggtree, I can write a geom layer to layout the alignment. As ggbio already provides many geom for genomic data and I don’t want to re-invent the wheel, I decided to try ggtree+ggbio. This is also the beauty of R that packages complete each others.

Here is a sample data presented in ggbio vignette:

library(ggbio)
library(biovizBase) 
library(Homo.sapiens)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

data(genesymbol, package = "biovizBase")
wh <- genesymbol[c("BRCA1", "NBR1")]
wh <- range(wh, ignore.strand = TRUE)

gr.txdb <- crunch(txdb, which = wh)
## change column to  model 
colnames(values(gr.txdb))[4] <- "model"
grl <- split(gr.txdb, gr.txdb$tx_id)
set.seed(2016-10-25)
names(grl) <- sample(LETTERS, size = length(grl), replace = TRUE)
> print(grl)
GRangesList object of length 32:
$J 
GRanges object with 7 ranges and 4 metadata columns:
      seqnames               ranges strand |    tx_id    tx_name
                        |    
  [1]    chr17 [41277600, 41277787]      + |    61241 uc002idf.3
  [2]    chr17 [41283225, 41283287]      + |    61241 uc002idf.3
  [3]    chr17 [41284973, 41285154]      + |    61241 uc002idf.3
  [4]    chr17 [41290674, 41292342]      + |    61241 uc002idf.3
  [5]    chr17 [41277788, 41283224]      * |    61241 uc002idf.3
  [6]    chr17 [41283288, 41284972]      * |    61241 uc002idf.3
  [7]    chr17 [41285155, 41290673]      * |    61241 uc002idf.3
       gene_id    model
       
  [1]    10230     exon
  [2]    10230     exon
  [3]    10230     exon
  [4]    10230     exon
  [5]    10230      gap
  [6]    10230      gap
  [7]    10230      gap

$U 
GRanges object with 3 ranges and 4 metadata columns:
      seqnames               ranges strand | tx_id    tx_name
  [1]    chr17 [41277600, 41277787]      + | 61242 uc010czb.2
  [2]    chr17 [41290674, 41292342]      + | 61242 uc010czb.2
  [3]    chr17 [41277788, 41290673]      * | 61242 uc010czb.2
      gene_id model
  [1]   10230  exon
  [2]   10230  exon
  [3]   10230   gap

$D 
GRanges object with 9 ranges and 4 metadata columns:
      seqnames               ranges strand | tx_id    tx_name
  [1]    chr17 [41277600, 41277787]      + | 61243 uc002idg.3
  [2]    chr17 [41283225, 41283287]      + | 61243 uc002idg.3
  [3]    chr17 [41290674, 41290939]      + | 61243 uc002idg.3
  [4]    chr17 [41291833, 41292300]      + | 61243 uc002idg.3
  [5]    chr17 [41296745, 41297125]      + | 61243 uc002idg.3
  [6]    chr17 [41277788, 41283224]      * | 61243 uc002idg.3
  [7]    chr17 [41283288, 41290673]      * | 61243 uc002idg.3
  [8]    chr17 [41290940, 41291832]      * | 61243 uc002idg.3
  [9]    chr17 [41292301, 41296744]      * | 61243 uc002idg.3
      gene_id model
  [1]   10230  exon
  [2]   10230  exon
  [3]   10230  exon
  [4]   10230  exon
  [5]   10230  exon
  [6]   10230   gap
  [7]   10230   gap
  [8]   10230   gap
  [9]   10230   gap

...
<29 more elements>
-------
seqinfo: 1 sequence from hg19 genome

We can visualize the alignment simply using:

ggplot() + geom_alignment(grl, alpha=.5)

The input data for geom_alignment is a GRangesList object, while facet_plot defined in ggtree expect the input data as a data.frame. I extend the facet_plot to work with geom_alignment. In doing this, I find a bug of the geom_alignment function and send a patch to Michael. My patch was incorporated in ggbio 1.23.2.

With the updates of both ggtree and ggbio, we can use facet_plot to align alignment with phylogenetic tree.

Suppose we have the following tree:

library(ggtree)
n <- names(grl) %>% unique %>% length
set.seed(2016-10-25)
tr = rtree(n)
set.seed(2016-10-25)
tr$tip.label = sample(unique(names(grl)), n)

p <- ggtree(tr) + geom_tiplab()

It is quite easy to use facet_plot to visualize the above alignment with this tree:

facet_plot(p, 'alignment', grl, geom_alignment, inherit.aes=FALSE, mapping=aes())

Beware of mapping=aes() is required as ggbio can’t accept mapping=NULL.

PS: I only test geom_alignment which works with GRanges and GRangesList. Other geoms defined in ggbio may not be supported. If you find facet_plot fail to work with other geoms, please open an issue in github. Feature requests are welcome.

Citation

G Yu, DK Smith, H Zhu, Y Guan, TTY Lam*. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution. doi:10.1111/2041-210X.12628.

To leave a comment for the author, please follow the link and comment on their blog: R on Guangchuang YU.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)