multiple annotation in ChIPseeker

October 1, 2014
By

(This article was first published on YGC » R, and kindly contributed to R-bloggers)

Nearest gene annotation

Almost all annotation software calculate the distance of a peak to the nearest TSS and assign the peak to that gene. This can be misleading, as binding sites might be located between two start sites of different genes or hit different genes which have the same TSS location in the genome.

The function annotatePeak provides option to assign genes with a max distance cutoff and all genes within this distance were reported for each peak.

?View Code RSPLUS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
> anno = annotatePeak(getSampleFiles()[[4]], addFlankGeneInfo=TRUE, flankDistance=5000)
>> loading peak file...				 2014-09-30 01:12:09 PM 
>> preparing features information...		 2014-09-30 01:12:09 PM 
>> identifying nearest features...		 2014-09-30 01:12:10 PM 
>> calculating distance from peak to TSS...	 2014-09-30 01:12:11 PM 
>> assigning genomic annotation...		 2014-09-30 01:12:11 PM 
>> adding flank feature information from peaks...2014-09-30 01:12:37 PM 
>> assigning chromosome lengths			 2014-09-30 01:12:39 PM 
>> done...					 2014-09-30 01:12:39 PM 
> head(as.GRanges(anno))
GRanges object with 6 ranges and 14 metadata columns:
      seqnames             ranges strand |          V4        V5
                      |     
  [1]     chr1 [ 815092,  817883]      * | MACS_peak_1    295.76
  [2]     chr1 [1243287, 1244338]      * | MACS_peak_2     63.19
  [3]     chr1 [2979976, 2981228]      * | MACS_peak_3    100.16
  [4]     chr1 [3566181, 3567876]      * | MACS_peak_4    558.89
  [5]     chr1 [3816545, 3818111]      * | MACS_peak_5     57.57
  [6]     chr1 [6304864, 6305704]      * | MACS_peak_6     54.57
             annotation  geneChr geneStart   geneEnd geneLength geneStrand
 
  [1] Distal Intergenic     chr1    803451    812182       8732          -
  [2]  Promoter (<=1kb)     chr1   1243994   1247057       3064          +
  [3]  Promoter (<=1kb)     chr1   2976181   2980350       4170          -
  [4]  Promoter (<=1kb)     chr1   3547331   3566671      19341          -
  [5]  Promoter (<=1kb)     chr1   3816968   3832011      15044          +
  [6]  Promoter (1-2kb)     chr1   6304252   6305638       1387          +
           geneId transcriptId distanceToTSS
 
  [1]      284593   uc001abt.4         -5701
  [2]      126789   uc001aed.3             0
  [3]      440556   uc001aka.3             0
  [4]       49856   uc001ako.3             0
  [5]   100133612   uc001alg.3             0
  [6]      390992   uc009vly.2          1452
                                                                                                                                                               flank_txIds
 
  [1]                                                                                                                                                           uc001abt.4
  [2] uc001aea.2;uc001aeb.2;uc001aec.1;uc001aed.3;uc010nyi.2;uc009vjx.3;uc009vjy.1;uc001aee.2;uc001aef.2;uc001aeg.2;uc001aeh.2;uc001aei.2;uc001aek.2;uc009vjz.2;uc010nyj.2
  [3]                                                                                                    uc001aka.3;uc010nzg.1;uc001akc.3;uc001ake.3;uc001akf.3;uc009vlh.3
  [4]                                                                   uc001ako.3;uc001akn.3;uc010nzi.2;uc001akp.3;uc021ofb.1;uc021ofc.1;uc021ofd.1;uc021ofe.1;uc021off.1
  [5]                                                                                                    uc001alg.3;uc010nzo.2;uc001alh.4;uc001alf.3;uc009vls.3;uc031pkz.1
  [6]                                                                                                                                                uc001amp.2;uc009vly.2
                                                                                        flank_geneIds
 
  [1]                                                                                          284593
  [2] 116983;116983;116983;126789;126789;126789;54973;54973;54973;54973;54973;54973;54973;54973;54973
  [3]                                                           440556;440556;63976;63976;63976;63976
  [4]                                                 49856;49856;49856;7161;7161;7161;7161;7161;7161
  [5]                                              100133612;100133612;100133612;339448;339448;339448
  [6]                                                                                   387509;390992
                                                           flank_gene_distances
 
  [1]                                                                     -2910
  [2] -1978;-18;0;0;0;-128;8492;15729;15729;15729;15729;15729;15729;15729;15729
  [3]                                               0;0;-4514;-4514;-4514;-4514
  [4]                                 0;0;0;-1253;-1253;-1253;-1253;-1253;-1253
  [5]                                                               0;0;0;0;0;0
  [6]                                                                   15331;0
  -------
  seqinfo: 24 sequences from hg19 genome

Three extra columns, flank_txIds, flank_geneIds, flank_gene_distances were added if parameter addFlankGeneInfo is set to TRUE.

Genomic annotation

The annotation column annotates the genomic location of a given peak. Since some annotations overlap, they are assigned based on the following priority.

  • Promoter (defined by tssRegion parameter)
  • 5′ UTR
  • 3′ UTR
  • Exon
  • Intron
  • Downstream (immediate downstream of a gene, within 3kb)
  • Distal Intergenic

I have implemented plotAnnoPie and plotAnnoBar to visualize their distribution. I am wondering how to visualize the annotation overlap and found venn pie-agram can partially solve this issue. It’s hard/impossible to visualize all the overlap information within one vennpie plot.

I employed the vennpie to ChIPseeker, and now with the vennpie function, we can visualize some of the overlap annotations.

vennpie(anno)

2014-09-30-142735_471x303_scrot

In order to manage complex data, annotatePeak now output a S4 object. Users can use as.GRanges to convert it to as.GRanges object or as.data.frame to convert it to data.frame and then the output can be exported by write.table function.

Related Posts

To leave a comment for the author, please follow the link and comment on their blog: YGC » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)