# ChIPseeker for ChIP peak annotation

April 13, 2014
By

(This article was first published on YGC » R, and kindly contributed to R-bloggers)

ChIPpeakAnno WAS the only one R package for ChIP peak annotation. I used it for annotating peak in my recent study.

I found it does not consider the strand information of genes. I reported the bug to the authors, but they are reluctant to change.

So I decided to develop my own package, ChIPseeker, and it's now available in Bioconductor.

?View Code RSPLUS
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62  > require(TxDb.Hsapiens.UCSC.hg19.knownGene) > txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene > require(ChIPseeker) > peakfile = system.file("extdata", "sample_peaks.txt", package="ChIPseeker") > peakfile [1] "/usr/local/Cellar/r/3.0.2/R.framework/Versions/3.0/Resources/library/ChIPseeker/extdata/sample_peaks.txt" > peakAnno <- annotatePeak(peakfile, tssRegion=c(-3000, 1000), TranscriptDb=txdb, annoDb="org.Hs.eg.db") >> loading peak file... >> preparing features information... >> identifying nearest features... >> calculating distance from peak to TSS... >> assigning genomic annotation... >> adding gene annotation... Loading required package: org.Hs.eg.db >> done... > head(peakAnno) GRanges with 6 ranges and 17 metadata columns: seqnames ranges strand | length summit tags | [1] chr10 [105137980, 105138593] * | 614 174 7 [2] chr10 [ 42644416, 42645383] * | 968 669 27 [3] chr6 [162188189, 162188742] * | 554 174 8 [4] chr4 [ 2307246, 2307622] * | 377 188 9 [5] chr13 [ 51791808, 51792240] * | 433 267 8 [6] chr19 [ 58731678, 58732653] * | 976 472 15 X.10.log10.pvalue. fold_enrichment FDR... annotation [1] 52.80 15.32 Exon (38885 exon 3 of 11) [2] 77.15 9.13 0.79 Intergenic [3] 59.88 17.82 Intron (27755 intron 5 of 6) [4] 85.25 26.62 1.03 Exon (18944 exon 5 of 10) [5] 65.32 15.08 0.74 Intergenic [6] 55.00 8.04 Intergenic geneChr geneStart geneEnd geneLength geneStrand geneId [1] chr10 105127724 105148822 21099 + 6877 [2] chr10 42827314 42863493 36180 - 441666 [3] chr6 161551057 161695107 144051 - 56895 [4] chr4 2249160 2263739 14580 - 10608 [5] chr13 51796470 51858377 61908 + 220108 [6] chr19 58740070 58775008 34939 + 27300 distanceToTSS ENSEMBL SYMBOL [1] -10256 ENSG00000148835 TAF5 [2] 218110 LOC441666 [3] 493635 ENSG00000026652 AGPAT4 [4] 43883 ENSG00000123933 MXD4 [5] -4662 ENSG00000150510 FAM124A [6] -8392 ENSG00000198131 ZNF544 GENENAME [1] TAF5 RNA polymerase II, TATA box binding protein (TBP)-associated factor, 100kDa [2] zinc finger protein 91 pseudogene [3] 1-acylglycerol-3-phosphate O-acyltransferase 4 [4] MAX dimerization protein 4 [5] family with sequence similarity 124A [6] zinc finger protein 544 --- seqlengths: chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 chr9 chrX NA NA NA NA NA NA ... NA NA NA NA NA NA >

annotatePeak function can accept peak file or GRanges object that contained peak information.
The sample peakfile shown above is a sample output from MACS. All the information contained in peakfile will be kept in the output of annotatePeak.

The annotation column annotates genomic features of the peak, that is whether peak is located in Promoter, Exon, UTR, Intron or Intergenic. If the peak is annotated by Exon or Intron, more detail information will be provided. For instance, Exon (38885 exon 3 of 11) indicates that the peak is located at the 3th exon of gene 38885 (entrezgene ID) which contain 11 exons in total.

All the names of column that contains information of nearest gene annotated will start with gene in camelCaps style.

If parameter annoDb="org.Hs.eg.db" is provided, extra columns of ENSEMBL, SYMBOL and GENENAME will be provided in the final output.

For more detail, please refer to the package vignette. Comments or suggestions are welcome.