ChIPseq data mining with ChIPseeker

October 1, 2015
By

(This article was first published on YGC » R, and kindly contributed to R-bloggers)

ChIP-seq is rapidly becoming a common technique and there are a large number of dataset available in the public domain. Results from individual experiments provide a limited understanding of chromatin interactions, as there is many factors cooperate to regulate transcription. Unlike other tools that designed for single dataset, ChIPseeker is designed for comparing profiles of ChIP-seq datasets at different levels.

We provide functions to compare profiles of peaks binding to TSS regions, annotation, and enriched functional profiles. More importantly, ChIPseeker incorporates statistical testing of co-occurrence of different ChIP-seq datasets and can be used to identify co-factors.

?View Code RSPLUS

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
> library(ChIPseeker)
> ff=getSampleFiles()
> x = enrichPeakOverlap(ff[[5]], unlist(ff[1:4]), nShuffle=10000, pAdjustMethod="BH", chainFile=NULL)
>> permutation test of peak overlap...		 2015-09-24 14:23:43
  |======================================================================| 100%
> x
                                                      qSample
ARmo_0M    GSM1295077_CBX7_BF_ChipSeq_mergedReps_peaks.bed.gz
ARmo_1nM   GSM1295077_CBX7_BF_ChipSeq_mergedReps_peaks.bed.gz
ARmo_100nM GSM1295077_CBX7_BF_ChipSeq_mergedReps_peaks.bed.gz
CBX6_BF    GSM1295077_CBX7_BF_ChipSeq_mergedReps_peaks.bed.gz
                                                      tSample qLen tLen N_OL
ARmo_0M                       GSM1174480_ARmo_0M_peaks.bed.gz 1663  812    0
ARmo_1nM                     GSM1174481_ARmo_1nM_peaks.bed.gz 1663 2296    8
ARmo_100nM                 GSM1174482_ARmo_100nM_peaks.bed.gz 1663 1359    3
CBX6_BF    GSM1295076_CBX6_BF_ChipSeq_mergedReps_peaks.bed.gz 1663 1331  968
               pvalue   p.adjust
ARmo_0M    0.88901110 0.88901110
ARmo_1nM   0.15118488 0.30236976
ARmo_100nM 0.37296270 0.49728360
CBX6_BF    0.00009999 0.00039996


The enrichment analysis of peak overlap is based on permutation test. nShuffle of random ChIP data were generated to estimate the background null distribution of the overlap and p-value is then calculated by the probability of observing more extreme overlap. Multiple comparison correction is also incorporated.

The most exciting feature in ChIPseeker is that it collected more than 18,000 bed file information from GEO database and make this co-factor inference available to the community. With these datasets, we can compare our own dataset to those deposited in GEO to identify co-occurrence binding proteins that maybe cooperated with the one we are interested in. Hypothesis can be generated by this inference and serve as a starting point for further study.

Reference

G Yu, LG Wang, QY He. ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics. 2015, 31(14):2382-2383. PMID:25765347.

Related Posts

To leave a comment for the author, please follow the link and comment on their blog: YGC » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)