functional enrichment for GTEx paper

[This article was first published on YGC » R, and kindly contributed to R-bloggers].

The ENCODE consortium has recently published a great paper on gene expression from the GTEx dataset. A criticism raised on PubPeer is that the Gene Ontology enrichment analysis was done with DAVID, which has not been updated in the last five years.

The result is shown below:

It would be interesting to see whether the results change when clusterProfiler is used to reproduce the GO and KEGG analyses.

For GO (BP) analysis:
1. clusterProfiler annotates more genes (194) than DAVID (168).
2. DAVID reports 14 enriched BP terms, while clusterProfiler reports 222.
3. All enriched terms reported by DAVID were also reported by clusterProfiler.
4. The results are consistent: the enriched terms are related to translation and protein biosynthesis, with many more detailed and informative terms reported only by clusterProfiler.
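
A minimal sketch of what a GO (BP) over-representation run looks like with a recent clusterProfiler release. This is not the code used for the numbers above: the gene vector is a stand-in taken from the `geneList` example data shipped with DOSE, not the GTEx gene list, and the cutoff is an illustrative assumption. It also assumes the Bioconductor packages clusterProfiler, DOSE and org.Hs.eg.db are installed.

```r
library(clusterProfiler)
library(org.Hs.eg.db)

# Stand-in input: Entrez gene IDs from DOSE's example data (NOT the GTEx list).
data(geneList, package = "DOSE")
genes <- names(geneList)[abs(geneList) > 2]

# Over-representation test against GO Biological Process terms.
ego <- enrichGO(
  gene          = genes,
  OrgDb         = org.Hs.eg.db,
  keyType       = "ENTREZID",
  ont           = "BP",
  pAdjustMethod = "BH",
  pvalueCutoff  = 0.05,  # illustrative cutoff, an assumption
  readable      = TRUE   # show gene symbols instead of Entrez IDs
)

# GeneRatio is "k/n", where n is the number of input genes that carry
# an annotation; this is the count compared between the two tools above.
head(as.data.frame(ego)[, c("ID", "Description", "GeneRatio", "p.adjust")])
```

The denominator of `GeneRatio` is the place to look when comparing how many genes of a list each tool was able to annotate.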

For KEGG analysis:
1. DAVID annotates 5085 genes in its background, while clusterProfiler uses the latest online version of KEGG, which annotates 6895 genes.
2. Of the 212 genes in the gene list, DAVID annotates only 83, while clusterProfiler annotates 104.
3. DAVID reports only 1 enriched KEGG term, while clusterProfiler reports 9.
4. All enriched terms reported by DAVID were also reported by clusterProfiler.
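
Why the background and the number of annotated genes matter can be seen from the test itself. Over-representation analysis is commonly scored with a one-sided hypergeometric test, which is what the base R sketch below computes; DAVID uses a modified Fisher exact test, so this is not a reproduction of DAVID's p-values. The annotated-gene and background counts are the ones quoted above, while the pathway size (150) and the overlap (10) are hypothetical values chosen only for illustration.

```r
# One-sided hypergeometric p-value for over-representation:
# probability of seeing at least k pathway genes among n annotated input
# genes when M of the N background genes belong to the pathway.
ora_pvalue <- function(k, n, M, N) {
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

pathway_size <- 150  # hypothetical pathway size
overlap      <- 10   # hypothetical number of input genes in the pathway

# Counts quoted in the post:
#   DAVID:           83 annotated genes, 5085 background genes
#   clusterProfiler: 104 annotated genes, 6895 background genes
p_david_counts <- ora_pvalue(overlap, 83,  pathway_size, 5085)
p_cp_counts    <- ora_pvalue(overlap, 104, pathway_size, 6895)

print(c(david_counts = p_david_counts, clusterProfiler_counts = p_cp_counts))
```

Because both the annotated-gene count and the background size enter the test, two tools working from different annotation snapshots can score the same pathway differently even for an identical gene list.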

It would also be interesting to see whether these genes are related to specific diseases or pathways, so I ran DOSE and ReactomePA to analyze them.

I found that cancer and disease of cellular proliferation terms were enriched.

With ReactomePA, I found 3 clusters of pathways. Two are consistent with the GO/KEGG/DO enrichment results: they are translation related, with clear evidence of high splicing variability. The third cluster is related to immune response, which was not discovered with the other ontologies/pathways. These immune-related pathways were not reported in the paper, but apparently immune responses are also related to high splicing variability.
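
The disease and Reactome analyses described above can be sketched along the following lines. As before, this is a hedged illustration and not the analysis itself: the input is the DOSE example data rather than the GTEx gene list, default cutoffs are used, and the Bioconductor packages DOSE and ReactomePA (with its reactome.db dependency) are assumed to be installed.

```r
library(DOSE)
library(ReactomePA)

# Stand-in input: Entrez gene IDs from DOSE's example data (NOT the GTEx list).
data(geneList, package = "DOSE")
genes <- names(geneList)[abs(geneList) > 2]

# Disease Ontology over-representation, using the package defaults.
edo <- enrichDO(gene = genes, readable = TRUE)

# Reactome pathway over-representation for human genes.
epath <- enrichPathway(gene = genes, organism = "human", readable = TRUE)

head(as.data.frame(edo)[, c("ID", "Description", "p.adjust")])
head(as.data.frame(epath)[, c("ID", "Description", "p.adjust")])
```

Both calls return the same kind of result object as the clusterProfiler enrichment functions, so the output tables can be compared side by side with the GO and KEGG results.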

For more detail, please refer to the GitHub repository, https://github.com/GuangchuangYu/enrichment4GTEx_clusterProfiler, which contains the R Markdown source file to reproduce the results. You can also try other annotation data, or GSEA, to explore GTEx.
