The ENCODE consortium has recently published a great paper on gene expression from the GTEx dataset. A criticism raised on PubPeer is that the gene ontology enrichment analysis was done with DAVID, which has not been updated in the last five years.
The result is shown below:
It would be interesting to see whether the results change when clusterProfiler is used to reproduce the GO and KEGG analyses.
For GO (BP) analysis:
1. clusterProfiler (194) annotates more genes than DAVID (168).
2. DAVID enriches 14 BP terms, while clusterProfiler enriches 222 BP terms.
3. All enriched terms reported in DAVID were also reported by clusterProfiler.
4. The results are consistent: the enriched terms are related to translation and protein biosynthesis, with many more detailed and informative terms reported only by clusterProfiler.
For KEGG analysis:
1. DAVID annotates 5085 genes in the background, while clusterProfiler uses the latest online version of KEGG, which annotates 6895 genes.
2. Of the 212 genes in the gene list, DAVID annotates only 83, while clusterProfiler annotates 104.
3. DAVID enriches only 1 KEGG term, while clusterProfiler enriches 9 KEGG terms.
4. All enriched terms reported in DAVID were also reported by clusterProfiler.
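To make the effect of annotation coverage more concrete, here is a minimal sketch of an over-representation test, assuming a plain hypergeometric model. The background sizes (5085 and 6895) and annotated gene counts (83 and 104) come from the list above; the pathway size and the overlap are hypothetical values chosen only for illustration, and the function name `enrichment_p_value` is my own. DAVID uses a modified Fisher exact test (the EASE score), so this does not reproduce its reported numbers.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: annotated genes in the background
    K: background genes that belong to the pathway
    n: annotated genes in the query gene list
    k: query genes that belong to the pathway
    """
    upper = min(n, K)
    # Sum the probabilities of seeing k, k+1, ..., upper pathway genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Counts taken from the KEGG comparison in the text.
DAVID_BACKGROUND, DAVID_ANNOTATED = 5085, 83
CP_BACKGROUND, CP_ANNOTATED = 6895, 104

# Hypothetical values, NOT from the analysis: a pathway with 90 genes,
# 10 of which appear in the query gene list.
PATHWAY_SIZE = 90
OVERLAP = 10

p_david_like = enrichment_p_value(OVERLAP, DAVID_ANNOTATED, PATHWAY_SIZE, DAVID_BACKGROUND)
p_cp_like = enrichment_p_value(OVERLAP, CP_ANNOTATED, PATHWAY_SIZE, CP_BACKGROUND)

print(f"older annotation: p = {p_david_like:.3e}")
print(f"newer annotation: p = {p_cp_like:.3e}")
```

In a real comparison the pathway size and the overlap would also differ between the two annotation versions, which is one reason a stale annotation can miss terms that a current one reports.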
In the DO analysis, I found that the terms cancer and disease of cellular proliferation were enriched.
With ReactomePA, I found 3 clusters of pathways. Two are consistent with the GO/KEGG/DO enrichment results: they are translation related, with clear evidence of high splicing variability. The third cluster is related to immune response, which was not discovered by the other ontology/pathway analyses. These immune-related pathways were not reported in the paper, but apparently immune responses are related to high splicing variability.
For more details, please refer to the GitHub repository, https://github.com/GuangchuangYu/enrichment4GTEx_clusterProfiler, which contains the R Markdown source file to reproduce the results. You can also try other annotation data and GSEA to explore GTEx.