use simplify to remove redundancy of enriched GO terms

October 21, 2015

(This article was first published on YGC » R, and kindly contributed to R-bloggers)

To simplify enriched GO results, we can use a slim version of GO and analyze it with the enricher function.

Another strategy is to use GOSemSim to calculate the similarity between GO terms and remove highly similar terms by keeping one representative term. To make this feature available to clusterProfiler users, I developed a simplify method that reduces redundant GO terms in the output of the enrichGO function.


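library(clusterProfiler)
data(geneList, package="DOSE")
# keep the genes whose absolute value in geneList exceeds 2
de <- names(geneList)[abs(geneList) > 2]
# newer clusterProfiler releases also require an OrgDb argument here
bp <- enrichGO(de, ont="BP")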

The enrichMap doesn’t display the whole picture, as the default value n=50 shows only the 50 most significant terms. In the enrichMap, we can see that many redundant terms form a highly condensed network.

Now with the simplify method, we can remove redundant terms.


bp2 <- simplify(bp, cutoff=0.7, by="p.adjust", select_fun=min)

The simplify method applies ‘select_fun’ (which can be a user-defined function) to the feature ‘by’ to select one representative term from each set of redundant terms (those with similarity higher than ‘cutoff’).
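To make the selection rule concrete, here is a minimal base R sketch of the idea on toy data. It is not clusterProfiler's actual implementation: the helper pick_representatives, the term IDs, the p-values and the similarity matrix are all made up for illustration (in practice GOSemSim would supply the similarities), and the grouping is a simple greedy pass that may differ from what simplify does internally.

```r
# Toy enrichment table: four terms with made-up adjusted p-values.
terms <- data.frame(
  ID       = c("T1", "T2", "T3", "T4"),
  p.adjust = c(0.001, 0.004, 0.020, 0.010),
  stringsAsFactors = FALSE
)

# Toy symmetric similarity matrix (made-up values, not computed by GOSemSim).
sim <- matrix(
  c(1.0, 0.9, 0.2, 0.1,
    0.9, 1.0, 0.3, 0.2,
    0.2, 0.3, 1.0, 0.8,
    0.1, 0.2, 0.8, 1.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(terms$ID, terms$ID)
)

# Hypothetical helper: keep one representative per group of similar terms.
# Assumes select_fun returns one of the values it is given (e.g. min or max).
pick_representatives <- function(terms, sim, cutoff = 0.7,
                                 by = "p.adjust", select_fun = min) {
  remaining <- terms$ID
  kept <- character(0)
  while (length(remaining) > 0) {
    seed <- remaining[1]
    # The seed plus every remaining term whose similarity to it exceeds cutoff.
    group <- union(seed, remaining[sim[seed, remaining] > cutoff])
    scores <- terms[[by]][match(group, terms$ID)]
    # select_fun picks the winning score; the first term with that score is kept.
    kept <- c(kept, group[which(scores == select_fun(scores))[1]])
    remaining <- setdiff(remaining, group)
  }
  terms[terms$ID %in% kept, ]
}

reduced <- pick_representatives(terms, sim)
print(reduced)
```

With these toy numbers, T1 and T2 form one redundant group and T3 and T4 another, so the call keeps T1 and T4, the term with the smallest p.adjust in each group. Passing select_fun = max instead would keep T2 and T3.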

The simplified version of the enriched result is clearer and gives us a more comprehensive view of the whole story.

enrichGO tests the whole GO corpus, so the enriched result may contain very general terms. clusterProfiler contains a dropGO function to remove specific GO terms or GO levels; see the issue. With simplify and dropGO, the enriched result can be more specific and easier to interpret. Both functions work with output from both enrichGO and compareCluster.
