ClusterProfiles

[This article was first published on YGC » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

It is very common to cluster genes based on their expression profiles, and also very common to integrate Gene Ontology to observe the distribution of biological processes, molecular functions and cellular components for a given gene list. But, what if the two in combination? The Gene Ontology distributions across a variety of gene clusters may give us a new insight to find out which specific GO terms may related to our biological problem.

I was inspired by the MCP paper which was also mentioned in my previous post, and developed an R function to get this job done.

Here comes the codes:

?View Code RSPLUS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
ClusterProfiles <- function(geneClusters, onto="CC", level=3, orgPackage="org.Hs.eg.db") {
	require(goProfiles)
	require(plyr)
	require(ggplot2)
	clusterProfile <- llply(geneClusters, as.data.frame(basicProfile), onto=onto, level=level, orgPackage = orgPackage)
	clusterProfile.df <- ldply(clusterProfile, rbind)
	colnames(clusterProfile.df) <- c("Cluster", "Description", "GOID", "Frequency")
	clusterProfile.df <- clusterProfile.df[clusterProfile.df$Frequency !=0,]
	clusterProfile.df$Description <- as.character(clusterProfile.df$Description) ## un-factor
	clusterProfile.df <- ddply(clusterProfile.df, .(Description), transform, Percent = Frequency/sum(Frequency), Total = sum(Frequency))
 
	x <- mdply(clusterProfile.df[, c("Description", "Total")], paste, sep=" (")
	y <- sapply(x[,3], paste, ")", sep="")
	clusterProfile.df$Description <- y		### label GO Description with gene counts.
	clusterProfile.df <-  clusterProfile.df[, -6] ###drop the *Total* column##
	mtitle <- paste(onto, "Ontology Distribution", sep = " ") 
	p <- ggplot(clusterProfile.df, aes(x = Cluster, y = Description, size = Percent))
        p <- p + geom_point(colour="steelblue") + opts(title = mtitle) + xlab("") + ylab("")
	p <- p + opts(axis.text.x = theme_text(colour="black", size="11", vjust = 1)) 
        p <- p + opts(axis.text.y = theme_text(colour="black", size="11", hjust = 1))
	result <- list(data=clusterProfile.df, p=p)
	return(result)
}


The input *geneClusters* is a list of clusters which contain gene IDs.
Other parameters can refer to the reference manual of Bioconductor package goProfiles.

I post an example below to illustrate how to use it.

> names(geneClusters)
[1] "A" "B" "C" "D"
> geneClusters[1]
$A
 [1]  3838 29766 51070   483 56667  5573  8971 10755   389  8531 55905  3024  7169  1595   387

> clust.go.prof = ClusterProfiles(geneClusters)
> head(clust.go.prof$data)
  Cluster             Description       GOID Frequency   Percent
1       B apical part of cell (4) GO:0045177         1 0.2500000
2       C apical part of cell (4) GO:0045177         1 0.2500000
3       D apical part of cell (4) GO:0045177         2 0.5000000
4       C           cell body (3) GO:0044297         2 0.6666667
5       D           cell body (3) GO:0044297         1 0.3333333
6       C  cell division site (1) GO:0032153         1 1.0000000
> clust.go.prof$p

The function return a list which contain the cluster profiles annotated with GO which may be useful for further analysis, and a graph which can plot the GO distributions as shown below.

The dot sizes was based on the percentage of frequency in each row.

I plan to develop the version 2 of this function to let user specify which GO categories they are interested in looking at.

Any comments or suggestions are welcomed.

p.s: My thanks goes to Tal Galili for inviting me to joint R-bloggers. I am very happy to see my post appeared in R-bloggers.

Related Posts

To leave a comment for the author, please follow the link and comment on their blog: YGC » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)