proper use of GOSemSim

November 22, 2014
By

(This article was first published on YGC » R, and kindly contributed to R-bloggers)

One day, I am looking for R packages that can analyze PPI and after searching, I found the ppiPre package in CRAN.

2014-11-22-215959_895x256_scrot

The function of this package is not impressive, and I already knew some related works, including http://intscore.molgen.mpg.de/. The authors of this webserver contacted me for the usages of GOSemSim when they developing it.

What makes me curious is that the ppiPre package can calculate GO semantic similarity and supports 20 species exactly like GOSemSim. I opened the source tarball, and surprisingly found that its sources related to semantic similarity calculation are totally copied from GOSemSim.

GOSemSim was firstly released in 2008 Bioconductor 2.4 (at that time, devel version) and published in Bioinformatics in 2010. After compared the sources, I found the sources in ppiPre were copied from GOSemSim version 1.6.8 which released in 2010 Bioconductor 2.6.

The Wang method defined in GOKEGGSims.r file of ppiPre is:

   119	WangMethod <- function(GOID1, GOID2, ont="MF", organism="yeast") {
   120		if(!exists("ppiPreEnv")) .initial()
   121		weight.isa = 0.8
   122		weight.partof = 0.6
   123
   124		if (GOID1 == GOID2)
   125			return (1)
   126
   127		Parents.name <- switch(ont,
   128			MF = "MFParents",
   129			BP = "BPParents",
   130			CC = "CCParents"
   131		)
   132		if (!exists(Parents.name, envir=ppiPreEnv)) {
   133			GetGOParents(ont)
   134		}
   135		Parents <- get(Parents.name, envir=ppiPreEnv)
   136
   137		sv.a <- 1
   138		sv.b <- 1
   139		sw <- 1
   140		names(sv.a) <- GOID1
   141		names(sv.b) <- GOID2
   142
   143		sv.a <- WangSemVal(GOID1, ont, Parents, sv.a, sw, weight.isa, weight.partof)
   144		sv.b <- WangSemVal(GOID2, ont, Parents, sv.b, sw, weight.isa, weight.partof)
   145
   146		sv.a <- uniqsv(sv.a)
   147		sv.b <- uniqsv(sv.b)
   148
   149		idx <- intersect(names(sv.a), names(sv.b))
   150		inter.sva <- unlist(sv.a[idx])
   151		inter.svb <- unlist(sv.b[idx])
   152		sim <- sum(inter.sva,inter.svb) / sum(sv.a, sv.b)
   153		return(sim)
   154	}
   155	WangSemVal <- function(goid, ont, Parents, sv, w, weight.isa, weight.partof) {
   156		if(!exists("ppiPreCache"))
   157			return(WangSemVal_internal(goid, ont, Parents, sv, w, weight.isa, weight.partof))
   158		goid.ont <- paste(goid, ont, sep=".")
   159		if (!exists(goid.ont, envir=ppiPreCache)) {
   160		  	value <- WangSemVal_internal(goid, ont, Parents, sv, w, weight.isa, weight.partof)
   161		  	assign(eval(goid.ont), value, envir=ppiPreCache)
   162		}
   163		return(get(goid.ont, envir=ppiPreCache))
   164	}
   165
   166	WangSemVal_internal <- function(goid, ont, Parents, sv, w, weight.isa, weight.partof) {
   167		p <- Parents[goid]
   168		p <- unlist(p[[1]])
   169		if (length(p) == 0) {
   170			return(0)
   171		}
   172		relations <- names(p)
   173		old.w <- w
   174		for (i in 1:length(p)) {
   175			if (relations[i] == "is_a") {
   176				w <- old.w * weight.isa
   177			} else {
   178				w <- old.w * weight.partof
   179			}
   180			names(w) <- p[i]
   181			sv <- c(sv,w)
   182			if (p[i] != "all") {
   183				sv <- WangSemVal_internal(p[i], ont, Parents, sv, w, weight.isa, weight.partof)
   184			}
   185		}
   186		return (sv)
   187	}
   188
   189	uniqsv <- function(sv) {
   190		sv <- unlist(sv)
   191		una <- unique(names(sv))
   192		sv <- unlist(sapply(una, function(x) {max(sv[names(sv)==x])}))
   193		return (sv)
   194	}

It is identical to the one I defined in GOSemSim 1.6.8:

   196	ygcWangMethod <- function(GOID1, GOID2, ont="MF", organism="human") {
   197		if(!exists("GOSemSimEnv")) .initial()
   198		weight.isa = 0.8
   199		weight.partof = 0.6
   200
   201		if (GOID1 == GOID2)
   202			return (gosim=1)
   203
   204		Parents.name <- switch(ont,
   205			MF = "MFParents",
   206			BP = "BPParents",
   207			CC = "CCParents"
   208		)
   209		if (!exists(Parents.name, envir=GOSemSimEnv)) {
   210			ygcGetParents(ont)
   211		}
   212		Parents <- get(Parents.name, envir=GOSemSimEnv)
   213
   214		sv.a <- 1
   215		sv.b <- 1
   216		sw <- 1
   217		names(sv.a) <- GOID1
   218		names(sv.b) <- GOID2
   219
   220		sv.a <- ygcSemVal(GOID1, ont, Parents, sv.a, sw, weight.isa, weight.partof)
   221		sv.b <- ygcSemVal(GOID2, ont, Parents, sv.b, sw, weight.isa, weight.partof)
   222
   223		sv.a <- uniqsv(sv.a)
   224		sv.b <- uniqsv(sv.b)
   225
   226		idx <- intersect(names(sv.a), names(sv.b))
   227		inter.sva <- unlist(sv.a[idx])
   228		inter.svb <- unlist(sv.b[idx])
   229		sim <- sum(inter.sva,inter.svb) / sum(sv.a, sv.b)
   230		return(sim)
   231	}
   232
   233
   234
   235	uniqsv <- function(sv) {
   236		sv <- unlist(sv)
   237		una <- unique(names(sv))
   238		sv <- unlist(sapply(una, function(x) {max(sv[names(sv)==x])}))
   239		return (sv)
   240	}
   241
   242	ygcSemVal_internal <- function(goid, ont, Parents, sv, w, weight.isa, weight.partof) {
   243		p <- Parents[goid]
   244		p <- unlist(p[[1]])
   245		if (length(p) == 0) {
   246			#warning(goid, " may not belong to Ontology ", ont)
   247			return(0)
   248		}
   249		relations <- names(p)
   250		old.w <- w
   251		for (i in 1:length(p)) {
   252			if (relations[i] == "is_a") {
   253				w <- old.w * weight.isa
   254			} else {
   255				w <- old.w * weight.partof
   256			}
   257			names(w) <- p[i]
   258			sv <- c(sv,w)
   259			if (p[i] != "all") {
   260				sv <- ygcSemVal_internal(p[i], ont, Parents, sv, w, weight.isa, weight.partof)
   261			}
   262		}
   263		return (sv)
   264	}
   265
   266	ygcSemVal <- function(goid, ont, Parents, sv, w, weight.isa, weight.partof) {
   267		if(!exists("GOSemSimCache")) return(ygcSemVal_internal(goid, ont, Parents, sv, w, weight.isa, weight.partof))
   268		goid.ont <- paste(goid, ont, sep=".")
   269		if (!exists(goid.ont, envir=GOSemSimCache)) {
   270		  	value <- ygcSemVal_internal(goid, ont, Parents, sv, w, weight.isa, weight.partof)
   271		  	assign(goid.ont, value, envir=GOSemSimCache)
   272			#cat("recompute ", goid, value, "n")
   273		}
   274		else{
   275			#cat("cache ", goid, get(goid, envir=GOSemSimCache), "n")
   276		}
   277		return(get(goid.ont, envir=GOSemSimCache))
   278	}

The information content based method in ppiPre:

   495	GetLatestCommonAncestor<-function(GOID1, GOID2, ont, organism){
   496	#message("Calulating Latest Common Ancestor...")
   497		if(!exists("ppiPreEnv")) .initial()
   498
   499		fname <- paste("Info_Contents", ont, organism, sep="_")
   500		tryCatch(utils::data(list=fname, package="ppiPre", envir=ppiPreEnv))
   501		InfoContents <- get("IC", envir=ppiPreEnv)
   502
   503		rootCount <- max(InfoContents[InfoContents != Inf])
   504		InfoContents["all"] = 0
   505		p1 <- InfoContents[GOID1]/rootCount
   506		p2 <- InfoContents[GOID2]/rootCount
   507		if(is.na(p1) || is.na(p2)) return (NA)
   508		if (p1 == 0 || p2 == 0) return (NA)
   509		Ancestor.name <- switch(ont,MF = "MFAncestors",BP = "BPAncestors",CC = "CCAncestors")
   510		if (!exists(Ancestor.name, envir=ppiPreEnv)) {
   511			TCSSGetAncestors(ont)
   512		}
   513
   514		Ancestor <- get(Ancestor.name, envir=ppiPreEnv)
   515		ancestor1 <- unlist(Ancestor[GOID1])
   516		ancestor2 <- unlist(Ancestor[GOID2])
   517		if (GOID1 == GOID2) {
   518			commonAncestor <- GOID1
   519		} else if (GOID1 %in% ancestor2) {
   520			commonAncestor <- GOID1
   521		} else if (GOID2 %in% ancestor1) {
   522			commonAncestor <- GOID2
   523		} else {
   524			commonAncestor <- intersect(ancestor1, ancestor2)
   525		}
   526		if (length(commonAncestor) == 0)
   527			LCA<-NULL
   528		max<- -100
   529		LCA<-NULL
   530		for(a in commonAncestor){
   531			if(!is.na(InfoContents[a])) {
   532		  		if(InfoContents[a]>max){
   533					max<-InfoContents[a]
   534					LCA<-a
   535				}
   536			}
   537		}
   538	#message("done...")
   539		return (LCA)
   540
   541	}

also identical to the one in GOSemSim 1.6.8:

   280	`ygcInfoContentMethod` <- function(GOID1, GOID2, ont, measure, organism) {
   281		if(!exists("GOSemSimEnv")) .initial()
   282		fname <- paste("Info_Contents", ont, organism, sep="_")
   283		tryCatch(utils::data(list=fname, package="GOSemSim", envir=GOSemSimEnv))
   284		Info.contents <- get("IC", envir=GOSemSimEnv)
   285
   286		rootCount <- max(Info.contents[Info.contents != Inf])
   287		Info.contents["all"] = 0
   288		p1 <- Info.contents[GOID1]/rootCount
   289		p2 <- Info.contents[GOID2]/rootCount
   290
   291		if (p1 == 0 || p2 == 0) return (NA)
   292		Ancestor.name <- switch(ont,
   293			MF = "MFAncestors",
   294			BP = "BPAncestors",
   295			CC = "CCAncestors"
   296		)
   297		if (!exists(Ancestor.name, envir=GOSemSimEnv)) {
   298			ygcGetAncestors(ont)
   299		}
   300
   301		Ancestor <- get(Ancestor.name, envir=GOSemSimEnv)
   302		ancestor1 <- unlist(Ancestor[GOID1])
   303		ancestor2 <- unlist(Ancestor[GOID2])
   304		if (GOID1 == GOID2) {
   305			commonAncestor <- GOID1
   306		} else if (GOID1 %in% ancestor2) {
   307			commonAncestor <- GOID1
   308		} else if (GOID2 %in% ancestor1) {
   309			commonAncestor <- GOID2
   310		} else {
   311			commonAncestor <- intersect(ancestor1, ancestor2)
   312		}
   313		if (length(commonAncestor) == 0) return (NA)
   314		pms <- max(Info.contents[commonAncestor], na.rm=TRUE)/rootCount
   315		sim<-switch(measure,
   316	   	    Resnik = pms,
   317	   	    Lin = pms/(p1+p2),
   318	   	    Jiang = 1 - min(1, -2*pms + p1 + p2),
   319	   	    Rel = 2*pms/(p1+p2)*(1-exp(-pms*rootCount))
   320		)
   321		return (sim)
   322	}

Let's look at some helper functions in ppiPre:

   477	rebuildICdata <- function(){
   478		ont <- c("MF","CC", "BP")
   479		species <- c("human", "rat", "mouse", "fly", "yeast", "zebrafish", "arabidopsis","worm", "ecolik12", "bovine","canine","anopheles","ecsakai","chicken","chimp","malaria","rhesus","pig","xenopus","coelicolor")
   480		cat("------------------------------------n")
   481		cat("calulating Information Content...nSpecies:ttOntologyn")
   482		for (i in ont) {
   483			for (j in species) {
   484				cat(j)
   485				cat("ttt")
   486				cat(i)
   487				cat("n")
   488				TCSSComputeIC(ont=i, organism=j)
   489			}
   490		}
   491		cat("------------------------------------n")
   492		message("done...")
   493	}

Again, it is identical to GOSemSim 1.6.8:

   390	rebuildICdata <- function(){
   391		ont <- c("MF","CC", "BP")
   392		species <- c("human", "rat", "mouse", "fly", "yeast", "zebrafish", "arabidopsis","worm", "ecolik12", "bovine","canine","anopheles","ecsakai","chicken","chimp","malaria","rhesus","pig","xenopus")
   393		cat("------------------------------------n")
   394		cat("calulating Information Content...nSpecies:ttOntologyn")
   395		for (i in ont) {
   396			for (j in species) {
   397				cat(j)
   398				cat("ttt")
   399				cat(i)
   400				cat("n")
   401				ygcCompute_Information_Content(ont=i, organism=j)
   402			}
   403		}
   404		cat("------------------------------------n")
   405		print("done...")
   406	}

Let's look at the internal function TCSSComputeIC in ppiPre:

   410	TCSSComputeIC <- function(dropCodes="IEA", ont, organism) {
   411	message("Calulating IC...")
   412		wh_ont <- match.arg(ont, c("MF", "BP", "CC"))
   413		wh_organism <- match.arg(organism, c("human", "fly", "mouse", "rat", "yeast", "zebrafish", "worm", "arabidopsis", "ecolik12", "bovine","canine","anopheles","ecsakai","chicken","chimp","malaria","rhesus","pig","xenopus", "coelicolor"))
   414		CheckAnnotationPackage(wh_organism)
   415		gomap <- switch(organism,
   416			human = org.Hs.egGO,
   417			fly = org.Dm.egGO,
   418			mouse = org.Mm.egGO,
   419			rat = org.Rn.egGO,
   420			yeast = org.Sc.sgdGO,
   421			zebrafish = org.Dr.egGO,
   422			worm = org.Ce.egGO,
   423			arabidopsis = org.At.tairGO,
   424			ecoli = org.EcK12.egGO,
   425			bovine	= org.Bt.egGO,
   426			canine	= org.Cf.egGO,
   427			anopheles	=	org.Ag.egGO,
   428			ecsakai	=	org.EcSakai.egGO,
   429			chicken	=	org.Gg.egGO,
   430			chimp	=	org.Pt.egGO,
   431			malaria	=	org.Pf.plasmoGO,
   432			rhesus	=	org.Mmu.egGO,
   433			pig	= org.Ss.egGO,
   434			xenopus	=	org.Xl.egGO,
   435			coelicolor	=	org.Sco.egGO
   436		)
   437
   438		mapped_genes <- mappedkeys(gomap)
   439		gomap = AnnotationDbi::as.list(gomap[mapped_genes])
   440		if (!is.null(dropCodes)){
   441			gomap<-sapply(gomap,function(x) sapply(x,function(y) c(y$Evidence %in% dropCodes, y$Ontology %in% wh_ont)))
   442			gomap<-sapply(gomap, function(x) x[2,x[1,]=="FALSE"])
   443			gomap<-gomap[sapply(gomap,length) >0]
   444		}else {
   445			gomap <- sapply(gomap,function(x) sapply(x,function(y) y$Ontology %in% wh_ont))
   446		}
   447
   448		goterms<-unlist(sapply(gomap, function(x) names(x)), use.names=FALSE) # all GO terms appearing in an annotation
   449		goids <- toTable(GOTERM)
   450
   451		goids <- unique(goids[goids[,"Ontology"] == wh_ont, "go_id"])
   452		gocount <- table(goterms)
   453		goname <- names(gocount) #goid of specific organism and selected category.
   454
   455		go.diff <- setdiff(goids, goname)
   456		m <- double(length(go.diff))
   457		names(m) <- go.diff
   458		gocount <- as.vector(gocount)
   459		names(gocount) <- goname
   460		gocount <- c(gocount, m)
   461
   462		Offsprings.name <- switch(wh_ont,
   463			MF = "MFOffsprings",
   464			BP = "BPOffsprings",
   465			CC = "CCOffsprings"
   466		)
   467		if (!exists(Offsprings.name, envir=ppiPreEnv)) {
   468			TCSSGetOffsprings(wh_ont)
   469		}
   470		Offsprings <- get(Offsprings.name, envir=ppiPreEnv)
   471		cnt <- sapply(goids,function(x){ c=gocount[unlist(Offsprings[x])]; gocount[x]+sum(c[!is.na(c)])})
   472		names(cnt) <- goids
   473		IC<- -log(cnt/sum(gocount))
   474	message("done...")
   475		return (IC)
   476	}

and ygcCompute_Information_Content in GOSemSim 1.6.8:

   326	ygcCompute_Information_Content <- function(dropCodes="NULL", ont, organism) {
   327		wh_ont <- match.arg(ont, c("MF", "BP", "CC"))
   328		wh_organism <- match.arg(organism, c("human", "fly", "mouse", "rat", "yeast", "zebrafish", "worm", "arabidopsis", "ecolik12", "bovine","canine","anopheles","ecsakai","chicken","chimp","malaria","rhesus","pig","xenopus"))
   329		ygcCheckAnnotationPackage(wh_organism)
   330		gomap <- switch(wh_organism,
   331			human = org.Hs.egGO,
   332			fly = org.Dm.egGO,
   333			mouse = org.Mm.egGO,
   334			rat = org.Rn.egGO,
   335			yeast = org.Sc.sgdGO,
   336			zebrafish = org.Dr.egGO,
   337			worm = org.Ce.egGO,
   338			arabidopsis = org.At.tairGO,
   339			ecolik12 = org.EcK12.egGO,
   340			bovine	= org.Bt.egGO,
   341			canine	= org.Cf.egGO,
   342			anopheles	=	org.Ag.egGO,
   343			ecsakai	=	org.EcSakai.egGO,
   344			chicken	=	org.Gg.egGO,
   345			chimp	=	org.Pt.egGO,
   346			malaria	=	org.Pf.plasmoGO,
   347			rhesus	=	org.Mmu.egGO,
   348			pig	= org.Ss.egGO,
   349			xenopus	=	org.Xl.egGO
   350		)
   351		mapped_genes <- mappedkeys(gomap)
   352		gomap = AnnotationDbi::as.list(gomap[mapped_genes])
   353		if (!is.null(dropCodes)){
   354			gomap<-sapply(gomap,function(x) sapply(x,function(y) c(y$Evidence %in% dropCodes, y$Ontology %in% wh_ont)))
   355			gomap<-sapply(gomap, function(x) x[2,x[1,]=="FALSE"])
   356			gomap<-gomap[sapply(gomap,length) >0]
   357		}else {
   358			gomap <- sapply(gomap,function(x) sapply(x,function(y) y$Ontology %in% wh_ont))
   359		}
   360
   361		goterms<-unlist(sapply(gomap, function(x) names(x)), use.names=FALSE) # all GO terms appearing in an annotation
   362		goids <- toTable(GOTERM)
   363		# all go terms which belong to the corresponding category..
   364		goids <- unique(goids[goids[,"Ontology"] == wh_ont, "go_id"])
   365		gocount <- table(goterms)
   366		goname <- names(gocount) #goid of specific organism and selected category.
   367		## ensure goterms not appearing in the specific annotation have 0 frequency..
   368		go.diff <- setdiff(goids, goname)
   369		m <- double(length(go.diff))
   370		names(m) <- go.diff
   371		gocount <- as.vector(gocount)
   372		names(gocount) <- goname
   373		gocount <- c(gocount, m)
   374
   375		Offsprings.name <- switch(wh_ont,
   376			MF = "MFOffsprings",
   377			BP = "BPOffsprings",
   378			CC = "CCOffsprings"
   379		)
   380		if (!exists(Offsprings.name, envir=GOSemSimEnv)) {
   381			ygcGetOffsprings(wh_ont)
   382		}
   383		Offsprings <- get(Offsprings.name, envir=GOSemSimEnv)
   384		cnt <- sapply(goids,function(x){ c=gocount[unlist(Offsprings[x])]; gocount[x]+sum(c[!is.na(c)])})
   385		names(cnt) <- goids
   386		IC<- -log(cnt/sum(gocount))
   387		save(IC, file=paste(paste("Info_Contents", wh_ont, organism, sep="_"), ".rda", sep=""))
   388	}

Another helper function GetGOMap in ppiPre:

   308	GetGOMap <- function(organism="yeast") {
   309		if(!exists("ppiPreEnv")) .initial()
   310		CheckAnnotationPackage(organism) #download and install the packages
   311		species <- switch(organism,
   312			human = "Hs",
   313			fly = "Dm",
   314			mouse = "Mm",
   315			rat = "Rn",
   316			yeast = "Sc",
   317			zebrafish = "Dr",
   318			worm = "Ce",
   319			arabidopsis = "At",
   320			ecolik12 = "EcK12",
   321			bovine	= "Bt",
   322			canine	= "Cf",
   323			anopheles	=	"Ag",
   324			ecsakai	=	"EcSakai",
   325			chicken	=	"Gg",
   326			chimp	=	"Pt",
   327			malaria	=	"Pf",
   328			rhesus	=	"Mmu",
   329			pig	= "Ss",
   330			xenopus	=	"Xl",
   331			coelicolor	=	"Sco"
   332		)
   333
   334		gomap <- switch(organism,
   335			human = org.Hs.egGO,
   336			fly = org.Dm.egGO,
   337			mouse = org.Mm.egGO,
   338			rat = org.Rn.egGO,
   339			yeast = org.Sc.sgdGO,
   340			zebrafish = org.Dr.egGO,
   341			worm = org.Ce.egGO,
   342			arabidopsis = org.At.tairGO,
   343			ecoli = org.EcK12.egGO,
   344			bovine	= org.Bt.egGO,
   345			canine	= org.Cf.egGO,
   346			anopheles	=	org.Ag.egGO,
   347			ecsakai	=	org.EcSakai.egGO,
   348			chicken	=	org.Gg.egGO,
   349			chimp	=	org.Pt.egGO,
   350			malaria	=	org.Pf.plasmoGO,
   351			rhesus	=	org.Mmu.egGO,
   352			pig	= org.Ss.egGO,
   353			xenopus	=	org.Xl.egGO,
   354			coelicolor	=	org.Sco.egGO
   355		)
   356
   357		assign(eval(species), gomap, envir=ppiPreEnv)
   358	}

My ygcGetGOMap in GOSemSim 1.6.8:

   100	ygcGetGOMap <- function(organism="human") {
   101		if(!exists("GOSemSimEnv")) .initial()
   102		ygcCheckAnnotationPackage(organism)
   103		species <- switch(organism,
   104			human = "Hs",
   105			fly = "Dm",
   106			mouse = "Mm",
   107			rat = "Rn",
   108			yeast = "Sc",
   109			zebrafish = "Dr",
   110			worm = "Ce",
   111			arabidopsis = "At",
   112			ecolik12 = "EcK12",
   113			bovine	= "Bt",
   114			canine	= "Cf",
   115			anopheles	=	"Ag",
   116			ecsakai	=	"EcSakai",
   117			chicken	=	"Gg",
   118			chimp	=	"Pt",
   119			malaria	=	"Pf",
   120			rhesus	=	"Mmu",
   121			pig	= "Ss",
   122			xenopus	=	"Xl"
   123		)
   124		gomap <- switch(organism,
   125			human = org.Hs.egGO,
   126			fly = org.Dm.egGO,
   127			mouse = org.Mm.egGO,
   128			rat = org.Rn.egGO,
   129			yeast = org.Sc.sgdGO,
   130			zebrafish = org.Dr.egGO,
   131			worm = org.Ce.egGO,
   132			arabidopsis = org.At.tairGO,
   133			ecolik12 = org.EcK12.egGO,
   134			bovine	= org.Bt.egGO,
   135			canine	= org.Cf.egGO,
   136			anopheles	=	org.Ag.egGO,
   137			ecsakai	=	org.EcSakai.egGO,
   138			chicken	=	org.Gg.egGO,
   139			chimp	=	org.Pt.egGO,
   140			malaria	=	org.Pf.plasmoGO,
   141			rhesus	=	org.Mmu.egGO,
   142			pig	= org.Ss.egGO,
   143			xenopus	=	org.Xl.egGO
   144		)
   145		assign(eval(species), gomap, envir=GOSemSimEnv)
   146	}

There are many other small helper functions that are identical. ppiPre copy most of the source code of GOSemSim. There is 862 lines in GOKEGGSims.r, in which only the following function is about KEGG that is not related to GOSemSim.

    10	KEGGSim <- function(protein1, protein2)    # KEGG-based similarity of two proteins
    11	{
    12
    13	  if(!require("KEGG.db")){ stop("package KEGG.db is needed.")}
    14	  Pathway1 <- KEGG.db::KEGGEXTID2PATHID[[protein1]]
    15		Pathway2 <- KEGG.db::KEGGEXTID2PATHID[[protein2]]
    16		intersec <- length(na.omit(match(Pathway1, Pathway2)))
    17		if(intersec==0)
    18			sim<-0
    19		else
    20			sim<-intersec/(length(Pathway1)+length(Pathway2)-intersec)
    21		return(sim)
    22	}

This function is only 12 lines, and it calculates the similarity by divide the intersect to the total sum. The other lines in GOKEGGSims.r, more than 800 lines, were totally copied from GOSemSim. Other source files in the ppiPre only has less than 450 lines in sum. About 2/3 of ppiPre were copied from GOSemSim.

The author of ppiPre changed the function names and pretend it is their original works. They just copy and paste and take the credit of months of development of GOSemSim. This is really sucks.

After I found this issue, I add a proper use of GOSemSim statement in its github page:

I am very glad that many people find GOSemSim useful and GOSemSim has been cited by 114 (by google scholar, Aug, 2014).

There are two R packages BiSEp and tRanslatome depend on GOSemSim and three R packages clusterProfiler, DOSE and Rcpi import GOSemSim.

SemDist package copy some of the source code from GOSemSim with acknowledging within source code and document.

ppiPre package copy many source code from GOSemSim without any acknowledgement in souce code or document and did not cited GOSemSim in their publication. This violates the restriction of open source license.

For R developers, if you found functions provided in GOSemSim useful, please depends or imports GOSemSim. If you would like to copy and paste source code, you should acknowledge the source code was copied/derived from GOSemSim authored by Guangchuang Yu [email protected] within source code, add GOSemSim in Suggests field and also includes the following reference in the man files for functions that copied/derived from GOSemSim and cited in vignettes.

references{
  Yu et al. (2010) GOSemSim: an R package for measuring
  semantic similarity among GO terms and gene products
  emph{Bioinformatics} (Oxford, England), 26:7 976--978,
  April 2010. ISSN 1367-4803
  url{http://bioinformatics.oxfordjournals.org/cgi/content/abstract/26/7/976}
  PMID: 20179076
}

You are welcome to use GOSemSim in the way you like, but please cite it and give it the proper credit. I hope you can understand.

Related Posts

To leave a comment for the author, please follow the link and comment on their blog: YGC » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)