proper use of GOSemSim

[This article was first published on YGC » R, and kindly contributed to R-bloggers.]

One day I was looking for R packages that can analyze protein-protein interactions (PPI), and after some searching I found the ppiPre package on CRAN.


The functionality of this package is not impressive, and I already knew of some related work, including a web server whose authors contacted me about the usage of GOSemSim while they were developing it.

What made me curious is that the ppiPre package can calculate GO semantic similarity and supports 20 species, exactly like GOSemSim. I opened the source tarball and was surprised to find that its source code for semantic similarity calculation was copied wholesale from GOSemSim.

GOSemSim was first released in 2008 in Bioconductor 2.4 (the devel version at that time) and published in Bioinformatics in 2010. After comparing the sources, I found that the code in ppiPre was copied from GOSemSim version 1.6.8, which was released in 2010 with Bioconductor 2.6.

The Wang method defined in GOKEGGSims.r file of ppiPre is:

   119	WangMethod 

It is identical to the one I defined in GOSemSim 1.6.8:

   196	ygcWangMethod 

The information content based method in ppiPre:

   495	GetLatestCommonAncestor

It is also identical to the one in GOSemSim 1.6.8:

   280	`ygcInfoContentMethod` 

Let’s look at some helper functions in ppiPre:

   477	rebuildICdata 

Again, it is identical to GOSemSim 1.6.8:

   390	rebuildICdata 

Let’s look at the internal function TCSSComputeIC in ppiPre:

   410	TCSSComputeIC 

and ygcCompute_Information_Content in GOSemSim 1.6.8:

   326	ygcCompute_Information_Content 

Another helper function GetGOMap in ppiPre:

   308	GetGOMap 

My ygcGetGOMap in GOSemSim 1.6.8:

   100	ygcGetGOMap 

There are many other small helper functions that are identical. ppiPre copies most of the source code of GOSemSim. There are 862 lines in GOKEGGSims.r, and only the following function, which deals with KEGG, is unrelated to GOSemSim.

    10	KEGGSim 

This function is only 12 lines long, and it calculates the similarity by dividing the size of the intersection by the total sum. The other lines in GOKEGGSims.r, more than 800 of them, were copied from GOSemSim. The other source files in ppiPre add up to fewer than 450 lines. In other words, more than 800 of roughly 1,300 lines, or about 2/3 of ppiPre, were copied from GOSemSim.
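To show how little is involved in an overlap-based similarity of this kind, here is a minimal sketch. It is not the ppiPre code: the function name `pathway_overlap_sim` and the example pathway labels are hypothetical, and it assumes that the "total sum" means the size of the union of the two pathway sets.

```r
# Hypothetical sketch of an overlap-based similarity between two sets
# of pathway identifiers. ASSUMPTION: the denominator is the size of
# the union of both sets; the original function may define it differently.
pathway_overlap_sim <- function(pathways1, pathways2) {
  pathways1 <- unique(pathways1)
  pathways2 <- unique(pathways2)
  total <- length(union(pathways1, pathways2))
  # No annotation at all: similarity is undefined
  if (total == 0) return(NA_real_)
  length(intersect(pathways1, pathways2)) / total
}

# Two shared pathways out of four distinct ones
sim <- pathway_overlap_sim(c("pathA", "pathB", "pathC"),
                           c("pathB", "pathC", "pathD"))
print(sim)  # 0.5
```

A calculation like this fits comfortably in a dozen lines, which is consistent with how small the KEGG part of the file is.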

The authors of ppiPre changed the function names and pretended the code was their original work. They simply copied and pasted, and took credit for months of GOSemSim development. This really sucks.

After I found this issue, I added a "proper use of GOSemSim" statement to its GitHub page:

I am very glad that many people find GOSemSim useful and that GOSemSim has been cited 114 times (according to Google Scholar, August 2014).

Two R packages, BiSEp and tRanslatome, depend on GOSemSim, and three R packages, clusterProfiler, DOSE and Rcpi, import GOSemSim.

The SemDist package copies some of the source code from GOSemSim, with acknowledgement in its source code and documentation.

The ppiPre package copies much of the source code from GOSemSim without any acknowledgement in its source code or documentation, and its authors did not cite GOSemSim in their publication. This violates the terms of the open source license.

For R developers: if you find the functions provided in GOSemSim useful, please depend on or import GOSemSim. If you would like to copy and paste source code, you should acknowledge within the source code that it was copied or derived from GOSemSim, authored by Guangchuang Yu, add GOSemSim to the Suggests field, and include the following reference in the man files of the functions that were copied or derived from GOSemSim, as well as cite it in vignettes.

  Yu et al. (2010) GOSemSim: an R package for measuring
  semantic similarity among GO terms and gene products
  \emph{Bioinformatics} (Oxford, England), 26:7 976--978,
  April 2010. ISSN 1367-4803
  PMID: 20179076

You are welcome to use GOSemSim in any way you like, but please cite it and give it proper credit. I hope you can understand.
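For package authors, the request above might look roughly like the following in practice. This is only a sketch: the function `my_similarity_helper` is a hypothetical placeholder, and the DESCRIPTION lines are shown as comments because they belong in the package's DESCRIPTION file, not in R code.

```r
# In the package's DESCRIPTION file (shown here as comments), either
#   Imports: GOSemSim
# when calling GOSemSim functions directly, or
#   Suggests: GOSemSim
# when source code has been copied or derived from it.

#' A helper derived from GOSemSim (hypothetical placeholder)
#'
#' Source code copied/derived from GOSemSim, authored by Guangchuang Yu.
#'
#' @references Yu et al. (2010) GOSemSim: an R package for measuring
#'   semantic similarity among GO terms and gene products.
#'   \emph{Bioinformatics} (Oxford, England), 26:7 976--978,
#'   April 2010. ISSN 1367-4803. PMID: 20179076
my_similarity_helper <- function(go_id1, go_id2) {
  # Acknowledgement inside the source itself:
  # copied/derived from GOSemSim (Guangchuang Yu).
  stop("placeholder: the derived code would go here")
}
```

The roxygen `@references` tag ends up in the generated man file, so the citation travels with every function that was copied or derived.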
