Farewell then, PubMed Commons

[This article was first published on R – What You're Doing Is Rather Desperate, and kindly contributed to R-bloggers.]

PubMed Commons, the NCBI’s experiment in comments for PubMed articles, has been discontinued. Thoroughly too, with all traces of it expunged from the NCBI website.

Last time I wrote about the service, I concluded “all it needs now is more active users, more comments per user and a real API.” None of those things happened. Result: “NIH has decided that the low level of participation does not warrant continued investment in the project, particularly given the availability of other commenting venues.”

NLM also write that “all comments are archived on our FTP site.” A CSV file is available at this location. So is it good for anything?

The CSV archive contains only 6 fields: CommentId, PubmedId, DateCreated, FirstName, LastName and Content. This is unfortunate, as a lot of information has been lost. For example:

  • user IDs, to disambiguate user names
  • comment up- and down-votes
  • threading, showing which comments were replies to other comments
  • information regarding comment moderation

However, there is still information to be extracted from the file. Here’s a summary document at GitHub. We can see, for example, that:

Comments per year peaked in the first full year of operation (2014) and declined to a minimum in 2017.

Comments per month also declined to a minimum in 2017, rarely surpassing 150 and often falling below 100.

We can count comments per article, showing that the most-commented, with 33 comments, is “When is Science Ultimately Unreliable?” You will never know now, from looking at PubMed, that this article was controversial and caused debate.

We can count comments per author, showing that the “winner” is Lydia Maniatis, with 248 comments. You will never know now, from looking at PubMed, what inspired her and others or precisely how they interacted.
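
For anyone who wants to reproduce counts like these from the archive, here is a minimal sketch in Python using only the standard library. It is not the code behind the summary document. The sample rows are invented, and the sketch assumes that DateCreated starts with a four-digit year, which I have not checked against the real file. Since user IDs were dropped from the archive, authors can only be grouped by first and last name, so two people sharing a name would be counted as one.

```python
import csv
import io
from collections import Counter

# Invented sample rows in the archive's six-column layout (not real data).
SAMPLE = """CommentId,PubmedId,DateCreated,FirstName,LastName,Content
1,111,2014-01-05,Ann,Lee,First comment
2,111,2014-02-10,Bob,Ray,A reply
3,222,2015-03-01,Ann,Lee,Another comment
"""


def summarise(handle):
    """Count comments per year, per article and per author name."""
    per_year, per_article, per_author = Counter(), Counter(), Counter()
    for row in csv.DictReader(handle):
        # Assumption: DateCreated begins with a four-digit year.
        per_year[row["DateCreated"][:4]] += 1
        per_article[row["PubmedId"]] += 1
        # Names are the only author key left in the archive.
        per_author[(row["FirstName"], row["LastName"])] += 1
    return per_year, per_article, per_author


per_year, per_article, per_author = summarise(io.StringIO(SAMPLE))
print(per_article.most_common(1))  # most-commented article in the sample
print(per_author.most_common(1))   # most prolific author name in the sample
```

To run it on the real archive, pass an open file handle instead of the sample, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever you saved the CSV; the encoding is also a guess.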

We can at least analyse the comment text; this simple word cloud highlights the prevalence of human clinical studies in publications that generated debate.

For more data
I re-ran my code a few days before PubMed Commons closed its doors, to generate a richer data file (commons.csv) that you can find here. It contains 7619 comments, which I believe is only 10 fewer than the NCBI archive. I also re-ran my report one final time and you can see the results here.

It is a shame, in my opinion, that NCBI never fully committed to PubMed Commons, and that this same attitude is apparent in their approach to archiving the data. I guess it was an interesting if flawed experiment.
