Counting things is hard for a given value of “things”

[This article was first published on What You're Doing Is Rather Desperate » R, and kindly contributed to R-bloggers.]

This post is just a summary of some interesting online discussion from last week around open access publishing. I learned a few things about definitions and PubMed/PMC filters.

It all begins with an opinion piece, “Open access is tiring out peer reviewers.” With a title like that you might expect rebuttals from people like Michael Eisen and you’d be right.

In his post, Michael presents a table showing numbers for total and open access (OA) publications from 2000 to 2013. Initially I thought his OA numbers were rather low, but it turns out that he uses a very strict definition of what constitutes OA: membership of the PMC OA subset.

I still don’t agree entirely with Michael’s numbers though; for example his PMC OA count for the year 2000 is 3,438, whereas mine is:

library(rentrez)
# Count PMC records in the open access subset with publication date 2000
es <- entrez_search("pmc", "open access[FILT] AND 2000[PDAT]")
es$count
[1] 4827

but we’re in the same ballpark, at least. May I respectfully suggest that blog posts which use numbers to support an argument should show exactly how those numbers were derived?
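As a rough sketch of what "showing the derivation" could look like for the whole period, the single-year query can be wrapped in a few small functions. The helper names (year_query, year_count, yearly_counts) are my own and purely illustrative; only entrez_search() and the query string come from the example above. The half-second pause is an assumption intended to stay within NCBI's request limits for users without an API key, and the counts returned will drift as PMC is updated.

```r
library(rentrez)

# Build the search term for one publication year.
# `filter` is whatever precedes "AND <year>[PDAT]"; the default is the
# open access filter used in the single-year example.
year_query <- function(year, filter = "open access[FILT]") {
  sprintf("%s AND %d[PDAT]", filter, year)
}

# Number of records matching the term for one year (needs network access).
year_count <- function(year, db = "pmc", filter = "open access[FILT]") {
  Sys.sleep(0.5)  # assumption: a short pause to be polite to the NCBI servers
  as.numeric(entrez_search(db, year_query(year, filter))$count)
}

# One row per year; extra arguments (db, filter) are passed to year_count().
yearly_counts <- function(years = 2000:2013, ...) {
  data.frame(year = years,
             count = vapply(years, year_count, numeric(1), ...))
}
```

Calling yearly_counts() should then give the OA subset counts for 2000 to 2013, and yearly_counts(db = "pubmed", filter = "freetext[FILT]") would repeat the exercise with a different database and filter.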

For comparison, and mainly because I wanted an excuse to use RPubs for the first time, here are a couple of documents that I created. The first one looks at the increase in PubMed articles marked as “free full text” as compared with total PubMed articles:

# Count PubMed records flagged as free full text with publication date 2000
es <- entrez_search("pubmed", "freetext[FILT] AND 2000[PDAT]")
es$count
[1] 109326

[Figure: Growth of PMC OA subset 2000-2013]

“Free text available” is not the same as “open access”. That hasn’t stopped others from using it as a proxy for OA and I think it’s worth examining. A major argument for OA is that publicly-funded research should be accessible to the public; if “free text available” achieves this then surely that is A Good Thing, regardless of whether it is “truly OA”. The OA movement carries two messages, “accessibility” and “reusability”, and to be frank, I think there are times when those messages become confused, mixed or lost inside technical, rather zealous arguments.

My second document compares the growth of the PMC OA subset with all PMC articles. I’d argue that this is a more “like with like” comparison than PMC-OA to PubMed, although I can see the value of PubMed as a proxy for “all biomedical articles.”
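For a comparison like this, the quantity of interest is the OA share of all articles in each year and how that share changes from one year to the next. A minimal sketch, assuming you already have two vectors of yearly counts in the same year order; the function name oa_share is hypothetical and the numbers in the usage line are toy values, not real PMC counts:

```r
# Share of OA articles per year and its change from the previous year.
# `oa` and `total` are numeric vectors of yearly counts, aligned with `year`.
oa_share <- function(year, oa, total) {
  stopifnot(length(year) == length(oa), length(oa) == length(total))
  share <- oa / total
  data.frame(
    year   = year,
    share  = share,
    # difference in share versus the previous year; NA for the first year
    change = c(NA, diff(share))
  )
}

# Toy values only, to show the shape of the result; not real counts.
oa_share(year = 2000:2002, oa = c(10, 30, 60), total = c(100, 200, 300))
```

A steadily rising share with small, roughly constant changes would be the "rising but not inflationary" pattern; a change column that itself keeps growing would suggest otherwise.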

To summarise, my documents contain broadly the same message as Michael’s: namely that whilst the proportion of OA (or if you like, freely-available) articles is rising, there is no rapid “year on year” inflationary increase that could be interpreted as driving the overall growth in literature. My additional message is that when presenting tables of numbers, it’s nice to make them reproducible 🙂


Filed under: open access, R, statistics Tagged: peer review, publishing, rpubs
