Back in 2010, I wrote a web application called PMRetract to monitor retraction notices in the PubMed database. It was written primarily as a way for me to explore some technologies: the Ruby web framework Sinatra, MongoDB (hosted at MongoHQ, now Compose) and Heroku, where the app was hosted.
I automated the update process using Rake and the whole thing ran pretty smoothly, in a “set and forget” kind of way for four years or so. However, the first era of PMRetract is over. Heroku have shut down git pushes to their “Bamboo Stack” – which runs applications using Ruby version 1.8.7 – and will shut down the stack on June 16 2015. Currently, I don’t have the time either to update my code for a newer Ruby version or to figure out the (frankly, near-unintelligible) instructions for migration to the newer Cedar stack.
So I figured now was a good time to learn some new skills, deal with a few issues and relaunch PMRetract as something easier to maintain and more portable. Here it is. As all the code is “out there” for viewing, I’ll just add a few notes here regarding this latest incarnation.
- Writing in RMarkdown has several advantages:
  - There are the usual advantages of literate documents – seeing the code together with the results, reproducibility.
  - Parsing PubMed XML files directly using R is an easier, more “lightweight” process than storage, retrieval and visualisation via a dedicated database.
  - The output is a single HTML file which is easy to distribute or host: for example here at GitHub and here, published to RPubs using RStudio. Grab it yourself, use it however you like.
- There are a couple of slow procedures (several minutes each) that, for debugging purposes, are better run from separate R scripts than from the RMarkdown document. These are (a) downloading PubMed XML and (b) retrieving total articles per year across five decades. Those scripts are here at GitHub. The RMarkdown document then reads their output.
- Highcharts is still my library of choice. I know the cool kids use D3 but (a) I know Highcharts better and (b) I find the transformation between data and its graphical representation most intuitive in Highcharts. That’s just how my brain works, not a reflection of the other libraries.
- The publishing procedure is not quite as fully automated as it was using Rake; this shell script is my best attempt so far. However, it’s easy enough to compile and publish the document using RStudio whenever the notification feed updates.
- A couple of enhancements:
  - The clunky, confusing zoomable timeline showing retractions on specific dates has been replaced by a non-zoomable version showing retraction counts per year.
  - There’s always been some confusion as to whether we’re looking at data for retracted articles or their associated retraction notices – so now both types of data are shown, in separate clearly-labelled and coloured plots.
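
To give a flavour of what “parsing PubMed XML directly” and “counts per year” involve, here is a minimal sketch. It is written in Python rather than R and is not the code behind PMRetract. The sample records, the PMIDs and the function name `counts_per_year` are all made up for illustration. It assumes records are told apart by PubMed's publication types: “Retracted Publication” for the retracted article and “Retraction of Publication” for the notice.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written sample in the general shape of PubMed XML (not real records).
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <Journal><JournalIssue><PubDate><Year>2009</Year></PubDate></JournalIssue></Journal>
        <PublicationTypeList>
          <PublicationType>Journal Article</PublicationType>
          <PublicationType>Retracted Publication</PublicationType>
        </PublicationTypeList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <Journal><JournalIssue><PubDate><Year>2009</Year></PubDate></JournalIssue></Journal>
        <PublicationTypeList>
          <PublicationType>Retracted Publication</PublicationType>
        </PublicationTypeList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>3</PMID>
      <Article>
        <Journal><JournalIssue><PubDate><Year>2010</Year></PubDate></JournalIssue></Journal>
        <PublicationTypeList>
          <PublicationType>Retraction of Publication</PublicationType>
        </PublicationTypeList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def counts_per_year(xml_text, publication_type):
    """Count records of one publication type, grouped by publication year."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for article in root.iter("PubmedArticle"):
        types = {pt.text for pt in article.iter("PublicationType")}
        if publication_type not in types:
            continue
        # Records without a plain <Year> element are skipped in this sketch.
        year = article.findtext(".//JournalIssue/PubDate/Year")
        if year is not None:
            counts[int(year)] += 1
    # Return a plain dict ordered by year, ready to hand to a charting library.
    return dict(sorted(counts.items()))


retracted = counts_per_year(SAMPLE_XML, "Retracted Publication")
notices = counts_per_year(SAMPLE_XML, "Retraction of Publication")
```

Keeping the two publication types as separate calls mirrors the idea of showing retracted articles and retraction notices in separate plots, rather than mixing them in one series.
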
That’s it, more or less. Enjoy and let me know what you think.