Articles by nsaunders

R 3.1 -> 3.2 upgrade notes

April 19, 2015 | nsaunders

My machines upgraded from R version 3.1.3 to version 3.2.0 last week, which means that existing code suddenly cannot find packages and so fails. Some notes to myself, possibly useful to others, for what to do when this happens. Relevant to Ubuntu-based systems (I use Linux Mint). 1. Update packages 1.1. rJava issues My ... [Read more...]

Project Tycho, ggplot2 and the shameless stealing of blog ideas

April 14, 2015 | nsaunders

Last week, Mick Watson posted a terrific article on using R to recreate the visualizations in this WSJ article on the impact of vaccination. Someone beat me to the obvious joke. "@BioMickWatson @pathogenomenick Nice quilt plot." (Ed Yong, @edyong209, April 9, 2015) Someone also beat me to the standard response whenever base ... [Read more...]

Configuring the R BatchJobs package for Torque batch queues

March 31, 2015 | nsaunders

I was asked recently to look at some R code which performs “embarrassingly parallel” computations (the same function, multiple times, different parameters) and see whether I could modify it to run on one of our high-performance computing clusters. The machine has 63 virtual compute nodes and uses the TORQUE batch queue ... [Read more...]

PubMed retraction reporting update

March 23, 2015 | nsaunders

Just a quick update to the previous post. At the helpful suggestion of Steve Royle, I’ve added a new section to the report which attempts to normalise retractions by journal. So for example, J. Biol. Chem. has (as of now) 94 retracted articles and in total 170 842 publications indexed in PubMed. ... [Read more...]

Just how many retracted articles are there in PubMed anyway?

March 19, 2015 | nsaunders

I am forever returning to PubMed data, downloaded as XML, trying to extract information from it and becoming deeply confused in the process. Take the seemingly-simple question “how many retracted articles are there in PubMed?” Well, one way is to search for records with the publication type “Retracted Article”. As ... [Read more...]

Counting things is hard for a given value of “things”

December 1, 2014 | nsaunders

This post is just a summary of some interesting online discussion from last week around open access publishing. I learned a few things about definitions and PubMed/PMC filters. It all begins with an opinion piece, “Open access is tiring out peer reviewers.” With a title like that you might ... [Read more...]

PubMed Publication Date: what is it, exactly?

September 23, 2014 | nsaunders

File this one under “has troubled me (and others) for some years now, let’s try to resolve it.” Let’s use the excellent R/rentrez package to search PubMed for articles that were retracted in 2013. 117 articles. Now let’s fetch the records in XML format. Next question: which XML ... [Read more...]

Ebola, Wikipedia and data janitors

September 21, 2014 | nsaunders

Sometimes, several strands of thought come together in one place. For me right now, it’s the Wikipedia page “Ebola virus epidemic in West Africa”, which got me thinking about the perennial topic of “data wrangling”, how best to provide public data and why I can’t shake my irritation ... [Read more...]

Venn figures go wrong

August 12, 2014 | nsaunders

I thought nothing could top the classic “6-way Venn banana”, featured in The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. That is until I saw Figure 3 from Compact genome of the Antarctic midge is likely an adaptation to an extreme environment. What’s odd is that Figure 2 ... [Read more...]

When life gives you coloured cells, make categories

August 5, 2014 | nsaunders

Let’s start by making one thing clear. Using coloured cells in Excel to encode different categories of data is wrong. Next time colleagues explain excitedly how “green = normal and red = tumour”, you must explain that (1) they have sinned and (2) what they meant to do was add a column ... [Read more...]

This is why code written by scientists gets ugly

May 13, 2014 | nsaunders

There’s a lot of discussion around why code written by self-taught “scientist programmers” rarely follows what a trained computer scientist would consider “best practice”. Here’s a recent post on the topic. One answer: we begin with exploratory data analysis and never get around to cleaning it up. An ... [Read more...]

A minor update to my “apply functions” post

February 27, 2014 | nsaunders

One of my more popular posts is A brief introduction to “apply” in R. Come August, it will be four years old. Technology moves on, old blog posts do not. So: thanks to BioStar user zx8754 for pointing me to this Stack Overflow post, in which someone complains that the ... [Read more...]

Box plots. Like box plots, only…box plots.

February 2, 2014 | nsaunders

On a rare, brief holiday (here and here, if you’re interested; both highly-recommended), I make the mistake of checking my Twitter feed: paging @neilfws . . . RT @psudmant: Ground breaking new methods from @naturemethods – boxplots – no rly nature.com/nmeth/journal/…— Chris Miller (@chrisamiller) January 30, 2014 This points me to BoxPlotR. It ... [Read more...]

BLATting the internet: the most frequent gene?

January 23, 2014 | nsaunders

I enjoyed this story from the OpenHelix blog today, describing a Microsoft Research project to mine DNA sequences from web pages and map them to UCSC genome builds. Laura DeMare asks: what was the most-hit gene? Most hit gene? APOE? MT @GenomeBrowser We BLATed the Internet! DNA sequences from 40 billion ... [Read more...]

Quilt plots. Like heat maps, only…heat maps

January 15, 2014 | nsaunders

Stephen tweets: Quilt Plots: A Simple Tool for the #Visualisation of Large Epidemiological Data buff.ly/1doSx4X— Stephen Rudd (@SAGRudd) January 15, 2014 Quilt plots. Sounds interesting. The link points to a short article in PLoS ONE, containing a table and a figure. Here is Figure 1. If you looked at that ... [Read more...]

R: how not to use savehistory() and source()

December 2, 2013 | nsaunders

Admitting to stupidity is part of the learning process. So in the interests of public education, here’s something stupid that I did today. You’re working in the R console. Happy with your exploratory code, you decide to save it to a file. Then, you type something else, for ... [Read more...]
