Articles by nsaunders

Counting things is hard for a given value of “things”

December 1, 2014 | nsaunders

This post is just a summary of some interesting online discussion from last week around open access publishing. I learned a few things about definitions and PubMed/PMC filters. It all begins with an opinion piece, “Open access is tiring out peer reviewers.” With a title like that you might ...

Bioinformatics journals: time from submission to acceptance, revisited

October 13, 2014 | nsaunders

Before we start: yes, we’ve been here before. There was the Biostars question “Calculating Time From Submission To Publication / Degree Of Burden In Submitting A Paper.” That gave rise to Pierre’s excellent blog post and code + data on Figshare. So why are we here again? 1. It’s been ...

PubMed Publication Date: what is it, exactly?

September 23, 2014 | nsaunders

File this one under “has troubled me (and others) for some years now, let’s try to resolve it.” Let’s use the excellent R/rentrez package to search PubMed for articles that were retracted in 2013. That returns 117 articles. Now let’s fetch the records in XML format. Next question: which XML ...

Ebola, Wikipedia and data janitors

September 21, 2014 | nsaunders

Sometimes, several strands of thought come together in one place. For me right now, it’s the Wikipedia page “Ebola virus epidemic in West Africa”, which got me thinking about the perennial topic of “data wrangling”, how best to provide public data and why I can’t shake my irritation ...

Venn figures go wrong

August 12, 2014 | nsaunders

I thought nothing could top the classic “6-way Venn banana”, featured in “The banana (Musa acuminata) genome and the evolution of monocotyledonous plants”. That is, until I saw Figure 3 from “Compact genome of the Antarctic midge is likely an adaptation to an extreme environment”. What’s odd is that Figure 2 ...

When life gives you coloured cells, make categories

August 5, 2014 | nsaunders

Let’s start by making one thing clear. Using coloured cells in Excel to encode different categories of data is wrong. Next time colleagues explain excitedly how “green equals normal and red = tumour”, you must explain that (1) they have sinned and (2) what they meant to do was add a column ...

Converting a spreadsheet of SMILES: my first OSM contribution

June 30, 2014 | nsaunders

I’ve long admired the work of the Open Source Malaria Project. Unfortunately, time and “day job” constraints prevent me from being as involved as I’d like. So: I was happy to make a small contribution recently in response to this request for help: Can anyone help @O_S_...

This is why code written by scientists gets ugly

May 13, 2014 | nsaunders

There’s a lot of discussion around why code written by self-taught “scientist programmers” rarely follows what a trained computer scientist would consider “best practice”. Here’s a recent post on the topic. One answer: we begin with exploratory data analysis and never get around to cleaning it up. An ...

A minor update to my “apply functions” post

February 27, 2014 | nsaunders

One of my more popular posts is A brief introduction to “apply” in R. Come August, it will be four years old. Technology moves on, old blog posts do not. So: thanks to BioStar user zx8754 for pointing me to this Stack Overflow post, in which someone complains that the ...

Box plots. Like box plots, only…box plots.

February 2, 2014 | nsaunders

On a rare, brief holiday (here and here, if you’re interested; both highly recommended), I make the mistake of checking my Twitter feed: “paging @neilfws . . . RT @psudmant: Ground breaking new methods from @naturemethods – boxplots – no rly nature.com/nmeth/journal/…” (Chris Miller, @chrisamiller, January 30, 2014). This points me to BoxPlotR. It ...

BLATting the internet: the most frequent gene?

January 23, 2014 | nsaunders

I enjoyed this story from the OpenHelix blog today, describing a Microsoft Research project to mine DNA sequences from web pages and map them to UCSC genome builds. Laura DeMare asks: what was the most-hit gene? Most hit gene? APOE? MT @GenomeBrowser We BLATed the Internet! DNA sequences from 40 billion ...

Quilt plots. Like heat maps, only…heat maps

January 15, 2014 | nsaunders

Stephen tweets: “Quilt Plots: A Simple Tool for the #Visualisation of Large Epidemiological Data buff.ly/1doSx4X” (Stephen Rudd, @SAGRudd, January 15, 2014). Quilt plots. Sounds interesting. The link points to a short article in PLoS ONE, containing a table and a figure. Here is Figure 1. If you looked at that ...

R: how not to use savehistory() and source()

December 2, 2013 | nsaunders

Admitting to stupidity is part of the learning process. So in the interests of public education, here’s something stupid that I did today. You’re working in the R console. Happy with your exploratory code, you decide to save it to a file. Then, you type something else, for ...

Bacteria and Alzheimer’s disease: I just need to know if ten patients are enough

October 29, 2013 | nsaunders

You can guarantee that when scientists publish a study titled “Determining the Presence of Periodontopathic Virulence Factors in Short-Term Postmortem Alzheimer’s Disease Brain Tissue”, a newspaper will publish a story titled “Poor dental health and gum disease may cause Alzheimer’s”. Without access to the paper, it’s difficult ...

Microarrays, scan dates and Bioconductor: it shouldn’t be this difficult

August 21, 2013 | nsaunders

When dealing with data from high-throughput experimental platforms such as microarrays, it’s important to account for potential batch effects. A simple example: if you process all your normal tissue samples this week and your cancerous tissue samples next week, you’re in big trouble. Differences between cancer and normal ...

Interestingly: the sentence adverbs of PubMed Central

July 15, 2013 | nsaunders

Scientific writing – by which I mean journal articles – is a strange business, full of arcane rules and conventions with origins that no-one remembers but to which everyone adheres. I’ve always been amused by one particular convention: the sentence adverb. Used with a comma to make a point at the ...

-omics in 2013

June 24, 2013 | nsaunders

Just how many (bad) -omics are there anyway? Let’s find out. 1. Get the raw data. It would be nice if we could search PubMed for titles containing all -omics. However, we cannot, since leading wildcards don’t work in PubMed search. So let’s just grab all articles from 2013: ...

Using the Ensembl Variant Effect Predictor with your 23andme data

June 3, 2013 | nsaunders

I subscribe to the Ensembl blog and found, in my feed reader this morning, a post which linked to the Variant Effect Predictor (VEP). The original blog post, strangely, has disappeared. Not to worry: so, the VEP takes genotyping data in one of several formats, compares it with the Ensembl ...

A brief note: R 3.0.0 and bioinformatics

April 3, 2013 | nsaunders

Today marks the release of R 3.0.0. There will be plenty of commentary and useful information at sites such as R-bloggers (for example, Tal’s post). Version 3.0.0 is great news for bioinformaticians, due to the introduction of long vectors. What does that mean? Well, several months ago, I was using the ...

R/ggplot2 tip: aes_string

February 25, 2013 | nsaunders

I’m a big fan of ggplot2. Recently, I ran into a situation which called for a useful feature that I had not used previously: aes_string. Imagine that you have data consisting of observations for several variables – let’s say A, B, C – where each observation is from one ...
