Blog Archives

Just how many retracted articles are there in PubMed anyway?

March 19, 2015

I am forever returning to PubMed data, downloaded as XML, trying to extract information from it and becoming deeply confused in the process. Take the seemingly simple question “how many retracted articles are there in PubMed?” Well, one way is to search for records with the publication type “Retracted Article”. As of right now, that returns

Read more »

Make prettier documents by reusing chunks in RMarkdown

February 23, 2015

No revelations here, just a little R tip for generating more readable documents. There are times when I want to show code in a document, but I don’t want it to be the first thing that people see. What I want to see first is the output from that code. In this silly example, I

Read more »

Counting things is hard for a given value of “things”

December 1, 2014

This post is just a summary of some interesting online discussion from last week around open access publishing. I learned a few things about definitions and PubMed/PMC filters. It all begins with an opinion piece, “Open access is tiring out peer reviewers.” With a title like that you might expect rebuttals from people like Michael

Read more »

Bioinformatics journals: time from submission to acceptance, revisited

October 13, 2014

Before we start: yes, we’ve been here before. There was the Biostars question “Calculating Time From Submission To Publication / Degree Of Burden In Submitting A Paper.” That gave rise to Pierre’s excellent blog post and code + data on Figshare. So why are we here again? 1. It’s been a couple of years. 2.

Read more »

PubMed Publication Date: what is it, exactly?

September 23, 2014

File this one under “has troubled me (and others) for some years now, let’s try to resolve it.” Let’s use the excellent R/rentrez package to search PubMed for articles that were retracted in 2013. 117 articles. Now let’s fetch the records in XML format. Next question: which XML element specifies the “Date of publication” (PDAT)?

Read more »

Ebola, Wikipedia and data janitors

September 21, 2014

Sometimes, several strands of thought come together in one place. For me right now, it’s the Wikipedia page “Ebola virus epidemic in West Africa”, which got me thinking about the perennial topic of “data wrangling”, how best to provide public data and why I can’t shake my irritation with the term “data science”. Not to

Read more »

Venn figures go wrong

August 12, 2014

I thought nothing could top the classic “6-way Venn banana”, featured in The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. That is until I saw Figure 3 from Compact genome of the Antarctic midge is likely an adaptation to an extreme environment. What’s odd is that Figure 2 in the latter paper

Read more »

When life gives you coloured cells, make categories

August 5, 2014

Let’s start by making one thing clear. Using coloured cells in Excel to encode different categories of data is wrong. Next time colleagues explain excitedly how “green = normal and red = tumour”, you must explain that (1) they have sinned and (2) what they meant to do was add a column containing the words

Read more »

Converting a spreadsheet of SMILES: my first OSM contribution

June 30, 2014

I’ve long admired the work of the Open Source Malaria Project. Unfortunately, time and “day job” constraints prevent me from being as involved as I’d like. So I was happy to make a small contribution recently in response to this request for help: Can anyone help @O_S_M to convert this spreadsheet (malaria.ourexperiment.org/biological_dat…) into chemical

Read more »

This is why code written by scientists gets ugly

May 13, 2014

There’s a lot of discussion around why code written by self-taught “scientist programmers” rarely follows what a trained computer scientist would consider “best practice”. Here’s a recent post on the topic. One answer: we begin with exploratory data analysis and never get around to cleaning it up. An example. For some reason, a researcher (let’s

Read more »
