Articles by nsaunders

Analysis of retractions in PubMed

November 30, 2010 | nsaunders

As so often happens these days, a brief post at FriendFeed got me thinking about data analysis. Entitled “So how many retractions are there every year, anyway?”, the post links to this article at Retraction Watch. It discusses ways to estimate the number of retractions and in particular, a recent ...

Findings increasingly novel, scientists say…

October 29, 2010 | nsaunders

…was the tongue-in-cheek title of an image that I posted to Twitpic this week. It shows the usage of the word “novel” in PubMed article titles over time. As someone correctly pointed out at FriendFeed, it needs to be corrected for total publications per year. It was inspired by a ...

BioStar users (of the world, unite)

October 9, 2010 | nsaunders

Egon writes: Can someone please plot the BioStar users on a Google Map? Sounds like a challenge. Let’s go. 1. Harvesting user IP addresses BioStar user profiles (here’s mine) include a location field. It’s free text and optional, which means that location is missing or inaccurate for many ...

GEO database: curation lagging behind submission?

August 30, 2010 | nsaunders

I was reading an old post that describes GEOmetadb, a downloadable database containing metadata from the GEO database. We had a brief discussion in the comments about the growth in GSE records (user-submitted) versus GDS records (curated datasets) over time. Below, some quick and dirty R code to examine the ...

Abstract word clouds using R

August 23, 2010 | nsaunders

A recent question over at BioStar asked whether abstracts returned from a PubMed search could easily be visualised as “word clouds”, using Wordle. This got me thinking about ways to solve the problem using R. Here’s my first attempt, which demonstrates some functions from the RCurl and XML packages. ...

A brief introduction to “apply” in R

August 19, 2010 | nsaunders

At any R Q&A site, you’ll frequently see an exchange like this one: Q: How can I use a loop to [...insert task here...] ? A: Don’t. Use one of the apply functions. So, what are these wondrous apply functions and how do they work? I think the ...

Analysing the ISMB 2010 meeting using R

July 20, 2010 | nsaunders

The colossus of bioinformatics meetings, ISMB, convened in Boston this year from July 9 – 13. As in recent years, the meeting was covered online at its website, FriendFeed and Twitter. I thought it would be fun to run a quick analysis of activity at the FriendFeed room using R. 1. Fetch the data ...

biomaRt and GenomeGraphs: a worked example

June 6, 2010 | nsaunders

As promised a few posts ago, another demonstration of the excellent biomaRt package, this time in conjunction with GenomeGraphs. Here’s what we’re going to do: grab some public microarray data; normalise and get a list of the most differentially-expressed probesets; use biomaRt to fetch the genes associated with ...

Beware of rogue header files (Bioconductor installation)

May 11, 2010 | nsaunders

Just a short note concerning a “gotcha”. As I have many times before, I opened an R console on my newly-upgraded (to lucid 10.04) Ubuntu machine, typed source("http://bioconductor.org/biocLite.R") and began a Bioconductor install with biocLite(). Only this time, I saw this: Error in dyn.load(file, ...

Experiments with igraph

April 21, 2010 | nsaunders

Networks – social and biological – are all the rage, just now. Indeed, a recent entry at Duncan’s QOTD described the “hairball” network representation as the dominant cultural icon in molecular biology. I’ve not had occasion to explore networks “professionally”, but have always been fascinated by both networks and the ...

Plotting “time of day” data using ggplot2

April 14, 2010 | nsaunders

William asks: How can I make a graph that looks like this, “tweet density” style, showing time intervals? He then helpfully describes his input data: a CSV file with headers “time started, time finished, date”. Here’s a simple CSV file, tasks.csv:
task,date,start,end
task1,2010-03-05,09:00:00,13:00:00
...

BioMart (and biomaRt)

March 26, 2010 | nsaunders

I’ve been vaguely aware of BioMart for a few years. Inexplicably, I’ve only recently started to use it. It’s one of the most useful applications I’ve ever used. The concept is simple. You have a set of identifiers that describe a biological object, such as a ...

From the “blogosphere”? Hardly.

January 27, 2010 | nsaunders

I generally skip over “From the Blogosphere”, a (mostly) weekly summary of one or two blog posts in Nature’s “Authors” section (here is the latest). Why? Well, I’ve always suspected that the title is rather misleading. Now, I have the hard numbers to prove it. My feed reader contains ...

A new twist on the identifier mapping problem

January 11, 2010 | nsaunders

Yesterday, Deepak wrote about BridgeDB, a software package to deal with the “identifier mapping problem”. Put simply, biologists can name a biological entity in any way that they like, leading to multiple names for the same object. Easily solved, you might think, by choosing one identifier and sticking to it, ...

The Life Scientists at FriendFeed: 2009 summary

December 23, 2009 | nsaunders

It’s Christmas Eve tomorrow and so I declare the year over. My Christmas gift to you is a summary of activity in 2009 at the FriendFeed Life Scientists group. It’s crafted using R + Ruby, with raw data and some code snippets available. If you want to see the most ...