# A brief note: R 3.0.0 and bioinformatics

April 3, 2013
By

(This article was first published on What You're Doing Is Rather Desperate » R, and kindly contributed to R-bloggers)

Today marks the release of R 3.0.0. There will be plenty of commentary and useful information at sites such as R-bloggers (for example, Tal’s post).

Version 3.0.0 is great news for bioinformaticians, due to the introduction of long vectors. What does that mean? Well, several months ago, I was using the simpleaffy package from Bioconductor to normalize Affymetrix exon microarrays. I began as usual by reading the CEL files:

f <- list.files(path = "data/affyexon", pattern = ".CEL.gz", full.names = T, recursive = T)


When this happened:

Error in read.affybatch(filenames = l$filenames, phenoData = l$phenoData,  :
allocMatrix: too many elements specified


I had a relatively-large number of samples (337), but figured a 64-bit machine with ~ 100 GB RAM should be able to cope. I was wrong: due to a hard-coded limit to vector length in R, my matrix had become too large regardless of available memory. See this post and this StackOverflow question for the computational details.

My solution at the time was to resort to Affymetrix Power Tools. Hopefully, the introduction of the LONG vector will make Bioconductor even more capable and useful.

Filed under: bioinformatics, programming, R, statistics Tagged: 3.0.0, affymetrix, bioconductor, microarray