A brief note: R 3.0.0 and bioinformatics

April 3, 2013
By

(This article was first published on What You're Doing Is Rather Desperate » R, and kindly contributed to R-bloggers)

Today marks the release of R 3.0.0. There will be plenty of commentary and useful information at sites such as R-bloggers (for example, Tal’s post).

Version 3.0.0 is great news for bioinformaticians, due to the introduction of long vectors. What does that mean? Well, several months ago, I was using the simpleaffy package from Bioconductor to normalize Affymetrix exon microarrays. I began as usual by reading the CEL files:

f <- list.files(path = "data/affyexon", pattern = ".CEL.gz", full.names = T, recursive = T)
cel <- ReadAffy(filenames = f)

When this happened:

Error in read.affybatch(filenames = l$filenames, phenoData = l$phenoData,  : 
  allocMatrix: too many elements specified

I had a relatively-large number of samples (337), but figured a 64-bit machine with ~ 100 GB RAM should be able to cope. I was wrong: due to a hard-coded limit to vector length in R, my matrix had become too large regardless of available memory. See this post and this StackOverflow question for the computational details.

My solution at the time was to resort to Affymetrix Power Tools. Hopefully, the introduction of the LONG vector will make Bioconductor even more capable and useful.


Filed under: bioinformatics, programming, R, statistics Tagged: 3.0.0, affymetrix, bioconductor, microarray

To leave a comment for the author, please follow the link and comment on his blog: What You're Doing Is Rather Desperate » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.