“The R-Files” is an occasional series from Revolution Analytics, where we profile prominent members of the R Community.
Name: Martin Morgan
Profession: Senior Staff Scientist at Fred Hutchinson Cancer Research Center
Years Using R: 7
Known for: Director of the Bioconductor project
Martin Morgan is a Senior Staff Scientist at the Fred Hutchinson Cancer Research Center (FHCRC) in Seattle. He is perhaps best known for running the Bioconductor project, which has emerged as the tool of choice for scientists conducting analyses of high-throughput genomic data.
His use of R stems from his early days at FHCRC, where one of his colleagues was R project co-founder Robert Gentleman, who also founded the Bioconductor project. Although Morgan had done his fair share of statistical analysis while pursuing his Ph.D. in Evolutionary Genetics at the University of Chicago, he was especially impressed by R’s flexible, powerful nature.
Morgan’s first real interaction with the wider R community came via the R and Bioconductor mailing lists. As he puts it, “I was impressed with both the level of sophistication and engagement of the users on the mailing list. I remember realizing that the people responding to questions were the very authors of the packages in question.”
As he became more familiar with R, Morgan took on an increasingly active role in Bioconductor, eventually taking the lead on the project. Morgan and his team of researchers and analysts at FHCRC have worked to develop a number of R packages for genomic research, including the popular Rsamtools, IRanges, GenomicRanges and GenomicFeatures packages for importing and using next-generation sequence data. Morgan was quick to credit Michael Lawrence, Herve Pages, Patrick Aboyoun, and Marc Carlson in particular for their efforts in developing these packages.
Morgan recalls a key lesson from his first R project. “I’d initially written a C program that was literally thousands of lines of code. After a while, I decided to try programming in R; since the facilities for data input and optimization were already there, I was able to do it in just six lines of R code.”
When asked about areas in which he would like to see R evolve, Morgan responded, “Because R is a programming language, there’s a risk that people lose sight of its biggest strengths — namely, its nearly limitless capacity for statistical analysis.” He paused before continuing, “Some conservative elements of the R project help reproducible research, but encouraging more interoperability between packages would improve R’s core functionality.”