“The R-Files” is an occasional series from Revolution Analytics, where we profile prominent members of the R Community.
Name: Hadley Wickham
Profession: Assistant Professor of Statistics, Rice University
Nationality: New Zealand
Years Using R: 10
An Assistant Professor of Statistics at Rice University, Hadley Wickham has been using R for over 10 years. In that time, he has emerged as one of the most prominent members of the R community. Wickham’s research focuses primarily on data analysis and developing innovative tools that facilitate the understanding of complex statistical models through visualization. He is known for developing some of today’s most popular R packages, including ggplot2, plyr and reshape.
Wickham’s R packages are used by a wide range of high-profile institutions for complex statistical analyses, including the U.S. Air Force Research Laboratories, Mozilla Labs and the Vanderbilt University Center for Human Genetics Research. ggplot2 was also used earlier this year by NYU doctoral student Drew Conway to visualize the WikiLeaks data, offering far greater insight into troop movements and conflict hot zones in Afghanistan than was previously available to the public.
He first encountered R in an undergraduate statistics course at the University of Auckland—the birthplace of R. Admittedly, he found the language a bit overwhelming at first. Of his early experiences with the language, he says, “I remember when I first used R, I found it to be the most enormously frustrating—yet magical—computer language I had ever encountered.”
For Wickham, the frustration came from R’s organic growth, which—he’s quick to add—is also its greatest strength. Thanks to its open source roots and community of users and contributors, R has grown without a roadmap or goal. While you can do just about anything with R, Wickham notes that it can be an intimidating language for new users to learn, and does not have the most user-friendly interface.
As a doctoral student at Iowa State University, Wickham became increasingly familiar with, and fond of, R. Working under the tutelage of Di Cook and Heike Hofmann, he began to master the language’s wide range of capabilities and eventually started to develop R packages to contribute back to the community. To date, Wickham has developed 20 R packages, with an emphasis on visualization and manipulation of data.
In his own classes, Wickham uses R to teach a new generation of statisticians. Among his favorite subjects for teaching with R are tracking the recent American sub-prime mortgage crisis and using data released by the Social Security Administration to study the evolution of baby names. “R is the most powerful statistical computing language on the planet,” Wickham says. “Students today have such an advantage over previous generations; R has nearly limitless statistical capability and allows them to perform a far greater range of analyses than was previously possible.”
While critics argue that R is not as “pretty” a programming language as Java and C++, Wickham sees that as one of its greatest strengths. “One of the things I like about R is that it is so pragmatic,” he says. “It doesn’t try to hold to theories that might be lovely and elegant, but don’t work in practice. It’s the result of a brilliant community coming together to make it work, and it’s constantly evolving.”
Wickham cites the community behind R, which is more than 2 million users strong, and its members’ willingness to support one another as the primary driver of R’s success. “When you ask a question, you’re getting free help from the people who wrote R. Not only are they great programmers, they’re internationally recognized statisticians who have contributed and developed leading theories of modern statistics.”