The R packages in a data scientist’s toolbox

July 17, 2012

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

John Myles White, self-described "statistics hacker" and co-author of "Machine Learning for Hackers", was interviewed recently by The Setup. In the interview, he describes some of his go-to R packages for data science:

Most of my work involves programming, so programming languages and their libraries are the bulk of the software I use. I primarily program in R, but, if the situation calls for it, I'll use Matlab, Ruby or Python. ...

That said, for me the specific language I use is much less important than the libraries available for that language. In R, I do most of my graphics using ggplot2, and I clean my data using plyr, reshape, lubridate and stringr. I do most of my analysis using rjags, which interfaces with JAGS, and I'll sometimes use glmnet for regression modeling. And, of course, I use ProjectTemplate to organize all of my statistical modeling work. To do text analysis, I'll use the tm and lda packages.

Also in JMW's toolbox: Julia, TextMate 2, MySQL, Dropbox and a beefy MacBook. Read the full interview linked below for an insightful look at how he uses these and other tools day to day.

The Setup / Interview: John Myles White

