Dan Woods at Forbes interviewed LinkedIn's Daniel Tunkelang about the rise of data science and about building data science teams. Asked how students today should prepare themselves to be data scientists, Tunkelang gave some good advice:
When we built the data science team at LinkedIn a few years ago, we looked for raw talent, assuming that smart people could pick up the needed technical skills on the job. Now that the field has matured, it’s a good idea to learn some of those technical skills in school. Anyone planning to work with big data ought to learn Hadoop and R, the two open-source tools most used by data scientists. It’s also a good idea to take courses in statistics and machine learning. Beyond that, find every opportunity to work with real data sets. Struggling with the warts of real data is a key part of a data scientist’s job — in fact, some would say that the struggle is our “day job.”
(Emphasis mine.) Any student thinking about working with Hadoop and R should check out the RHadoop project, a collection of R packages that make it easy to write map-reduce jobs for Hadoop data stores in the R language.