"Anyone planning to work with Big Data ought to learn Hadoop and R"

October 25, 2011

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Dan Woods at Forbes interviewed LinkedIn's Daniel Tunkelang about the rise of data science and how to build data science teams. Asked how today's students should prepare to become data scientists, Tunkelang gives some good advice:

When we built the data science team at LinkedIn a few years ago, we looked for raw talent, assuming that smart people could pick up the needed technical skills on the job. Now that the field has matured, it’s a good idea to learn some of those technical skills in school. Anyone planning to work with big data ought to learn Hadoop and R, the two open-source tools most used by data scientists. It’s also a good idea to take courses in statistics and machine learning. Beyond that, find every opportunity to work with real data sets. Struggling with the warts of real data is a key part of a data scientist’s job; in fact, some would say that the struggle is our “day job.”

(Emphasis mine.) Any student thinking about working with Hadoop and R should check out the RHadoop project, a collection of R packages that make it easy to write map-reduce jobs for Hadoop data stores in the R language.

Forbes: LinkedIn's Daniel Tunkelang On "What Is a Data Scientist?"
