Become an effective data hacker with the R-Hadoop stack

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]

In discussions with several data scientists, Will Stanton (a data scientist at Return Path) learned that a common concern is which software to use. There are many options out there, but which platform is best for becoming an effective “data hacker”?

Will recommends using a technology stack with R and Hadoop, which allows data scientists “to do almost anything you need to for data hacking”. With this platform, you have all the tools you need for:

  • Statistical Programming
  • Machine Learning
  • Visualization
  • Reporting / Dashboarding
  • Databases
  • Big Data
  • Data Munging

On the other hand, Will says the stack works best on Unix- or Linux-based systems (Windows is possible, but tricky), and isn't ideally suited to text mining or web-based applications. But if this is something you want to try, a good place to start is the RHadoop project, a collection of R packages that connect R and Hadoop.

For more on being a data hacker with the R-Hadoop stack, check out Will's complete blog post linked below.

Will Stanton's Data Science blog: Becoming a data “hacker” (via Joaquim Coll)
