Become an effective data hacker with the R-Hadoop stack

September 24, 2014

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

In discussions with several data scientists, Will Stanton (a data scientist at Return Path) found that a common concern is: what software should I be using? There are many options, but which platform makes you the most effective "data hacker"?

Will recommends using a technology stack with R and Hadoop, which allows data scientists "to do almost anything you need to for data hacking". With this platform, you have all the tools you need for:

  • Statistical Programming
  • Machine Learning
  • Visualization
  • Reporting / Dashboarding
  • Databases
  • Big Data
  • Data Munging

On the other hand, Will notes that the stack works best on Unix- or Linux-based systems (Windows is possible, but tricky) and isn't ideally suited to text mining or web-based applications. If you want to try it, a good place to start is the RHadoop project, a collection of R packages that connect R and Hadoop.

For more on becoming a data hacker with the R-Hadoop stack, check out Will's complete blog post, linked below.

Will Stanton's Data Science blog: Becoming a data “hacker” (via Joaquim Coll)
