In discussions with several data scientists, Will Stanton (a data scientist at Return Path) learned that a common concern is: what software should I be using? There are many options out there, but which platform is best for becoming an effective "data hacker"?
Will recommends using a technology stack with R and Hadoop, which allows data scientists "to do almost anything you need to for data hacking". With this platform, you have all the tools you need for:
- Statistical Programming
- Machine Learning
- Reporting / Dashboarding
- Big Data
- Data Munging
On the other hand, Will says the stack works best on Unix- or Linux-based systems (Windows is possible, but tricky), and isn't ideally suited for text mining or web-based applications. But if this is something you want to try, a good start is the RHadoop project, a collection of R packages that connect R and Hadoop.
For more on being a data hacker with the R-Hadoop stack, check out Will's complete blog post linked below.
Will Stanton's Data Science blog: Becoming a data “hacker” (via Joaquim Coll)
To leave a comment for the author, please follow the link and comment on their blog: Revolutions