In discussions with several data scientists, Will Stanton (a data scientist at Return Path) learned that a common concern is: what software should I be using? There are many options out there, but which platform makes for the most effective “data hacker”?
Will's answer is the R-Hadoop stack, which covers:
- Statistical Programming
- Machine Learning
- Reporting / Dashboarding
- Big Data
- Data Munging
On the other hand, Will notes that the stack works best on Unix- or Linux-based systems (Windows is possible, but tricky) and isn't well suited to text mining or web-based applications. If you still want to try it, a good place to start is the RHadoop project, a collection of R packages that connect R and Hadoop.
For more on being a data hacker with the R-Hadoop stack, check out Will's complete blog post linked below.