How to build a single-node Hadoop/R system

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]

The best way to learn any software is to use it, but if you're new to Hadoop and want to try using it with R, the process of setting up your own Hadoop cluster can be daunting (to say the least). If learning is the goal, though, you don't need to install a full cluster. All you need is your own machine and the ability to install software from the shell command line.

RDataMining.com recently published the tutorial “Building an R Hadoop System” with step-by-step procedures for installing Hadoop, R, and RHadoop (including the rmr2 package) on a standard Mac system. (The same procedures will likely work on any Linux-based system as well, with minor tweaks.) Since the Hadoop system is configured in standalone mode on a single machine, you don't have to worry about the details of inter-node communication or of distributing software across the nodes of a multi-node cluster. The whole process takes about 30 minutes to set up, after which you can start on the Mapreduce in R tutorial from the Revolution Analytics GitHub repository.
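Before starting, it can help to see which of these pieces are already on your machine. The following is only a minimal sketch, not part of the tutorial: it assumes the tools are exposed on your PATH under the command names `java`, `R`, and `hadoop`, and the helper name `check_tool` is made up for illustration. The tutorial may install software to locations that are not on the PATH, in which case a "missing" line here does not mean the install failed.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
        return 0
    else
        echo "missing: $1"
        return 1
    fi
}

# Assumed command names. Hadoop runs on the JVM, hence the java check.
for tool in java R hadoop; do
    check_tool "$tool" || true
done
```

Anything reported as missing is something the installation steps would need to supply.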

Get started with the six-step installation tutorial at the link below.

RDataMining.com: Building an R Hadoop System
