How to build a single-node Hadoop/R system

September 3, 2013

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

The best way to learn any software is to use it, but if you're new to Hadoop and want to try using it with R, setting up your own Hadoop cluster can be daunting (to say the least). If learning is the goal, though, you don't need to install a full cluster. All you need is your own machine and the ability to install software from the shell command line. The recently published tutorial "Building an R Hadoop System" gives step-by-step procedures for installing Hadoop, R, and RHadoop (including the rmr2 package) on a standard Mac system. (The same procedures will likely work on any Linux-based system as well, with minor tweaks.) Since Hadoop is configured in standalone mode on a single machine, you don't have to worry about the details of inter-node communication or distributing software across the nodes of a multi-node cluster. The whole process takes about 30 minutes, after which you can start on the MapReduce in R tutorial from the Revolution Analytics GitHub repository.
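Before or after working through the tutorial, it can help to confirm which of the required tools your shell can already find. The following is only a minimal sketch, not part of the tutorial itself: the helper name `check_tool` is made up for illustration, and the command names `java`, `hadoop`, and `R` are assumptions about how the tools are exposed on your PATH (Hadoop needs a Java runtime, which is why `java` is included).

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Assumed command names; adjust them if your installation differs.
for tool in java hadoop R; do
  check_tool "$tool"
done
```

Any tool reported as missing is either not installed yet or not on your PATH. Once `hadoop` is found, running `hadoop version` should print the installed release.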

Get started with the six-step installation tutorial: Building an R Hadoop System



