How to build a single-node Hadoop/R system

September 3, 2013
(This article was first published on Revolutions, and kindly contributed to R-bloggers)

The best way to learn any software is to use it. But if you're new to Hadoop and want to try using it with R, setting up your own Hadoop cluster can be daunting (to say the least). If learning is the goal, though, you don't need a full cluster. All you need is your own machine and the ability to install software from the shell command line.

RDataMining.com recently published the tutorial "Building an R Hadoop System" with step-by-step procedures for installing Hadoop, R, and RHadoop (including the rmr2 package) on a standard Mac system. (The same procedures will likely work on any Linux-based system as well, with minor tweaks.) Since Hadoop is configured in standalone mode on a single machine, you don't have to worry about the details of inter-node communication or of distributing software across the nodes of a multi-node cluster. Setup takes about 30 minutes, after which you can start on the MapReduce in R tutorial from the Revolution Analytics GitHub repository.
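
If the MapReduce model itself is new to you, it may help to see what a job does when everything runs in one process, which is roughly the situation in standalone mode. The following is a minimal sketch in plain Python, not rmr2 or Hadoop code; the function names (run_job, word_mapper, count_reducer) are made up for illustration and do not come from the tutorial.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and collect the (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle_phase(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to that key's list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    """Run map, shuffle, and reduce in a single process (hypothetical helper)."""
    return reduce_phase(shuffle_phase(map_phase(records, mapper)), reducer)


def word_mapper(line):
    """Emit (word, 1) for each whitespace-separated word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def count_reducer(key, values):
    """Sum the counts emitted for one word."""
    return sum(values)


if __name__ == "__main__":
    lines = ["Hadoop with R", "R with rmr2"]
    print(run_job(lines, word_mapper, count_reducer))
```

On a real cluster the shuffle step moves data between machines; on a single machine there is nothing to move, which is why the single-node setup is so much simpler to get running.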

Get started with the six-step installation tutorial at the link below.

RDataMining.com: Building an R Hadoop System
