Teaser: Running R as a map/reduce job from Riak

August 17, 2011

(This article was first published on Cartesian Faith » R, and kindly contributed to R-bloggers)

Alliterations aside, here is a preview of something I’ve been tinkering with: running R code as a phase within a Riak map/reduce job. In a multicultural world filled with distinct languages, it should be obvious that one size does not fit all. Statistics is not Erlang’s strong suit. Writing a sparse matrix class is bad enough, but imagine implementing regression or random matrix theory. For its part, and despite many honorable attempts, R isn’t great at distributed processing. So, waving the banner of bringing the processing to the data, why not use R to process portions of a map/reduce job?

This actually isn’t as hard as it sounds. Below are a few snippets that run R code via an Erlang RPC call. In other words, R is available and running as an Erlang node!

First, we call the R function ‘mean’ to calculate the arithmetic mean of a list of numbers:

<pre>(test@localhost)57> rpc:call('rchimedes@localhost', rchimedes, eval, {mean, [[10,12,13,25,20]]}).
{ok,{16.0}}</pre>
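As a quick sanity check, the returned value is the ordinary arithmetic mean of the five numbers in the inner list. The outer list only carries the arguments to ‘mean’.

$$\bar{x} = \frac{10 + 12 + 13 + 25 + 20}{5} = \frac{80}{5} = 16$$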

Next, we draw 10 samples from a standard normal distribution. To me, calling rnorm is the R equivalent of “Hello, World”.

<pre>(test@localhost)58> rpc:call('rchimedes@localhost', rchimedes, eval, {rnorm, [10]}).
{ok,{-1.3440940467953522,1.0346333094171907,
-2.7704297093573698,0.32721935800723084,1.6406162089066918,
-0.480623709693892,-1.4687159958435285,-0.4415948361775166,
-1.2729869815762578,0.8369905573667532}}</pre>

Currently, the syntax uses atoms as function references (i.e. the function must already exist in R) and binary strings as function definitions. Notice that the arguments passed to the function are sent in a list. This follows standard Erlang practice, since rpc:call itself takes its arguments as a list, and it leaves room for additional arguments to the remote function. For example, to draw 10 samples from a normal distribution with mean 5, we add a second argument. R’s rnorm takes n, mean and sd in that order, with sd defaulting to 1:


<pre>(test@localhost)60> rpc:call('rchimedes@localhost', rchimedes, eval, {rnorm, [10,5]}).
{ok,{4.939374253203547,5.2481766179207545,6.413720221228998,
5.679098487985773,6.371656468561924,5.572533109697437,
4.196247547549403,5.36443397342678,3.7423040151803044,
6.979719956460093}}</pre>
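To make the request convention concrete, here is a hypothetical Erlang sketch. It is not the rchimedes implementation, whose internals aren’t shown in this post. It translates a request tuple into the R call it stands for. The module name rreq, the function to_r/1, and the inline definition used in the usage note are all made up for illustration. The sketch only handles integer, float and numeric-vector arguments.

```erlang
%% Hypothetical sketch (not rchimedes): render a request tuple as R source.
%% Assumed convention, as described in the post: an atom names a function
%% that already exists in R, a binary holds an R function definition, and
%% the arguments always travel in a list.
-module(rreq).
-export([to_r/1]).

%% Named R function, e.g. {rnorm, [10, 5]} -> "rnorm(10, 5)".
to_r({Fun, Args}) when is_atom(Fun), is_list(Args) ->
    atom_to_list(Fun) ++ "(" ++ args(Args) ++ ")";
%% Inline definition, wrapped in parentheses so R can apply it directly.
to_r({Def, Args}) when is_binary(Def), is_list(Args) ->
    "(" ++ binary_to_list(Def) ++ ")(" ++ args(Args) ++ ")".

%% Render each argument as an R literal, separated by ", ".
args(Args) ->
    lists:flatten(lists:join(", ", [arg(A) || A <- Args])).

%% Integers and floats map to R numbers; an Erlang list maps to an R vector.
arg(N) when is_integer(N) -> integer_to_list(N);
arg(X) when is_float(X) -> lists:flatten(io_lib:format("~p", [X]));
arg(L) when is_list(L) -> "c(" ++ args(L) ++ ")".
```

For example, rreq:to_r({mean, [[10,12,13,25,20]]}) returns "mean(c(10, 12, 13, 25, 20))". This shows why the vector sits inside an extra list: the outer list is the argument list, and the inner one is the single vector argument. A definition such as rreq:to_r({<<"function(x) x^2">>, [3]}) renders as "(function(x) x^2)(3)".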

The examples above will hopefully whet your appetite for what is possible here. The next step is to invoke R from within a Riak map/reduce job and pull everything together into a complete job. Ideas for case studies are welcome. Otherwise, brace yourself for something finance-related.

