Scaling RServe Deployments

[This article was first published on R – Bora Beran, and kindly contributed to R-bloggers.]

Tableau runs R scripts using Rserve, a free, open-source R package. But if you have many users on Tableau Server who rely heavily on R scripts, pointing Tableau at a single Rserve instance may not be enough.

Luckily, you can use a load balancer to distribute the load across multiple Rserve instances without having to invest in a commercial R distribution. In this blog post, I will show you how to do this using another open-source project called HAProxy.

Let’s start by installing HAProxy.

On a Mac, the easiest route is Homebrew. If you don’t have Homebrew yet, install it by running the following command in the terminal:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Then install HAProxy:

brew install haproxy

Next, create the config file that tells HAProxy where the Rserve instances are.

In this case I created it in the folder ‘/usr/local/Cellar/haproxy/’, but any other folder would work too.

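global
    daemon
    # cap on concurrent connections for the whole HAProxy process
    maxconn 256

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

# stats page on port 8080, protected by basic auth (user admin, password admin@rserve)
listen stats
    bind :8080
    mode http
    stats enable
    stats realm Haproxy\ Statistics
    stats uri /haproxy_stats
    stats hide-version
    stats auth admin:admin@rserve

# Tableau connects here; Rserve uses its own binary protocol, so balance at the TCP level
frontend rserve_frontend
    bind *:80
    mode tcp
    timeout client  1m
    default_backend rserve_backend

backend rserve_backend
    mode tcp
    option log-health-checks
    option redispatch
    balance roundrobin
    timeout connect 10s
    timeout server 1m
    # one line per Rserve instance (host:port); "check" enables health checks,
    # "maxconn 32" caps the concurrent connections sent to that instance
    server rserve1 localhost:6311 check maxconn 32
    server rserve2 anotherserver.abc.lan:6311 check maxconn 32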

The important parts of this config file are the timeouts; the maximum number of connections allowed for each Rserve instance (maxconn 32); the host:port of each Rserve instance; the frontend listening on port 80; round-robin balancing across the backend servers; and the stats page on port 8080, together with the username and password required to access it. This is a very basic configuration; the HAProxy documentation describes all the options in detail.
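As a quick sanity check of these settings, here is a minimal Python sketch (not part of HAProxy; the names `parse_servers`, `total_capacity` and `RserveServer` are hypothetical) that pulls the `server` lines out of the config and adds up their `maxconn` values. With two instances at `maxconn 32`, the backend evaluates at most 64 R scripts concurrently; HAProxy queues further connections until a slot frees up or a timeout expires.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches lines such as "server rserve1 localhost:6311 check maxconn 32".
# Simplification (assumption): IPv6 addresses and unix sockets are not handled.
SERVER_LINE = re.compile(r"^\s*server\s+(\S+)\s+([^\s:]+):(\d+)(.*)$")


@dataclass
class RserveServer:
    name: str
    host: str
    port: int
    maxconn: Optional[int]  # None means no per-server limit was set


def parse_servers(cfg_text: str) -> List[RserveServer]:
    """Collect every 'server' line from an HAProxy config string."""
    servers = []
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0]  # ignore comments
        m = SERVER_LINE.match(line)
        if not m:
            continue
        name, host, port, rest = m.groups()
        opts = rest.split()
        maxconn = None
        if "maxconn" in opts:
            i = opts.index("maxconn")
            if i + 1 < len(opts):
                maxconn = int(opts[i + 1])
        servers.append(RserveServer(name, host, int(port), maxconn))
    return servers


def total_capacity(servers: List[RserveServer]) -> Optional[int]:
    """Sum of per-server maxconn, or None if any server has no limit."""
    if any(s.maxconn is None for s in servers):
        return None
    return sum(s.maxconn for s in servers)
```

To check your own file, read /usr/local/Cellar/haproxy/haproxy.cfg into a string and pass it to `parse_servers`.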

Let’s check that the config file is valid and doesn’t contain any typos:

BBERAN-MAC:~ bberan$ haproxy -f /usr/local/Cellar/haproxy/haproxy.cfg -c
Configuration file is valid

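Now you can start HAProxy by passing it the path to the config file, as shown below. sudo is used because the frontend binds to port 80, and binding to ports below 1024 typically requires root privileges: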

sudo haproxy -f /usr/local/Cellar/haproxy/haproxy.cfg
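Before switching to Tableau, you can confirm that the load balancer is accepting connections. The Python sketch below (the helper name `is_listening` is hypothetical) simply opens a TCP connection, which is roughly what HAProxy’s `check` does against each Rserve backend when no other health-check option is configured.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name not resolvable
        return False
```

For example, `is_listening("localhost", 80)` checks the frontend from the config above, and `is_listening("anotherserver.abc.lan", 6311)` checks one of the Rserve instances directly.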

Let’s launch Tableau and enter the host and port of the load balancer (port 80 in this configuration) instead of those of an actual Rserve instance.

(Screenshot: Connection information for the load balancer)

Success!! I can see the results from R’s forecasting package in Tableau through the load balancer we just configured.

(Screenshot: Results of R script evaluated through the load balancer)

Let’s run the calculation one more time.

Now let’s look at the stats page for our HAProxy instance. Per our configuration file, it is at http://localhost:8080/haproxy_stats, and you log in with the username and password from the stats auth line.
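Besides the HTML page, HAProxy can serve the same statistics as CSV when `;csv` is appended to the stats URI, which is handy for scripting. The following is a minimal Python sketch, assuming the URI and credentials from the config above; the function names are hypothetical. It reads the `stot` (total sessions) column for each server in `rserve_backend`.

```python
import base64
import csv
import urllib.request
from typing import Dict

# Assumption: stats URI and credentials taken from the example config.
STATS_URL = "http://localhost:8080/haproxy_stats;csv"


def fetch_stats_csv(url: str = STATS_URL, user: str = "admin",
                    password: str = "admin@rserve", timeout: float = 5.0) -> str:
    """Download the stats CSV using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def sessions_per_server(csv_text: str, backend: str = "rserve_backend") -> Dict[str, int]:
    """Map each server in `backend` to its total session count (stot)."""
    lines = csv_text.splitlines()
    # The header line starts with "# pxname,..."; strip "# " so the column names are clean.
    if lines and lines[0].startswith("# "):
        lines[0] = lines[0][2:]
    counts = {}
    for row in csv.DictReader(lines):
        # Skip the aggregate FRONTEND/BACKEND rows; keep individual servers only.
        if row["pxname"] == backend and row["svname"] not in ("FRONTEND", "BACKEND"):
            counts[row["svname"]] = int(row["stot"] or 0)
    return counts
```

Calling `sessions_per_server(fetch_stats_csv())` after a few Tableau calculations should show the counts spread across rserve1 and rserve2.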

(Screenshot: Server statistics for the load balancer for Rserve instances)

I can see the two requests I made, and that they were evaluated on different Rserve instances. This is expected, since round-robin load balancing hands each new client connection to the next server in turn.
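To illustrate the idea, here is a simplified Python model of round-robin selection. It is not HAProxy’s implementation (HAProxy’s roundrobin also honors server weights); it just rotates through the servers and skips any that have been marked down, similar in spirit to what happens when a `check` fails. The class name `RoundRobin` is hypothetical.

```python
from typing import Iterable, List, Set


class RoundRobin:
    """Simplified round-robin picker: rotates through servers, skipping ones marked down."""

    def __init__(self, servers: Iterable[str]):
        self.servers: List[str] = list(servers)
        self.down: Set[str] = set()
        self._next = 0  # index of the server to try next

    def mark_down(self, name: str) -> None:
        self.down.add(name)

    def mark_up(self, name: str) -> None:
        self.down.discard(name)

    def pick(self) -> str:
        # Try each server at most once per call, starting where the previous call left off.
        for _ in range(len(self.servers)):
            name = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if name not in self.down:
                return name
        raise RuntimeError("no healthy servers available")
```

With the two servers from the config, four connections go to rserve1, rserve2, rserve1, rserve2; after `mark_down("rserve2")`, every connection goes to rserve1.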

Next, let’s install HAProxy on a server that is more representative of a production setup, and have it start automatically.

I used a Linux machine (specifically Ubuntu 14.04) for this. The configuration steps differ only slightly. To install HAProxy, enter the following in a terminal window:

sudo apt-get install haproxy

Now edit the file /etc/default/haproxy and set ENABLED=1 (it is 0 by default). With ENABLED=1, HAProxy starts automatically when the machine boots.
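If you prefer to script this step, here is a minimal Python sketch (the function names are hypothetical) that sets the uncommented `ENABLED=` line to 1, or appends `ENABLED=1` if no such line exists. Writing the real file requires root.

```python
import re

# Matches an uncommented line such as "ENABLED=0"; comment lines are left alone.
_ENABLED_LINE = re.compile(r"^ENABLED=\d+[ \t]*$", re.MULTILINE)


def enable_haproxy_defaults(text: str) -> str:
    """Return the contents of /etc/default/haproxy with ENABLED set to 1."""
    new_text, count = _ENABLED_LINE.subn("ENABLED=1", text)
    if count == 0:
        # No ENABLED line found: append one, keeping the file newline-terminated.
        sep = "" if (not text or text.endswith("\n")) else "\n"
        new_text = text + sep + "ENABLED=1\n"
    return new_text


def enable_in_file(path: str = "/etc/default/haproxy") -> None:
    """Rewrite the file in place (run as root, e.g. with sudo)."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(enable_haproxy_defaults(text))
```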

Then edit the config file, located at /etc/haproxy/haproxy.cfg, so that it matches the example above.

And we’re ready to start the load balancer:

sudo service haproxy start

Now you can serve many more visualizations containing R scripts to a larger number of Tableau users. Depending on how much load you’re dealing with, you can start by running multiple Rserve processes on different ports of the same machine, or add more machines to scale out further.
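Each additional Rserve process needs its own `server` line in the backend section, and must actually be listening on the port you list (Rserve’s port can be set with its `--RS-port` command-line argument or the `port` setting in its configuration file). The Python sketch below renders these lines in the same format as the example config; the helper name and the extra ports 6312 and 6313 are illustrative assumptions. Restart HAProxy after editing the config.

```python
from typing import Iterable, Tuple


def backend_server_lines(endpoints: Iterable[Tuple[str, int]],
                         maxconn: int = 32, prefix: str = "rserve") -> str:
    """Render one HAProxy 'server' line per Rserve endpoint given as (host, port)."""
    lines = []
    for i, (host, port) in enumerate(endpoints, start=1):
        # Same options as the example config: health check plus a per-server connection cap.
        lines.append(f"    server {prefix}{i} {host}:{port} check maxconn {maxconn}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical layout: three Rserve processes on this machine plus one on another host.
    print(backend_server_lines([
        ("localhost", 6311), ("localhost", 6312), ("localhost", 6313),
        ("anotherserver.abc.lan", 6311),
    ]))
```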

Time to put advanced analytics dashboards on more screens?

