Production R at ONS

February 13, 2017
By Mark Sellors, Head of Data Engineering

(This article was first published on Mango Solutions » R Blog, and kindly contributed to R-bloggers)


I’ve recently been working at the Office for National Statistics, under the very broad umbrella of a SAS to R transition project.

As you might imagine, ONS, as the UK’s largest producer of official statistics, has a huge number of internal statistical applications, so it’s not possible to “simply” switch from SAS to R. Working with various product owners and maintainers, a small team from Mango were able to perform a deep dive into a few of their existing applications which rely on SAS for statistical processing, and identify some likely candidates for proof of concept transitions.

The Mango team consisted of three individuals with skills covering Java and R development, business analysis, project management and architecture/infrastructure deployment. This combination was perfect for the task at hand, and we’ve delivered a solid foundation upon which ONS can build.

Mango had already prepared an initial report into the existing SAS usage within ONS as part of a prior exploratory project, so we were able to hit the ground running and make some very real gains over the course of the project. One of the applications we looked at was a time-series analysis toolkit developed in-house. It is starting to show its age a little, but it is well used and well understood, both within ONS and amongst its wide user base. The application is broadly based around a Service Oriented Architecture, which in theory makes transitioning some of those services to R a snap, so we started here.

An internal PoC had already been conducted, in which a small Java app had been wired up to an Rserve-based back-end. This convinced ONS that the idea was at least feasible, so they decided to take it further and bring Mango in to help.

The first thing to do was to get a demo environment built, so we requested a server and installed R, a number of useful packages, RStudio Server and Shiny Server.

Next, we needed to decide on a way to present our R functions as a service that could be consumed over the network, in much the same way as the SAS ones could. For the sake of speed, simplicity, and flexibility, I suggested that we use Jeff Allen’s excellent plumber package. I’ve talked about plumber quite a bit before, but for those of you who are unaware, plumber takes your R functions and makes them immediately available as a web API, which makes it a perfect fit for ONS. Jeff has also helpfully provided information about hosting plumber services on the project’s website.
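To give a flavour of what this looks like, here is a minimal sketch of a plumber-style service file. It is not the code we wrote for ONS: the file name plumber.R, the /smooth endpoint, the function name and its parameters are all made up for illustration. The special #* comments are what plumber reads to turn an ordinary R function into an HTTP endpoint.

```r
# plumber.R (hypothetical file name; endpoint and parameter names are illustrative)

#* Return a centred moving average of a numeric series.
#* @param values Comma-separated numbers, e.g. "1,2,3,4,5"
#* @param window Odd window width
#* @get /smooth
moving_average <- function(values, window = 3) {
  # Query-string parameters arrive as character strings, so convert them first.
  x <- suppressWarnings(as.numeric(strsplit(values, ",", fixed = TRUE)[[1]]))
  k <- suppressWarnings(as.integer(window))
  if (anyNA(x) || is.na(k) || k < 1 || k %% 2 == 0 || k > length(x)) {
    stop("values must be numeric and window an odd integer no longer than the series")
  }
  # stats::filter with equal weights gives a centred moving average;
  # positions without a full window come back as NA.
  smoothed <- stats::filter(x, rep(1 / k, k), sides = 2)
  list(window = k, smoothed = as.numeric(smoothed))
}
```

Assuming the plumber package is installed, something like plumber::plumb("plumber.R")$run(port = 8000) should serve the function locally, after which a request to http://localhost:8000/smooth?values=1,2,3,4,5&window=3 would return the result as JSON. Because the file is plain R, the function can also be called and tested directly without starting a server.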

At Mango, we’ve done variations of this project many times, but this occasion offered us the chance to explore the limits of what we could achieve with plumber, and I’m happy to report that it substantially exceeded expectations.

There are implementation details that I won’t bore you with, but at a high level, we were able to take Jeff’s existing instructions and expand on them to suit our needs with ease. Along the way we encountered and solved a few problems, but in general we were able to implement a complete platform based around a microservice architecture, with individual services written in R using the plumber package.

In the time-series application, for example, we were able to demonstrate the replacement of two existing SAS services with equivalents written in R. We made some modifications to the Java app itself, re-implemented the two services in R, and then served them using multiple plumber instances behind a load balancer, to demonstrate how it would work in a production setting.
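As a rough illustration of the "multiple instances" idea, and not a description of the ONS setup, one simple approach is a small launcher script that starts one plumber process per port, with a load balancer of your choice spreading requests across those ports. The script name run_instance.R, the helper names and the default API file plumber.R are all assumptions made for this sketch.

```r
# run_instance.R (hypothetical launcher; all names here are illustrative)
# Intended use, one process per port:
#   Rscript run_instance.R 8001
#   Rscript run_instance.R 8002

# Read the port from the command-line arguments, falling back to a default.
parse_port <- function(args, default = 8000L) {
  if (length(args) == 0) return(default)
  port <- suppressWarnings(as.integer(args[[1]]))
  if (is.na(port) || port < 1024L || port > 65535L) {
    stop("port must be an integer between 1024 and 65535")
  }
  port
}

# Start a single plumber instance bound to localhost on the given port.
# Requires the plumber package and an API file such as a plumber.R script.
start_instance <- function(port, api_file = "plumber.R") {
  plumber::plumb(api_file)$run(host = "127.0.0.1", port = port)
}

# Uncomment to launch when the script is run with Rscript:
# start_instance(parse_port(commandArgs(trailingOnly = TRUE)))
```

Each instance is a separate single-threaded R process, so running several of them behind a load balancer is what allows requests to be handled in parallel.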

I’ll be sharing more details about exactly what we did in future posts, so look out for those. For now, I’d encourage you to investigate plumber and see if it has a place in your business!
