Production R at ONS
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Head of Data Engineering
I’ve recently been working at the Office for National Statistics, under the very broad umbrella of a SAS to R transition project.
As you might imagine, ONS, as the UK’s largest producer of official statistics, has a huge number of internal statistical applications, so it’s not possible to “simply” switch from SAS to R. Working with various product owners and maintainers, a small team from Mango were able to perform a deep dive into a few of their existing applications which rely on SAS for statistical processing, and identify some likely candidates for proof of concept transitions.
The Mango team consisted of three individuals with skills covering Java and R development, business analysis, project management and architecture/infrastructure deployment. This combination was perfect for the task at hand, and we’ve delivered a solid foundation upon which ONS can build.
Mango had already prepared an initial report on existing SAS usage within ONS as part of a prior exploratory project, so we were able to hit the ground running and make some very real gains over the course of the project. One of the applications we looked at was an in-house time-series analysis toolkit. This application is starting to show its age a little now, but it is well used and well understood, both within ONS and among its wide user base. It is broadly built on a service-oriented architecture, which in theory makes it straightforward to move individual services to R one at a time, so we started there.
An internal proof of concept (PoC) had already wired a small Java app to an Rserve-based back end. This convinced ONS that what they had in mind was at least possible, so they decided to take it further and bring Mango in to help.
The first step was to build a demo environment, so we requested a server and installed R, a broad set of useful packages, RStudio Server and Shiny Server.
Next, we needed to decide how to present our R functions as services that could be consumed over the network, in much the same way as the SAS ones could. For the sake of speed, simplicity and flexibility, I suggested that we use Jeff Allen’s excellent plumber package. I’ve talked about plumber quite a bit before, but for those who haven’t come across it, plumber takes your R functions and makes them immediately available as a web API, which makes it a perfect fit for ONS’s service-based setup. Jeff has also helpfully documented how to host plumber services on the project’s website.
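For anyone who hasn’t seen plumber before, here is a minimal sketch of what an annotated plumber file looks like. The `/moving_average` endpoint and the `centred_moving_average()` helper are hypothetical illustrations, not the services built for ONS; only the `#*` annotation style and the `plumb()`/`$run()` calls are plumber’s own interface.

```r
# api.R: a minimal, hypothetical plumber API exposing one time-series function.
# Endpoint and helper names are illustrative, not ONS's actual services.

# Plain R helper: centred moving average with equal weights 1/k.
centred_moving_average <- function(values, k) {
  k <- as.integer(k)
  if (is.na(k) || k < 1 || k %% 2 != 1) stop("k must be a positive odd integer")
  # sides = 2 centres the window on each point, so the first and last
  # (k - 1) / 2 values, where the window is incomplete, come back as NA.
  # stats:: is explicit because dplyr also exports a filter() function.
  as.numeric(stats::filter(values, rep(1 / k, k), sides = 2))
}

#* Centred moving average of a comma-separated numeric series
#* @param x Comma-separated values, e.g. "1,2,3,4,5"
#* @param k Odd window width (default 3)
#* @get /moving_average
function(x, k = 3) {
  # Query parameters arrive as strings, so parse them before computing
  values <- as.numeric(strsplit(x, ",", fixed = TRUE)[[1]])
  centred_moving_average(values, k)
}
```

Running `plumber::plumb("api.R")$run(port = 8000)` would then serve the function, and a GET request to `http://localhost:8000/moving_average?x=1,2,3,4,5&k=3` returns the result serialised as JSON. Keeping the statistical logic in an ordinary helper function means it can be unit-tested without starting a web server.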
At Mango, we’ve done variations of this project many times, but this specific occasion offered us the chance to really explore the limits of what we could achieve with plumber, and I’m really happy to report that it exceeded expectations substantially.
There are implementation details that I won’t bore you with, but at a high level, we were able to take Jeff’s existing hosting instructions and extend them to suit our needs with ease. We encountered and solved a few problems along the way, but in general we were able to implement a complete platform based on a microservice architecture, with individual services written in R using the plumber package.
In the time-series application, for example, we demonstrated the replacement of two existing SAS services with ones written in R. We made some modifications to the Java app itself, re-implemented the two SAS services in R, and then served them from multiple plumber instances behind a load balancer, to demonstrate how this would work in a production setting.
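The post doesn’t say how the plumber instances were started or which load balancer sat in front of them, so the following is only a minimal sketch under stated assumptions: it uses the `callr` package to start several identical plumber workers on consecutive ports, and the `launch_instances()`/`stop_instances()` helpers and the `api.R` file name are hypothetical.

```r
# launch_instances.R: a hypothetical helper that starts one background R
# process per port, each serving the same plumber API file. A separate load
# balancer (e.g. nginx or HAProxy, assumed here) would then spread requests
# across those ports.

launch_instances <- function(api_file, ports, host = "127.0.0.1") {
  stopifnot(file.exists(api_file), length(ports) >= 1)
  api_path <- normalizePath(api_file)
  lapply(ports, function(port) {
    # callr::r_bg() runs the function in a fresh R session, so it must be
    # self-contained: everything it needs is namespaced or passed in args.
    callr::r_bg(
      function(api_path, host, port) {
        plumber::plumb(api_path)$run(host = host, port = port)
      },
      args = list(api_path = api_path, host = host, port = port)
    )
  })
}

# Stop every worker returned by launch_instances()
stop_instances <- function(workers) {
  invisible(lapply(workers, function(p) p$kill()))
}
```

A call such as `workers <- launch_instances("api.R", 8001:8004)` would start four workers, and the load balancer’s pool would list `127.0.0.1:8001` to `127.0.0.1:8004`. Running several processes matters because a single R process handles one request at a time, so extra instances are the simplest way to serve concurrent requests.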
I’ll be sharing more details about exactly what we did in future posts, so look out for those, but for now, I’d encourage you to investigate plumber and see if it has a place in your business!