fluent-r: a new R analytics integration library for JVM developers
by David Russell, fluent-r developer
fluent-r is a new R analytics integration library for JVM application developers. It improves on existing solutions for integrating R analytics services delivered by the popular open-source R integration servers DeployR and OpenCPU. The fluent-r library provides a natural-language DSL alongside a simple API that can replace or complement existing use of the DeployR RBroker Framework and client libraries during the integration phase (Step 4) in the following workflow:
Simple Integration
Within any application a developer may have numerous integration points in need of R analytics services. Often each integration point requires custom code to communicate with specific APIs on the R integration server. Typically the greater the number of integration points, the greater the complexity of the code base for an integration solution.
When working with the fluent-r library your integration solution requires the consistent use of just a single API call that looks as follows:
FluentResult result = fluentTask(RServer).stream(fluentDSL).send(inputs).execute();
The API itself is a fluent interface, so you can take advantage of concise method chaining in your code, as demonstrated above. Breaking the call down reveals a chain of four distinct method calls:
1. fluentTask(RServer)
The fluentTask(RServer) method call simply identifies the R integration server where your task will be executed. The basic method accepts an R integration server endpoint as a string, such as http://localhost:7400/deployr or http://localhost:8004/ocpu for DeployR and OpenCPU respectively.
As a bonus for DeployR users, you can also pass an existing instance of RBroker or RProject to the fluentTask() method, which lets the task execute on an RBroker or RProject instance already available within your application. Your fluent-r tasks will then execute on DeployR with the permissions of the RBroker or RProject in question.
This demonstrates that the fluent-r library is 100% compatible with the existing DeployR client libraries. In fact, the DeployR support in the fluent-r library is built on top of the DeployR 7.4 client libraries.
2. stream(fluentDSL)
The stream(fluentDSL) method call identifies the R analytics dependencies for your task, such as the script or function you want to execute. The fluentDSL is a natural-language, plain-text description of these dependencies that lives outside of your application source code. We will discuss the benefits of this approach in more detail in the Simple Maintenance section that follows.
3. send(inputs)
The send(inputs) method is optional. When used, it allows your application to pass input data to any task. The input data can be RData, or JSON or parameter string data, for DeployR and OpenCPU respectively.
4. execute()
The execute() method causes your task to be executed. The task runs on the R integration server identified by the fluentTask() method and executes the script or function identified by the stream(fluentDSL) method, optionally passing along the input data supplied on the send(inputs) method.
The return value of the execute() method is an instance of FluentResult, which makes result data available through a common result interface.
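To make the chaining pattern concrete, here is a minimal, self-contained Java sketch of a four-step fluent interface. It is not the fluent-r implementation: SketchTask, SketchResult, sketchTask() and the "balance" input name are hypothetical stand-ins invented for illustration, and nothing here contacts an R server. It only shows why each intermediate call can be chained (it returns the task itself) and why execute() ends the chain (it returns a result).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in types: they only illustrate method chaining and
// are NOT the fluent-r classes. Nothing here contacts an R server.
final class SketchResult {
    final String server;
    final String dsl;
    final Map<String, Object> inputs;

    SketchResult(String server, String dsl, Map<String, Object> inputs) {
        this.server = server;
        this.dsl = dsl;
        this.inputs = inputs;
    }
}

final class SketchTask {
    private final String server;
    private String dsl = "";
    private final Map<String, Object> inputs = new LinkedHashMap<>();

    private SketchTask(String server) {
        this.server = server;
    }

    // Step 1: name the server; returns the task the rest of the chain builds on.
    static SketchTask sketchTask(String server) {
        return new SketchTask(server);
    }

    // Step 2: attach the dependency description; returning `this` enables chaining.
    SketchTask stream(String dsl) {
        this.dsl = dsl;
        return this;
    }

    // Step 3 (optional): attach input data.
    SketchTask send(Map<String, Object> in) {
        inputs.putAll(in);
        return this;
    }

    // Step 4: terminal call; a real client would contact the server here.
    SketchResult execute() {
        return new SketchResult(server, dsl, new LinkedHashMap<>(inputs));
    }
}

class FluentChainDemo {
    public static void main(String[] args) {
        SketchResult r = SketchTask.sketchTask("http://localhost:7400/deployr")
                .stream("execute 'ccFraudScore.R' from 'example-fraud-score' by 'testuser'")
                .send(Map.of("balance", 5000)) // hypothetical input name
                .execute();
        System.out.println(r.server + " | " + r.dsl + " | " + r.inputs);
    }
}
```

Because send() is just another link that returns the task, leaving it out still yields a valid chain, which mirrors the optional nature of step 3 described above.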
When broken down method-by-method the API call chain is shown to be concise and simple. Here's an actual code snippet that shows how an external DSL can be referenced and then executed as a task on DeployR using the fluent-r API call:
// DeployR Fluent R DSL – Fraud Score Example
def fluentDSL = new URL("https://git.io/vcz0R")
def result = fluentTask("http://localhost:7400/deployr", "testuser", "changeme").stream(fluentDSL).send(taskInputs).execute()
As shown above, only a single line of integration code was needed to define, parameterize and execute your task. No boilerplate code was needed, and no time was spent wondering which API call to use, because there is only one. This is why the fluent-r library claims to deliver a simple integration solution.
Simple Maintenance
When working with existing client libraries on the JVM, application developers often end up hard-coding details of their R analytics dependencies, such as the script names, directories, etc. and any model or data dependencies directly within their application code. This approach tightly couples these R analytics dependencies with the underlying application source code. That in turn tightly couples the application code with the physical location of these dependencies that live on the R integration server, such as repository-managed files or R packages on DeployR and OpenCPU respectively. This can lead to unwanted complexity and costs that later can manifest as broken application code.
The fluent-r library provides a novel solution to this problem by offering a DSL, called the "Fluent DSL". This DSL allows R analytics dependencies to be declared in plain-text natural language that can be managed entirely outside of your application code. Given the natural-language nature of this solution, it is best explained by example. Here is a sample DeployR Fluent DSL that describes the R analytics dependencies for the DeployR Fraud Score sample application:
load 'fraudModel.rData' into workspace from 'example-fraud-score' by 'testuser'
execute 'ccFraudScore.R' from 'example-fraud-score' by 'testuser'
fetch 'x' from workspace
This DSL is simple to write, understand and modify if needed. In this case the DSL requires fraudModel.rData to be loaded into the workspace before the task executes. The repository-managed script ccFraudScore.R is then executed, and the score it generates in a workspace object called x is returned on the result.
Another example, this time using the OpenCPU Fluent DSL, describes a simple R analytics dependency that executes the tv function found in the tvscore R package that ships with OpenCPU:
execute 'tv' from 'tvscore'
As R package functions have implicit return values, there is no need to explicitly declare a fetch for this particular DSL. OpenCPU also supports the execution of R functions or scripts that live on CRAN, GitHub and Bioconductor, so the DSL supports these kinds of tasks as well. For example, here is a DSL that requests the execution of the geodistance function found in the dpu.mobility R package on GitHub:
execute 'geodistance' from 'dpu.mobility' on github by 'openmhealth'
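The three execute statements shown so far share a regular shape: a quoted target, a quoted source, an optional site and an optional owner. The Java sketch below reads that shape with a regular expression. It is a hypothetical illustration only, not the fluent-r parser; the grammar is inferred from the samples in this post, and the class and field names are invented for the example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser covering only the `execute` statements shown above.
// It is NOT the fluent-r parser; the grammar is inferred from three samples.
final class ExecuteStatement {
    private static final Pattern EXECUTE = Pattern.compile(
            "execute\\s+'([^']+)'\\s+from\\s+'([^']+)'"
            + "(?:\\s+on\\s+(\\w+))?"        // optional site, e.g. github
            + "(?:\\s+by\\s+'([^']+)')?");   // optional owner

    final String target; // script or function name
    final String source; // directory or package
    final String site;   // null when the statement has no `on` clause
    final String owner;  // null when the statement has no `by` clause

    private ExecuteStatement(String target, String source, String site, String owner) {
        this.target = target;
        this.source = source;
        this.site = site;
        this.owner = owner;
    }

    // Returns an empty Optional for any line that is not an execute statement.
    static Optional<ExecuteStatement> parse(String line) {
        Matcher m = EXECUTE.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new ExecuteStatement(m.group(1), m.group(2), m.group(3), m.group(4)));
    }
}

class ExecuteStatementDemo {
    public static void main(String[] args) {
        String[] samples = {
            "execute 'ccFraudScore.R' from 'example-fraud-score' by 'testuser'",
            "execute 'tv' from 'tvscore'",
            "execute 'geodistance' from 'dpu.mobility' on github by 'openmhealth'"
        };
        for (String line : samples) {
            ExecuteStatement s = ExecuteStatement.parse(line).orElseThrow(IllegalStateException::new);
            System.out.println(s.target + " <- " + s.source + " site=" + s.site + " owner=" + s.owner);
        }
    }
}
```

The point of the sketch is that a dependency written this way is ordinary structured data, so it can be validated or inspected without touching application code.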
The simplicity and flexibility afforded by the Fluent DSL have many advantages, both during development and later during maintenance. New scripts or functions can be tested against live application code, without modifying that code, simply by changing one or more of the DSL dependencies. If scripts or functions are later moved, renamed or otherwise modified, patching a live application without an application restart is no more complicated than updating the DSL declarations to match the new script or function dependencies.
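One way an application could pick up DSL edits without a restart is to re-read the DSL text every time a task is about to run, rather than caching it at startup. The Java sketch below demonstrates that idea with a temporary file. It makes no claim about how fluent-r itself loads DSLs: the DslFile and DslReloadDemo classes and the renamed script ccFraudScoreV2.R are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: re-reads the DSL text on every call, so an edit to
// the file is seen by the next task without restarting the application.
final class DslFile {
    private final Path location;

    DslFile(Path location) {
        this.location = location;
    }

    // Read the file on every call instead of caching it at startup.
    String current() {
        try {
            return new String(Files.readAllBytes(location), StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class DslReloadDemo {
    // Returns the DSL text seen before and after the file is edited.
    static String[] readBeforeAndAfterEdit() {
        try {
            Path file = Files.createTempFile("fraud-score", ".dsl");
            try {
                DslFile dsl = new DslFile(file);
                Files.write(file,
                        "execute 'ccFraudScore.R' from 'example-fraud-score' by 'testuser'"
                                .getBytes(StandardCharsets.UTF_8));
                String before = dsl.current();

                // Simulate a script rename on the server (hypothetical new name).
                Files.write(file,
                        "execute 'ccFraudScoreV2.R' from 'example-fraud-score' by 'testuser'"
                                .getBytes(StandardCharsets.UTF_8));
                String after = dsl.current();
                return new String[] {before, after};
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String seen : readBeforeAndAfterEdit()) {
            System.out.println(seen);
        }
    }
}
```

The same approach applies when the DSL lives behind a URL instead of a local file; the trade-off is one extra read per task in exchange for never serving a stale declaration.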
Simple Integration API Design
Perhaps the full potential of the library is best understood when you recognize that each individual DSL you define is in effect defining a single API service for your application and any collection of DSLs you define is therefore defining a custom R integration API for your application.
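The idea that a collection of DSLs forms an API can be pictured as a lookup table from service names to DSL text. The Java sketch below is a hypothetical illustration of that view: the DslApi class and the service names fraudScore and tvScore are invented here and are not part of fluent-r.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical registry: each named entry is one DSL, that is, one service;
// the set of names is the application's R integration API.
final class DslApi {
    private final Map<String, String> services = new LinkedHashMap<>();

    // Register or replace the DSL behind a service name.
    DslApi define(String serviceName, String dsl) {
        services.put(serviceName, dsl);
        return this;
    }

    // Look up the DSL for a service, failing loudly on unknown names.
    String dslFor(String serviceName) {
        String dsl = services.get(serviceName);
        if (dsl == null) {
            throw new IllegalArgumentException("Unknown service: " + serviceName);
        }
        return dsl;
    }

    Set<String> serviceNames() {
        return services.keySet();
    }
}

class DslApiDemo {
    public static void main(String[] args) {
        DslApi api = new DslApi()
                .define("fraudScore", "execute 'ccFraudScore.R' from 'example-fraud-score' by 'testuser'")
                .define("tvScore", "execute 'tv' from 'tvscore'");
        for (String name : api.serviceNames()) {
            System.out.println(name + " -> " + api.dslFor(name));
        }
    }
}
```

Adding a service to the API then means adding an entry, and changing a service means editing its DSL text, with no change to the calling code.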
Designing, managing and maintaining your R integration API as a set of “Fluent DSLs” allows rapid development and easy maintenance. You can read a brief writeup that discusses Designing Zero-Code R Integration APIs using the fluent-r library.
Using the fluent-r Library
The fluent-r library has been released under the Apache 2 license, so it is freely available for both open-source and commercial use. Release versions have been published to the Maven Central Repository, so the library can easily be used within Ant, Maven, Gradle and Grape projects. Alternatively, the artifacts can be downloaded and added to your application's classpath, whichever you prefer.
compile 'io.onetapbeyond:fluent-r:1.1'
The complete fluent-r library Javadoc is available. And sample DSLs and integration code for DeployR and OpenCPU are available as gist repositories on GitHub.
If you are a Java, Groovy, Scala or Clojure programmer on the JVM who needs R integration services from DeployR or OpenCPU within your application, the fluent-r library may be for you. If you check it out, please do let us know what you think.