Predicting R models with PMML: Revolution R Enterprise and ADAPA

March 24, 2011

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

The recently announced Revolution Analytics / Zementis partnership goes a long way towards demonstrating how R fits into big-league production environments. A frequent complaint against R is that, although it is a fine prototyping tool, it cannot handle production environments. Well, that’s just not true. In fact, it is straightforward to build a model in R, translate it into PMML using a standard R library, and send the PMML file to Zementis’ ADAPA scoring engine, where the model it describes can be used to score a new data set. Moreover, with Revolution’s RevoDeployR web services technology it is relatively easy to set up an infrastructure in which Revolution R runs on one server (on site or in the cloud), the ADAPA scoring engine runs on another, and users access both through a light client, a browser, or any BI tool.

The following code provides a simple example of splitting a file into training data and testing data, building a simple model and translating it to PMML.

# Load the required R libraries
library(pmml)
library(XML)
 
# Read in audit data and split into a training file and a testing file
auditDF <- read.csv("http://rattle.togaware.com/audit.csv")
auditDF <- na.omit(auditDF)              # remove NAs to make things easy
 
target <- auditDF$TARGET_Adjusted       # The target variable
N <- length(target); M <- N - 500       # N observations; hold out 500 for testing
i.train <- sample(N,M)                  # Get a random sample for training
audit.train <- auditDF[i.train,]
audit.test  <- auditDF[-i.train,]
 
# Build a logistic regression model
glm.model <- glm(audit.train$TARGET_Adjusted ~ .,data=audit.train,family="binomial")
 
# Describe the model in PMML and save it in an XML file
glm.pmml <- pmml(glm.model,name="glm model",data=audit.train)
xmlFile <- file.path(getwd(),"audit-glm.xml")
saveXML(glm.pmml,xmlFile)
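
Before handing a PMML file to a scoring engine, it can help to score the held-out rows locally so there is something to compare against. The following is a minimal sketch of that idea and is not part of the original demo: it uses a small synthetic data frame (toyDF, with made-up columns) in place of the audit data so that it runs without a download, and it writes the formula with the bare column name.

```r
set.seed(42)  # assumption: fixed seed so the random split is reproducible

# Synthetic stand-in for the audit data (hypothetical columns, not the real file)
n <- 600
toyDF <- data.frame(
  Age    = round(runif(n, 18, 80)),
  Hours  = round(runif(n, 10, 60)),
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
# Binary target loosely driven by Age and Hours
p <- plogis(-6 + 0.05 * toyDF$Age + 0.06 * toyDF$Hours)
toyDF$TARGET <- rbinom(n, 1, p)

# Same split pattern as above: hold out 100 rows for testing
i.train   <- sample(n, n - 100)
toy.train <- toyDF[i.train, ]
toy.test  <- toyDF[-i.train, ]

# Fit on the training rows only
toy.model <- glm(TARGET ~ ., data = toy.train, family = "binomial")

# Predicted probabilities for the held-out rows
toy.scores <- predict(toy.model, newdata = toy.test, type = "response")
summary(toy.scores)
```

With the real data, the same kind of predict() call on audit.test should give reference probabilities to set beside the scores returned by the scoring engine.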


The first few lines of PMML code that gets built should look something like:

<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-3_2 http://www.dmg.org/v3-2/pmml-3-2.xsd">
 <Header copyright="Copyright (c) 2011 Joseph" description="Linear Regression Model">
  <Extension name="user" value="Joseph" extender="Rattle"/>
  <Application name="Rattle/PMML" version="1.2.26"/>
  <Timestamp>2011-02-28 14:41:54</Timestamp>
 </Header>
 <DataDictionary numberOfFields="13">
  <DataField name="audit.train$TARGET_Adjusted" optype="continuous" dataType="double"/>
  <DataField name="ID" optype="continuous" dataType="double"/>
  <DataField name="Age" optype="continuous" dataType="double"/>
  <DataField name="Employment" optype="categorical" dataType="string">
   <Value value="Consultant"/>
   <Value value="Private"/>
   <Value value="PSFederal"/>
   <Value value="PSLocal"/>
   <Value value="PSState"/>
   <Value value="SelfEmp"/>
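
For a quick sanity check of which fields ended up in the data dictionary, the DataField lines can be scanned directly. The sketch below is a rough base-R approach that works on a few lines copied from the excerpt above. The helper get_attr is hypothetical and written only for this example; for a complete file, a real parser such as the XML package loaded earlier would be the more robust choice.

```r
# A few lines copied from the PMML excerpt, held in a character vector
pmml_lines <- c(
  '<DataDictionary numberOfFields="13">',
  '<DataField name="audit.train$TARGET_Adjusted" optype="continuous" dataType="double"/>',
  '<DataField name="ID" optype="continuous" dataType="double"/>',
  '<DataField name="Age" optype="continuous" dataType="double"/>',
  '<DataField name="Employment" optype="categorical" dataType="string">'
)

# Hypothetical helper: pull the value of one attribute out of each line,
# returning NA where the attribute is absent
get_attr <- function(lines, attr) {
  pattern <- paste0(' ', attr, '="([^"]*)"')
  hits <- regmatches(lines, regexec(pattern, lines))
  vapply(hits,
         function(h) if (length(h) == 2) h[2] else NA_character_,
         character(1))
}

# Keep only the DataField lines and tabulate their attributes
field_lines <- grep("<DataField ", pmml_lines, fixed = TRUE, value = TRUE)
fields <- data.frame(
  name     = get_attr(field_lines, "name"),
  optype   = get_attr(field_lines, "optype"),
  dataType = get_attr(field_lines, "dataType"),
  stringsAsFactors = FALSE
)
print(fields)
```

A table like this makes it easy to spot a field that was typed as continuous when it should have been categorical before the file is sent for scoring.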

Once the PMML file is built, it can be submitted to the ADAPA engine and used to score a new data set.

The interactive demo on the Revolution site pulls all of this together and exercises the key moving parts that would be present in a production-level scoring application.

Follow these steps to walk through the demo:

  1. Click on the appropriate link in the Example: Audit Data section to download the file audit_scoring.csv to your disk.
  2. In the Build Predictive model box on the left:
    1. Select a name for the model.
    2. Choose a Data set (you have only one choice: Audit Data).
    3. Select a model technique.
    4. Select the explanatory variables for your model.
    5. Press the Train Model button.
  3. In the Evaluate Performance box on the right, press the Deploy Model button to have RevoDeployR send the PMML code over to the ADAPA engine.
  4. In the CSV Batch Scoring box:
    1. Select your model.
    2. Upload the audit_scoring.csv file (or any other file you may have that is appropriate for the model you just built).
    3. Watch for the results.
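
Step 4 allows any suitable CSV file to be uploaded, so a scoring file could also be prepared in R from held-out rows. The sketch below shows one way this might be done. It uses a tiny synthetic data frame with hypothetical values, and it assumes the scoring engine wants a header row, no row names, and no target column; none of those requirements is confirmed by the demo.

```r
# Synthetic stand-in for a few held-out rows (hypothetical values)
holdout <- data.frame(
  ID              = 1:5,
  Age             = c(38, 35, 32, 45, 60),
  Employment      = c("Private", "Private", "PSLocal", "SelfEmp", "Consultant"),
  TARGET_Adjusted = c(0, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Drop the target column (assumption: the engine predicts it, so it is not sent)
to_score <- holdout[, setdiff(names(holdout), "TARGET_Adjusted"), drop = FALSE]

# Write a plain CSV with a header row and no row names
csv_path <- file.path(tempdir(), "my_scoring.csv")
write.csv(to_score, csv_path, row.names = FALSE)

# Read the file back to confirm what would be uploaded
check <- read.csv(csv_path, stringsAsFactors = FALSE)
print(names(check))
```

The column names in the file should match the field names in the PMML data dictionary, since that is how the model's inputs are identified.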

Revolution Analytics: Using ADAPA & Revolution R Enterprise—Audit Data Demo
