streaming machine learning with RMOA: stream_in > train > predict

(This article was first published on BNOSAC - Belgium Network of Open Source Analytical Consultants, and kindly contributed to R-bloggers)

We will be showcasing our RMOA package at the next R User conference in Aalborg.
For R users who are unfamiliar with streaming modelling and want to be ahead of the Gartner Hype Cycle, or who want to evaluate existing streaming machine learning models, RMOA lets you build, run and evaluate streaming classification models which are built in MOA (Massive Online Analysis).
For an introduction to RMOA and MOA and the type of machine learning models which are possible in MOA – see our previous blog post or scroll through our blog page.

In the example below, we showcase the RMOA package using streaming JSON data, which can come from any NoSQL database that emits JSON. For this example, the jsonlite package provides a convenient stream_in function (an example is shown here) which reads JSON records from a connection in chunks and hands each chunk to a handler function. Plugging in streaming machine learning models with RMOA is a breeze.
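
To make the chunked reading concrete, here is a minimal sketch of stream_in with a custom handler. It reads from an in-memory text connection instead of a URL; the four records and the variable names (ndjson, chunk_sizes) are made up for illustration and are not part of the diamonds dataset.

```r
library(jsonlite)

# Hypothetical sample data: four newline-delimited JSON records
ndjson <- c(
  '{"carat":0.23,"cut":"Ideal","color":"E"}',
  '{"carat":0.21,"cut":"Premium","color":"E"}',
  '{"carat":0.29,"cut":"Very Good","color":"I"}',
  '{"carat":0.31,"cut":"Good","color":"J"}'
)

chunk_sizes <- integer(0)
con <- textConnection(ndjson)
stream_in(
  con,
  handler = function(df) {
    # df is a data.frame holding at most `pagesize` records
    chunk_sizes <<- c(chunk_sizes, nrow(df))
  },
  pagesize = 3,
  verbose = FALSE
)
close(con)

chunk_sizes  # one entry per chunk passed to the handler
```

With a page size of 3, the handler is called twice: once with three records and once with the remaining one. When a handler is supplied, stream_in does not collect the data itself, so anything you want to keep has to be stored by the handler.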

Let's dive straight into the R code, which shows how to build, run and evaluate a streaming boosted classification model.

require(jsonlite)
require(data.table)
require(RMOA)
require(ROCR)
##
## Use a dataset from Jeroen Ooms available at jeroenooms.github.io/data/diamonds.json
##
myjsondataset <- url("http://jeroenooms.github.io/data/diamonds.json")
datatransfo <- function(x){
  ## Setting the target to predict
  x$target <- factor(ifelse(x$cut == "Very Good", "Very Good", "Other"), levels = c("Very Good", "Other"))
  ## Making sure the levels are the same across all streaming chunks
  x$color <- factor(x$color, levels = c("D", "E", "F", "G", "H", "I", "J"))
  x  
}
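
Why fix the factor levels inside datatransfo? A small base R sketch, using two made-up chunks (chunk1 and chunk2 are hypothetical), shows what can go wrong when each chunk is converted on its own:

```r
# Two hypothetical chunks that happen to contain different colors
chunk1 <- data.frame(color = c("D", "E", "E"), stringsAsFactors = FALSE)
chunk2 <- data.frame(color = c("J", "H"), stringsAsFactors = FALSE)

# Without explicit levels, each chunk gets its own coding
levels(factor(chunk1$color))  # "D" "E"
levels(factor(chunk2$color))  # "H" "J"

# With explicit levels, the coding is identical for every chunk
all_colors <- c("D", "E", "F", "G", "H", "I", "J")
f1 <- factor(chunk1$color, levels = all_colors)
f2 <- factor(chunk2$color, levels = all_colors)
identical(levels(f1), levels(f2))  # TRUE
as.integer(f2)                     # 7 5
```

A model that is updated chunk by chunk needs every chunk to describe a variable in the same way, which is why the levels are spelled out once and reused.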

##
## Read 100 lines of an example dataset to see what it looks like
##
x <- readLines(myjsondataset, n = 100, encoding = "UTF-8")
x <- rbindlist(lapply(x, fromJSON))
x <- datatransfo(x)
str(x)

######################################
## Boosted streaming classification
##   - set up the boosting options
######################################
ctrl <- MOAoptions(model = "OCBoost", randomSeed = 123456789, ensembleSize = 25,
                   smoothingParameter = 0.5)
mymodel <- OCBoost(control = ctrl)
mymodel
## Train an initial model on 100 rows of the data
myboostedclassifier <- trainMOA(model = mymodel, 
         formula = target ~ color + depth + x + y + z,
         data = datastream_dataframe(x))

## Update the model iteratively with streaming data
stream_in(
  con = myjsondataset,
  handler = function(x){
    x <- datatransfo(x)
    ## Update the trained model with the new chunks
    myboostedclassifier <- trainMOA(model = myboostedclassifier$model, 
             formula = target ~ color + depth + x + y + z,
             data = datastream_dataframe(x), 
             reset = FALSE) ## do not reset what the model has learned already
  },
  pagesize = 500)

## Do some prediction to test the model
predict(myboostedclassifier, x)
table(sprintf("Reality: %s", x$target),
      sprintf("Predicted: %s", predict(myboostedclassifier, x)))

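## Do a streaming prediction
## (use a fresh connection: stream_in closes a connection that it had to open itself)
stream_in(con = url("http://jeroenooms.github.io/data/diamonds.json"),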
          handler = function(x){
            x <- datatransfo(x)
            myprediction <- predict(myboostedclassifier, x)
            ## Basic evaluation by extracting accuracy
            print(round(sum(myprediction == x$target) / length(myprediction), 2))
          },
          pagesize = 100)
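
The handler above prints the accuracy of each chunk separately. If you also want one figure for the whole stream, you can accumulate the counts over the chunks. The sketch below is one possible way to do this in base R; make_accuracy_tracker is a hypothetical helper, not an RMOA function, and the predictions and labels are made up.

```r
# Hypothetical helper: a closure that accumulates hits and totals over chunks
make_accuracy_tracker <- function() {
  correct <- 0
  total <- 0
  list(
    update = function(predicted, actual) {
      # Compare as character so factor codings cannot interfere
      correct <<- correct + sum(as.character(predicted) == as.character(actual))
      total <<- total + length(actual)
      invisible(correct / total)
    },
    accuracy = function() correct / total
  )
}

tracker <- make_accuracy_tracker()
# Two made-up chunks of predictions and true labels
tracker$update(c("Other", "Other", "Very Good"),
               c("Other", "Very Good", "Very Good"))
tracker$update(c("Other", "Other"),
               c("Other", "Other"))
tracker$accuracy()  # 4 correct out of 5 observations
```

Inside the stream_in handler you would call tracker$update(myprediction, x$target) for each chunk and read tracker$accuracy() once the stream has finished.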



For more information on RMOA or streaming modelling, get in touch.
