Big-data Naive Bayes and Classification Trees with R and Netezza

March 8, 2012

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

The IBM Netezza analytics appliances combine high-capacity storage for Big Data with a massively parallel processing platform for high-performance computing. With the addition of Revolution R Enterprise for IBM Netezza, you can use the power of the R language to build predictive models on Big Data.

In the demonstration below, Revolution Analytics' Derek Norton analyzes loan approval data stored on the IBM appliance. You'll see the R code used to:

  • Explore the raw data (with summary statistics and charts)
  • Prepare the data for statistical analysis, and create training and test sets
  • Create predictive models using classification trees and Naïve Bayes
  • Predict using the models, and evaluate model performance using confusion matrices
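
For readers who want a feel for these steps before watching the demo, here is a minimal sketch in plain, in-memory R. It is not the code from the demo and does not use the in-database functions of Revolution R Enterprise for IBM Netezza. The loan data is synthetic, and the column names (income, debt, owner, approved) and the approval rule are made up for illustration. Only the classification tree is fitted, using the rpart package that ships with standard R distributions.

```r
# Illustrative only: plain in-memory R on synthetic data,
# NOT the in-database Netezza workflow shown in the demo.
library(rpart)  # recommended package shipped with standard R distributions

set.seed(42)    # make the synthetic data and the split reproducible

# Hypothetical loan data: column names and approval rule are assumptions.
n <- 1000
loans <- data.frame(
  income = round(rnorm(n, mean = 50, sd = 15)),  # annual income, thousands
  debt   = round(runif(n, min = 0, max = 40)),   # outstanding debt, thousands
  owner  = factor(sample(c("no", "yes"), n, replace = TRUE))
)
score <- loans$income - 1.5 * loans$debt +
  5 * (loans$owner == "yes") + rnorm(n, sd = 5)
loans$approved <- factor(ifelse(score > 20, "yes", "no"),
                         levels = c("no", "yes"))

# 1. Explore the raw data with summary statistics.
print(summary(loans))

# 2. Prepare training (70%) and test (30%) sets.
train_idx <- sample(n, size = 0.7 * n)
train <- loans[train_idx, ]
test  <- loans[-train_idx, ]

# 3. Fit a classification tree on the training set.
tree_fit <- rpart(approved ~ income + debt + owner,
                  data = train, method = "class")

# 4. Predict on the test set and build a confusion matrix.
pred <- predict(tree_fit, newdata = test, type = "class")
conf_mat <- table(actual = test$approved, predicted = pred)
print(conf_mat)

# Overall accuracy: share of test cases on the diagonal.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)
```

A Naïve Bayes model could be evaluated the same way by swapping step 3 for a different fitting function, for example naiveBayes() from the e1071 package if it is installed.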


Note that while R code is being run on Derek's laptop, the raw data is never moved from the appliance, and the analytic computations take place "in-database" within the appliance itself (where the Revolution R Enterprise engine is also running on each parallel core). 

This demo was included in the recent webinar, Turbo-Charge Your Analytics with IBM Netezza, for which you can find slides and a replay at the link below.

Revolution Analytics Webinars: Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise: A Step-by-Step Approach for Acceleration and Innovation
