PMML and Open Source Data Mining

November 6, 2009

(This article was first published on Predictive Analytics, Big Data, Hadoop, PMML, and kindly contributed to R-bloggers)

Open source tools offer a cost-effective yet powerful option for data mining. The following contenders support the Predictive Model Markup Language (PMML), an XML-based standard that lets open source and commercial tools exchange models and so gives predictive models a clear route to production deployment.
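
To make the idea of model exchange a little more concrete, here is a minimal Python sketch. The PMML snippet is hand-written for illustration (it was not exported by any of the tools discussed here), and the helper names `local_name` and `summarize` are our own, hypothetical choices. The sketch only shows that any consumer with an XML parser can read which data fields and which model type a PMML file declares.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative PMML 4.0 document (an assumption for this sketch,
# not the output of any specific tool).
PMML_DOC = """<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header description="Illustrative linear regression"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# Top-level PMML children that are not models.
NON_MODEL_ELEMENTS = {
    "Header", "MiningBuildTask", "DataDictionary",
    "TransformationDictionary", "Extension",
}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize(pmml_text):
    """Return the PMML version, declared data fields and model element names."""
    root = ET.fromstring(pmml_text)
    fields = [
        el.get("name") for el in root.iter()
        if local_name(el.tag) == "DataField"
    ]
    models = [
        local_name(child.tag) for child in root
        if local_name(child.tag) not in NON_MODEL_ELEMENTS
    ]
    return {"version": root.get("version"), "fields": fields, "models": models}


if __name__ == "__main__":
    print(summarize(PMML_DOC))
```

Because the file is plain XML, the same summary could be produced for a model written by any PMML-producing tool, which is the point of the standard.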

The R Project
The R Project for Statistical Computing is arguably the most widely used and respected statistical package among advocates of open-source and community computing projects. Much like the iPhone App Store, CRAN (the Comprehensive R Archive Network) has just about anything you need, as long as it is statistical (so no, there is no navigation system for R). CRAN is also where you will find the R PMML package, which lets R users export PMML for a variety of models, including decision trees and neural networks, among many others. We recently co-authored an article about the package with Graham Williams, its original author and maintainer; the article can be downloaded directly from The R Journal website. If you are interested in contributing code to the package, please contact us.
KNIME
Developed at the University of Konstanz, KNIME is an open-source platform that lets users visually create and execute data flows. Since KNIME 2.0 (available as of December 2008), users can both import and export PMML models. Because R can be used within KNIME, the R PMML package can also be used there to convert R models to PMML. Future versions of KNIME are expected to expand its PMML support even further.
Weka
Developed at the University of Waikato, Weka provides a large collection of machine learning algorithms for solving data mining problems. Although Weka has no PMML export functionality at present, Mark Hall is working on PMML import functionality. Weka can already import PMML models such as regression, decision trees and neural networks, and its PMML support keeps expanding with the addition of transformations and built-in functions.
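
As a rough illustration of what importing a PMML regression model involves, the Python sketch below reads the intercept and coefficients from a hand-written PMML fragment and applies them to an input record. This is not how Weka implements its import; the `score` function and the sample document are hypothetical and cover only the simplest case (numeric predictors, no transformations, no categorical terms).

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative PMML 4.0 regression model (an assumption for this
# sketch): y = 1.5 + 2.0 * x
PMML_REGRESSION = """<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header description="Illustrative linear regression"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def score(pmml_text, row):
    """Score one record against the first RegressionTable in the document.

    Only NumericPredictor terms are handled; this is a sketch, not a
    complete PMML consumer.
    """
    root = ET.fromstring(pmml_text)
    # The root tag looks like '{namespace}PMML'; reuse that namespace for lookups.
    ns = {"p": root.tag[1:].split("}")[0]}
    table = root.find(".//p:RegressionTable", ns)
    result = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", ns):
        # The exponent attribute is optional and defaults to 1.
        exponent = int(predictor.get("exponent", "1"))
        value = row[predictor.get("name")]
        result += float(predictor.get("coefficient")) * value ** exponent
    return result


if __name__ == "__main__":
    print(score(PMML_REGRESSION, {"x": 3.0}))  # 1.5 + 2.0 * 3.0 = 7.5
```

A real importer also has to handle the transformations and built-in functions mentioned above, which this sketch deliberately leaves out.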
RapidMiner
Most recently, Rapid-I announced that it will extend the latest version of its RapidMiner software to support PMML. RapidMiner, formerly known as YALE, is an open-source platform that offers operators for all aspects of data mining. As with KNIME, Rapid-I is among the latest members to join the ranks of the Data Mining Group (DMG), alongside companies such as IBM, MicroStrategy, SPSS, SAS and Zementis. The DMG is already busy refining PMML and adding further capabilities to it.
PMML Discussion Forums
For ongoing discussion and the latest PMML news, we invite you to join the PMML group on LinkedIn or the discussion forum of the PMML group on Analytic Bridge, a social network for analytics professionals. For PMML resources, examples, and useful links, please see the PMML page on the Zementis website.
