Open source tools provide a cost-effective, yet powerful option for data mining. The following contenders adhere to the PMML standard which facilitates model exchange among open source and commercial vendors, providing a definitive route for production deployment of predictive models.
The R Project
The R Project for Statistical Computing is definitely the most used and revered statistical package among advocates of open-source and community computing projects. Like the iPhone app store, you can basically find anything you need in CRAN (statistical that’s to say … yep, no navigation system for R), the Comprehensive R Archive Network. It is in CRAN that you will find the R PMML Package. This package allows R users to export PMML for a variety of models, including decision trees and neural networks (among many others). We recently co-authored an article with Graham Williams, the original author and maintainer of the package. It can be downloaded directly from The R Journal website. If you are interested in contributing code for the package, please contact us.
Developed by the University of Konstanz, KNIME is an open-source platform that enables users to visually create and execute data flows. Since KNIME 2.0 (available as of December 2008), users can import and export PMML models into and out of KNIME. Given that users can use R within KNIME, the R PMML package can also be used to export and convert R models to PMML within KNIME. New versions of KNIME will most certainly expand its support for PMML even further.
Developed by the University of Waikato, Weka provides a large collection of machine learning algorithms for solving data mining problems. Although Weka has currently no export functionality for PMML, Mark Hall is currently working on implementing import functionally for PMML. Weka can already import models such as regression, decision trees and neural networks. PMML support in Weka is constantly expanding with the addition of transformations and built-in functions.
Most recently, Rapid-I announced that it will extend the latest version of its RapidMiner software to include support for PMML. RapidMiner, formerly known as YALE, is an open-source platform that offers operators for all aspects of data mining. As with KNIME, Rapid-I is one of the latest companies to join the rankings of the Data Mining Group (DMG) beside companies like IBM, Microstrategy, SPSS, SAS and Zementis. The DMG is already busy at work refining and adding yet more capabilities and power to PMML.
PMML Discussion Forums
For an on-going discussion and to read about the latest PMML news, we would like to invite you to join the PMML group in LinkedIn or the discussion forum in the PMML group on Analytic Bridge, a social network community for analytics professionals. For PMML resources, examples, and useful links, please take a look at the PMML page on the Zementis website.