# Let’s have a "party" and tear this place "rpart"!

This article was first published on **Fear and Loathing in Data Science**, and kindly contributed to R-bloggers.


For many problems, classification and regression trees can be a simple and elegant solution, assuming you know their well-documented strengths and weaknesses. I first explored their use several years ago with JMP, which is easy to use. If you do not have JMP Pro, however, you will not be able to use the more advanced techniques (ensemble methods, if you will) like bagging, boosting, and random forests. I don't have JMP Pro, and with great angst I realized I'm not Mr. Ensemble and need to get with the program. Alas, if you can make it with R, you can make it anywhere.

Before I drive myself mad with bagging and boosting, I wanted to cover the basic methods. It seems through a cursory search of the internet that the R packages “party” and “rpart” are worth learning and evaluating. I applied them to a data set on college football I’ve been compiling from the website cfbstats.com. Keep in mind, the analysis below will reveal no insights on breaking Vegas. It is just a few simple variables cobbled together to learn the packages.
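A minimal sketch of fitting a tree with each package follows. The original post's data and variables aren't shown here, so the data frame and formula below are illustrative placeholders, not the actual cfbstats.com fields.

```r
library(rpart)   # classic CART-style recursive partitioning
library(party)   # conditional inference trees

# Placeholder data standing in for the college-football set:
# a binary win/loss outcome and a few simple game statistics.
set.seed(42)
football <- data.frame(
  win        = factor(sample(c("W", "L"), 200, replace = TRUE)),
  rush_yards = rnorm(200, mean = 180, sd = 40),
  pass_yards = rnorm(200, mean = 230, sd = 60),
  turnovers  = rpois(200, lambda = 2)
)

# party: ctree() chooses splits via permutation-test p-values
tree_party <- ctree(win ~ rush_yards + pass_yards + turnovers,
                    data = football)
plot(tree_party)

# rpart: splits are chosen by impurity reduction (Gini by default)
tree_rpart <- rpart(win ~ rush_yards + pass_yards + turnovers,
                    data = football, method = "class")
plot(tree_rpart)
text(tree_rpart)
```

The two packages grow trees by different criteria (significance tests versus impurity), which is one reason their splits can disagree on the same data.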

Notice that the split calculations are slightly different. I'm not sure why, but I plan to dig into this. Also, rpart does not automatically optimize the number of splits. Here is how to investigate the matter:
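One way to investigate is rpart's complexity-parameter (cp) table, which reports cross-validated error at each tree size so you can prune by hand. This is a self-contained sketch on placeholder data, since the original data set isn't reproduced here.

```r
library(rpart)

# Placeholder data standing in for the college-football set
set.seed(42)
df <- data.frame(
  win = factor(sample(c("W", "L"), 200, replace = TRUE)),
  x1  = rnorm(200),
  x2  = rnorm(200)
)

# cp = 0 grows a deliberately large tree; rpart leaves the
# choice of final size to you rather than optimizing it
fit <- rpart(win ~ x1 + x2, data = df, method = "class", cp = 0)

# The cp table shows relative and cross-validated error (xerror)
# at each number of splits
printcp(fit)
plotcp(fit)

# Prune back to the cp value with the lowest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)
```

A common rule of thumb is the "1-SE rule": instead of the minimum xerror, pick the simplest tree whose xerror is within one standard error of that minimum.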
