In a previous post on classification trees we considered using the tree package to fit a classification tree to data divided into known classes. In this post we will look at the alternative function rpart, from the rpart package, which is one of the recommended packages shipped with the standard R distribution.

A classification tree can be fitted with the rpart function using a syntax similar to that of the tree function. For the ecoli data set discussed in the previous post we would use:
> require(rpart)
> ecoli.df = read.csv("ecoli.txt")
followed by
> ecoli.rpart1 = rpart(class ~ mcv + gvh + lip + chg + aac + alm1 + alm2, data = ecoli.df)
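The same call pattern can be tried without the ecoli file. The following is a minimal sketch that assumes the built-in iris data set as a stand-in, with Species playing the role of the class column; the object name iris.rpart1 is only illustrative. The argument method = "class" is stated explicitly here, although rpart would also choose it automatically for a factor response.

```r
library(rpart)

# iris ships with R, so no data file is needed for this sketch
iris.rpart1 = rpart(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                    data = iris, method = "class")

# Printing the fitted object lists each node with its split rule,
# the number of observations and the predicted class
print(iris.rpart1)
```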
We would then consider whether the tree could be simplified by pruning. The plotcp function helps here by plotting the cross-validated error against the complexity parameter (cp):
> plotcp(ecoli.rpart1)
The amount of pruning can be determined from this graph or by looking at the output from the printcp function:
> printcp(ecoli.rpart1)
Classification tree:
rpart(formula = class ~ mcv + gvh + lip + chg + aac + alm1 +
    alm2, data = ecoli.df)
Variables actually used in tree construction:
[1] aac  alm1 gvh  mcv
Root node error: 193/336 = 0.5744
n= 336
        CP nsplit rel error  xerror     xstd
1 0.388601      0   1.00000 1.00000 0.046959
2 0.207254      1   0.61140 0.61658 0.045423
3 0.062176      2   0.40415 0.45596 0.041758
4 0.051813      3   0.34197 0.38342 0.039359
5 0.031088      4   0.29016 0.36269 0.038571
6 0.015544      5   0.25907 0.30570 0.036136
7 0.010000      6   0.24352 0.31088 0.036375
The prune function is used to simplify the tree based on a cp threshold identified from the graph or the printed output.
> ecoli.rpart2 = prune(ecoli.rpart1, cp = 0.02)
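In the printed table the smallest cross-validated error (xerror = 0.30570) is in row 6, and cp = 0.02 lies between that row's CP (0.015544) and the one above it (0.031088), so pruning at 0.02 should keep the tree with 5 splits. The threshold can also be read from the fitted object rather than by eye. The following is a minimal sketch, assuming the built-in iris data in place of the ecoli file; the names fit, cp.table, best and fit.pruned are illustrative only. Picking the row with the smallest xerror is one common rule, not the only one.

```r
library(rpart)

set.seed(1)  # cross-validation folds are random, so xerror varies between runs
fit = rpart(Species ~ ., data = iris, method = "class")

# The table shown by printcp is stored as a matrix in the fitted object
cp.table = fit$cptable

# Row with the smallest cross-validated error
best = which.min(cp.table[, "xerror"])

# Prune using the CP value recorded for that row
fit.pruned = prune(fit, cp = cp.table[best, "CP"])

printcp(fit.pruned)
```

Because the cross-validation is random, it is worth fixing the seed or repeating the fit before settling on a value of cp.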
The classification tree can be visualised with the plot function and then the text function adds labels to the graph:
> plot(ecoli.rpart2, uniform = TRUE)
> text(ecoli.rpart2, use.n = TRUE, cex = 0.75)
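The plot and text calls draw on the active graphics device, so the same pair of commands can write the tree to a file. The following is a minimal sketch, again assuming the built-in iris data; the file name iris_tree.pdf and the temporary directory are hypothetical choices, and margin = 0.1 is added only to leave room for the labels.

```r
library(rpart)

fit = rpart(Species ~ ., data = iris, method = "class")

# Hypothetical output location in the session's temporary directory
out.file = file.path(tempdir(), "iris_tree.pdf")
pdf(out.file, width = 7, height = 5)

# uniform = TRUE spaces the levels of the tree evenly;
# margin adds white space around the tree so labels are not clipped
plot(fit, uniform = TRUE, margin = 0.1)

# use.n = TRUE adds the number of observations in each class at every leaf
text(fit, use.n = TRUE, cex = 0.75)

dev.off()
```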
Other useful resources are provided on the Supplementary Material page.