In a previous post on classification trees we considered using the tree package to fit a classification tree to data divided into known classes. In this post we will look at the alternative function rpart, from the rpart package, which ships with the standard R distribution as a recommended package.
A classification tree can be fitted with the rpart function, using syntax similar to that of the tree function. For the ecoli data set discussed in the previous post we would use:
> require(rpart)
> ecoli.df = read.csv("ecoli.txt")
> ecoli.rpart1 = rpart(class ~ mcv + gvh + lip + chg + aac + alm1 + alm2, data = ecoli.df)
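The same call pattern can be tried without the ecoli file. The following is a minimal sketch, assuming the built-in iris data as a stand-in (it is not the ecoli data used in this post); the object names iris.rpart and iris.pred are illustrative only. The argument method = "class" is written out to make the intent explicit.

```r
library(rpart)

# Stand-in data: iris ships with R; Species is a factor with three classes.
iris.rpart <- rpart(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                    data = iris, method = "class")

# Printing the fitted object lists each node with its split rule,
# the number of observations and the predicted class.
print(iris.rpart)

# Predicted classes for the training data, cross-tabulated against the truth.
iris.pred <- predict(iris.rpart, type = "class")
table(observed = iris$Species, predicted = iris.pred)
```

The cross-tabulation gives a quick check of how well the unpruned tree reproduces the known classes, although it is optimistic because it uses the training data.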
We would then consider whether the tree could be simplified by pruning, making use of the plotcp function:
> plotcp(ecoli.rpart1)
The amount of pruning can be determined from this graph or from the output of the printcp function:
> printcp(ecoli.rpart1)

Classification tree:
rpart(formula = class ~ mcv + gvh + lip + chg + aac + alm1 + alm2, data = ecoli.df)

Variables actually used in tree construction:
aac alm1 gvh mcv

Root node error: 193/336 = 0.5744

n= 336

        CP nsplit rel error  xerror     xstd
1 0.388601      0   1.00000 1.00000 0.046959
2 0.207254      1   0.61140 0.61658 0.045423
3 0.062176      2   0.40415 0.45596 0.041758
4 0.051813      3   0.34197 0.38342 0.039359
5 0.031088      4   0.29016 0.36269 0.038571
6 0.015544      5   0.25907 0.30570 0.036136
7 0.010000      6   0.24352 0.31088 0.036375
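The table printed by printcp is also stored in the cptable component of the fitted object, so a candidate cp can be picked in code rather than by eye. The sketch below is one possible approach, not the method used in this post: it assumes the built-in iris data as a stand-in and shows both the minimum-xerror choice and the commonly used one-standard-error rule. All object names are illustrative.

```r
library(rpart)

set.seed(42)  # xerror comes from random cross-validation folds
iris.rpart <- rpart(Species ~ ., data = iris, method = "class")

# cptable holds the same columns that printcp displays.
cp.table <- iris.rpart$cptable

# Row with the smallest cross-validated error.
i.min <- which.min(cp.table[, "xerror"])
cp.min <- unname(cp.table[i.min, "CP"])

# One-standard-error rule: the smallest tree whose xerror is within
# one xstd of the minimum xerror.
threshold <- cp.table[i.min, "xerror"] + cp.table[i.min, "xstd"]
i.1se <- which(cp.table[, "xerror"] <= threshold)[1]
cp.1se <- unname(cp.table[i.1se, "CP"])

c(cp.min = cp.min, cp.1se = cp.1se)
```

Applied to the ecoli table above, both rules point to row 6 (five splits): its xerror of 0.30570 is the minimum, and 0.30570 + 0.036136 = 0.341836 is below the xerror of every smaller tree.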
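The prune function is used to simplify the tree based on a cp threshold identified from the graph or the printed output. Here the cross-validated error (xerror) is lowest for the tree with five splits, and any cp between 0.015544 and 0.031088, such as 0.02, prunes back to that tree.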
> ecoli.rpart2 = prune(ecoli.rpart1, cp = 0.02)
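To see what pruning does to the size of a tree, the number of terminal nodes can be compared before and after. This is a rough sketch on the built-in iris data (an assumed stand-in, not the ecoli data); the helper count.leaves is hypothetical and simply counts the rows of the frame component that are marked as leaves.

```r
library(rpart)

# Grow a deliberately large tree on stand-in data (cp = 0 removes the
# complexity penalty while growing; minsplit = 2 allows very small nodes).
full.tree <- rpart(Species ~ ., data = iris, method = "class",
                   control = rpart.control(cp = 0, minsplit = 2))

# Cost-complexity pruning: subtrees whose complexity parameter
# falls below cp are snipped off.
pruned.tree <- prune(full.tree, cp = 0.02)

# Hypothetical helper: terminal nodes are marked "<leaf>" in tree$frame$var.
count.leaves <- function(tree) sum(tree$frame$var == "<leaf>")

c(full = count.leaves(full.tree), pruned = count.leaves(pruned.tree))
```

The pruned tree can never have more leaves than the tree it was cut from, because prune only removes branches.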
The classification tree can be visualised with the plot function; the text function then adds labels to the graph:
> plot(ecoli.rpart2, uniform = TRUE)
> text(ecoli.rpart2, use.n = TRUE, cex = 0.75)
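Labels at the edge of the plotting region can be clipped, and the figure is often wanted as a file. The sketch below, again on the built-in iris data as an assumed stand-in, adds the margin argument of the rpart plot method and writes the result to a PDF. The path out.file is a hypothetical temporary file.

```r
library(rpart)

iris.rpart <- rpart(Species ~ ., data = iris, method = "class")

# Write the figure to a PDF; out.file is a hypothetical temporary path.
out.file <- tempfile(fileext = ".pdf")
pdf(out.file, width = 7, height = 5)

# margin leaves white space around the tree so edge labels are not clipped.
plot(iris.rpart, uniform = TRUE, margin = 0.1)

# use.n = TRUE adds the per-class observation counts to each leaf label.
text(iris.rpart, use.n = TRUE, cex = 0.75)

invisible(dev.off())
```

The same plot and text calls work on screen when the pdf and dev.off lines are left out.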
Other useful resources are provided on the Supplementary Material page.