[This article was first published on Software for Exploratory Data Analysis and Statistical Modelling, and kindly contributed to R-bloggers].

In a previous post on classification trees we used the tree package to fit a classification tree to data divided into known classes. In this post we look at an alternative, the rpart function from the rpart package, which is included with the standard R distribution as a recommended package.

A classification tree can be fitted with the rpart function using syntax similar to that of the tree function. For the ecoli data set discussed in the previous post we would use:

> library(rpart)
> ecoli.df = read.csv("ecoli.txt")

followed by

> ecoli.rpart1 = rpart(class ~ mcv + gvh + lip + chg + aac + alm1 + alm2,
+     data = ecoli.df)
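Because ecoli.txt is not bundled with R, here is a minimal, self-contained sketch of the same call pattern using the kyphosis data that ships with the rpart package as a stand-in. The object name kyph.rpart1 is purely illustrative. Setting method = "class" explicitly is our own choice: it asks for a classification tree rather than relying on rpart to infer the method from the type of the response column.

```r
library(rpart)

# Stand-in data: kyphosis ships with rpart. Kyphosis is a factor
# (absent/present); Age, Number and Start are numeric predictors.
data(kyphosis, package = "rpart")

# Fit a classification tree; method = "class" is stated explicitly
# instead of being inferred from the response type.
kyph.rpart1 <- rpart(Kyphosis ~ Age + Number + Start,
                     data = kyphosis, method = "class")

# Print the splits, node sizes and fitted classes.
print(kyph.rpart1)
```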

We would then consider whether the tree could be simplified by pruning. The plotcp function plots the cross-validated relative error against the complexity parameter (cp) for each candidate tree size:

> plotcp(ecoli.rpart1)
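The xerror values behind plotcp come from cross-validation, which assigns observations to folds at random. Refitting the same tree can therefore give slightly different numbers. The following sketch again uses the kyphosis stand-in. It fixes the random seed and grows a deliberately large tree (cp = 0.001, our choice for illustration) so that the plot has several sizes to compare. It writes the plot to a temporary PDF rather than the screen.

```r
library(rpart)
data(kyphosis, package = "rpart")

# Cross-validation folds are drawn at random, so fix the seed to make
# the xerror column (and hence the plot) reproducible.
set.seed(1)
kyph.rpart1 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                     method = "class",
                     control = rpart.control(cp = 0.001, xval = 10))

# Draw the cp plot into a temporary PDF file.
cp.file <- tempfile(fileext = ".pdf")
pdf(cp.file)
plotcp(kyph.rpart1)
invisible(dev.off())
```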

The amount of pruning can be determined from this graph or from the table printed by the printcp function:

> printcp(ecoli.rpart1)

Classification tree:
rpart(formula = class ~ mcv + gvh + lip + chg + aac + alm1 +
    alm2, data = ecoli.df)

Variables actually used in tree construction:
[1] aac  alm1 gvh  mcv

Root node error: 193/336 = 0.5744

n= 336

        CP nsplit rel error  xerror     xstd
1 0.388601      0   1.00000 1.00000 0.046959
2 0.207254      1   0.61140 0.61658 0.045423
3 0.062176      2   0.40415 0.45596 0.041758
4 0.051813      3   0.34197 0.38342 0.039359
5 0.031088      4   0.29016 0.36269 0.038571
6 0.015544      5   0.25907 0.30570 0.036136
7 0.010000      6   0.24352 0.31088 0.036375
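The cp threshold can also be read off the cptable component of the fitted object rather than by eye. The sketch below uses the kyphosis stand-in. It computes two common choices: the CP of the row with the smallest xerror, and the "one standard error" rule. The one-SE rule is a widely used convention, not something prescribed in this post. It picks the simplest tree whose xerror lies within one xstd of the minimum. Applied by hand to the ecoli table above, both choices point to row 6 (five splits): 0.30570 + 0.036136 = 0.341836, and row 5's xerror of 0.36269 exceeds that bound.

```r
library(rpart)
data(kyphosis, package = "rpart")

set.seed(1)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class", control = rpart.control(cp = 0.001))

cpt <- fit$cptable  # columns: CP, nsplit, rel error, xerror, xstd

# Row with the smallest cross-validated error.
best <- which.min(cpt[, "xerror"])
cp.min <- cpt[best, "CP"]

# One-SE rule: rows are ordered by increasing nsplit, so the first row
# whose xerror is within one xstd of the minimum is the simplest tree.
threshold <- cpt[best, "xerror"] + cpt[best, "xstd"]
se1 <- which(cpt[, "xerror"] <= threshold)[1]
cp.1se <- cpt[se1, "CP"]

print(c(cp.min = cp.min, cp.1se = cp.1se))
```

The one-SE choice can never select a larger tree than the minimum-xerror choice, so its CP is always at least as large.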

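The prune function simplifies the tree by removing every split whose complexity parameter falls below a cp threshold chosen from the graph or the printed table. Here cp = 0.02 lies between the CP values of rows 5 (0.031088) and 6 (0.015544), so the pruned tree keeps five splits. That is the size with the smallest cross-validated error (xerror = 0.30570) in the table.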

> ecoli.rpart2 = prune(ecoli.rpart1, cp = 0.02)
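To check what pruning removed, one option is to count the internal (non-leaf) nodes in the frame component before and after calling prune. The sketch below does this with the kyphosis stand-in. The helper n.splits is hypothetical, written only for this illustration.

```r
library(rpart)
data(kyphosis, package = "rpart")

set.seed(1)
# Grow a large tree so that pruning has something to remove.
fit.full <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  method = "class", control = rpart.control(cp = 0.001))

# Drop every split whose complexity is below 0.02.
fit.pruned <- prune(fit.full, cp = 0.02)

# Hypothetical helper: number of splits = number of non-leaf rows in frame.
n.splits <- function(tree) sum(tree$frame$var != "<leaf>")

print(c(full = n.splits(fit.full), pruned = n.splits(fit.pruned)))
```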

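The classification tree can be visualised with the plot function, after which the text function adds labels to the graph. uniform = TRUE spaces the levels of the tree evenly, use.n = TRUE adds the number of observations in each class at the terminal nodes, and cex = 0.75 shrinks the label text: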

> plot(ecoli.rpart2, uniform = TRUE)
> text(ecoli.rpart2, use.n = TRUE, cex = 0.75)
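Labels near the edge of the plotting region are sometimes clipped. A minimal sketch, again using the kyphosis stand-in, adds white space with the margin argument of plot.rpart and writes the result to a temporary PDF file. The margin value of 0.1 and the page size are arbitrary choices for illustration.

```r
library(rpart)
data(kyphosis, package = "rpart")

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class")

tree.file <- tempfile(fileext = ".pdf")
pdf(tree.file, width = 7, height = 5)
# uniform = TRUE spaces the levels evenly; margin adds white space around
# the tree so that labels near the edge are not clipped.
plot(fit, uniform = TRUE, margin = 0.1)
# use.n = TRUE appends the class counts at the terminal nodes.
text(fit, use.n = TRUE, cex = 0.75)
invisible(dev.off())
```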

Other useful resources are provided on the Supplementary Material page.