# Classification Trees using the rpart function

September 21, 2010

(This article was first published on Software for Exploratory Data Analysis and Statistical Modelling, and kindly contributed to R-bloggers)

In a previous post on classification trees we considered using the tree package to fit a classification tree to data divided into known classes. In this post we will look at the alternative rpart function, from the rpart package that is shipped as a recommended package with standard R distributions.


A classification tree can be fitted with the rpart function, using syntax similar to that of the tree function. For the ecoli data set discussed in the previous post we would use:

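```
library(rpart)
# stringsAsFactors = TRUE reads class as a factor (since R 4.0.0 strings stay character by default)
ecoli.df = read.csv("ecoli.txt", stringsAsFactors = TRUE)
```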

followed by

```
ecoli.rpart1 = rpart(class ~ mcv + gvh + lip + chg + aac + alm1 + alm2,
    data = ecoli.df)
```
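The ecoli.txt file is not bundled with R, so as a minimal, self-contained sketch the same kind of call can be tried on the kyphosis data frame that ships with the rpart package (this substitution is ours, not part of the original analysis). Here method = "class" is given explicitly, although rpart assumes it anyway when the response, like Kyphosis, is a factor.

```r
library(rpart)

# kyphosis ships with rpart: 81 children, factor response Kyphosis (absent/present)
data(kyphosis, package = "rpart")

# Fit a classification tree; method = "class" is stated explicitly,
# although rpart would infer it from the factor response
kyph.rpart1 <- rpart(Kyphosis ~ Age + Number + Start,
                     data = kyphosis, method = "class")

print(kyph.rpart1)
```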

We would then consider whether the tree could be simplified by pruning, using the plotcp function to plot the cross-validated relative error against the complexity parameter cp:

```
plotcp(ecoli.rpart1)
```

The amount of pruning can be determined from this graph or from the output of the printcp function:

```
> printcp(ecoli.rpart1)

Classification tree:
rpart(formula = class ~ mcv + gvh + lip + chg + aac + alm1 + alm2, data = ecoli.df)

Variables actually used in tree construction:
[1] aac alm1 gvh mcv

Root node error: 193/336 = 0.5744

n= 336

        CP nsplit rel error  xerror     xstd
1 0.388601      0   1.00000 1.00000 0.046959
2 0.207254      1   0.61140 0.61658 0.045423
3 0.062176      2   0.40415 0.45596 0.041758
4 0.051813      3   0.34197 0.38342 0.039359
5 0.031088      4   0.29016 0.36269 0.038571
6 0.015544      5   0.25907 0.30570 0.036136
7 0.010000      6   0.24352 0.31088 0.036375
```
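The table printed by printcp is also stored in the cptable component of the fitted object, so the choice of cp can be scripted. The sketch below, again using kyphosis as a stand-in for the ecoli data, grows a deliberately large tree (cp = 0.001, our choice) and reads off two candidate thresholds: the CP of the row with the smallest xerror, and the one-standard-error rule, which picks the smallest tree whose xerror lies within one xstd of that minimum (the same reference level that plotcp marks with a horizontal line). The xerror and xstd columns come from random 10-fold cross-validation, so the seed used here is an assumption that only makes the numbers reproducible.

```r
library(rpart)

# xerror and xstd come from random cross-validation folds; fix the seed
set.seed(1)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class", cp = 0.001)

cpt <- fit$cptable  # the same numbers that printcp() displays
print(cpt)

# Row with the smallest cross-validated error
best <- which.min(cpt[, "xerror"])
cp.min <- cpt[best, "CP"]

# One-standard-error rule: rows are ordered by increasing nsplit, so the
# first row within one xstd of the minimum is the smallest such tree
threshold <- cpt[best, "xerror"] + cpt[best, "xstd"]
one.se <- which(cpt[, "xerror"] <= threshold)[1]
cp.1se <- cpt[one.se, "CP"]

cat("cp (min xerror):", cp.min, "\n")
cat("cp (1-SE rule): ", cp.1se, "\n")
```

The 1-SE rule trades a little estimated accuracy for a smaller tree, which is often preferred because the xerror differences between neighbouring rows are usually well within one standard error.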

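The prune function simplifies the tree by snipping off every split whose complexity is at or below a cp threshold chosen from the graph or the printed output. Here cp = 0.02 falls between the CP values 0.031088 and 0.015544, so the sixth split is removed and the pruned tree keeps five splits, matching row 6 of the table, which has the smallest cross-validated error (xerror = 0.30570).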

```
ecoli.rpart2 = prune(ecoli.rpart1, cp = 0.02)
```
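The effect of pruning can be checked against the cp table. In this sketch, again on kyphosis rather than the ecoli data, cp is taken strictly between two adjacent CP values, as cp = 0.02 is above, so it should leave exactly the number of splits listed in the lower of the two rows. Using the geometric mean of the two adjacent CP values is simply our convenient choice, and counting the non-leaf rows of the frame component is our way of counting splits.

```r
library(rpart)

set.seed(1)  # xerror depends on random cross-validation folds
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class", cp = 0.001)
cpt <- fit$cptable

# Row with the smallest cross-validated error
best <- which.min(cpt[, "xerror"])

# Choose a cp strictly between this row's CP and the CP of the row above,
# so no node complexity ties exactly with the threshold;
# cp = 1 prunes back to the root if the root-only tree is best
cp.use <- if (best > 1) sqrt(cpt[best, "CP"] * cpt[best - 1, "CP"]) else 1

# prune() snips off every split whose complexity is at or below cp
fit.pruned <- prune(fit, cp = cp.use)

# Splits are the non-leaf rows of the frame component
n.splits <- sum(fit.pruned$frame$var != "<leaf>")
cat("cp used:", cp.use, " splits kept:", n.splits,
    " expected:", cpt[best, "nsplit"], "\n")
```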

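The pruned classification tree can be visualised with the plot function, after which the text function adds labels to the graph. Setting uniform = TRUE spaces the nodes evenly in the vertical direction (the default spacing is proportional to the error in the fit), use.n = TRUE adds the number of observations in each class to the node labels, and cex = 0.75 reduces the label size: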

```
plot(ecoli.rpart2, uniform = TRUE)
text(ecoli.rpart2, use.n = TRUE, cex = 0.75)
```
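When R is run non-interactively, the same two calls can draw the tree into a file instead of a screen device. The sketch below, once more using kyphosis as a stand-in for the ecoli data and a temporary PDF file of our choosing, also sets the margin argument of plot, which leaves extra white space around the tree so that labels near the edges are less likely to be clipped.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class")
fit.pruned <- prune(fit, cp = 0.02)

# Draw into a temporary PDF file rather than an on-screen device
out <- tempfile(fileext = ".pdf")
pdf(out, width = 7, height = 5)
plot(fit.pruned, uniform = TRUE, margin = 0.1)  # margin: extra white space around the tree
text(fit.pruned, use.n = TRUE, cex = 0.75)      # class counts shown in terminal node labels
invisible(dev.off())

cat("tree written to", out, "\n")
```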

Other useful resources are provided on the Supplementary Material page.
