# Classification Trees using the rpart function

September 21, 2010
(This article was first published on Software for Exploratory Data Analysis and Statistical Modelling, and kindly contributed to R-bloggers)

In a previous post on classification trees we considered using the tree package to fit a classification tree to data divided into known classes. In this post we will look at the alternative function rpart, from the rpart package, which is one of the recommended packages shipped with the standard R distribution.

A classification tree can be fitted with the rpart function using syntax similar to that of the tree function. For the ecoli data set discussed in the previous post we would first load the package and read in the data:

```
> require(rpart)
> ecoli.df = read.csv("ecoli.txt")
```

followed by the call that fits the tree:

```
> ecoli.rpart1 = rpart(class ~ mcv + gvh + lip + chg + aac + alm1 + alm2, data = ecoli.df)
```
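If ecoli.txt is not to hand, the same call pattern can be tried on a built-in data set. The sketch below uses iris purely as a stand-in (it is not the data from this post) and passes method = "class" explicitly so that a classification tree is requested regardless of how the response is stored. The call to set.seed is an addition: the xerror column reported by rpart comes from random cross-validation, so fixing the seed makes repeated runs comparable.

```r
# Stand-in example: iris replaces the ecoli data, which may not be available locally
library(rpart)

# Cross-validation inside rpart is random; fix the seed for repeatable xerror values
set.seed(1)

# method = "class" requests a classification tree explicitly
iris.rpart = rpart(Species ~ ., data = iris, method = "class")

# The fitted object stores the complexity table that printcp displays
iris.rpart$cptable
```

The object name iris.rpart is illustrative; everything that follows in this post applies to it in the same way as to ecoli.rpart1.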

We would then consider whether the tree could be simplified by pruning, making use of the plotcp function to plot the cross-validated error against the complexity parameter (cp):

```
> plotcp(ecoli.rpart1)
```

The amount of pruning can be determined from this graph or by looking at the output from the printcp function:

```
> printcp(ecoli.rpart1)

Classification tree:
rpart(formula = class ~ mcv + gvh + lip + chg + aac + alm1 + alm2,
    data = ecoli.df)

Variables actually used in tree construction:
aac  alm1 gvh  mcv

Root node error: 193/336 = 0.5744

n= 336

        CP nsplit rel error  xerror     xstd
1 0.388601      0   1.00000 1.00000 0.046959
2 0.207254      1   0.61140 0.61658 0.045423
3 0.062176      2   0.40415 0.45596 0.041758
4 0.051813      3   0.34197 0.38342 0.039359
5 0.031088      4   0.29016 0.36269 0.038571
6 0.015544      5   0.25907 0.30570 0.036136
7 0.010000      6   0.24352 0.31088 0.036375
```
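One common way to read this table is the one-standard-error rule: find the row with the smallest xerror, add its xstd, and take the simplest tree whose xerror falls at or below that value. The sketch below applies the rule to the numbers printed above, typed in by hand so that it runs without the data file. The names cp.table, best, threshold and chosen are illustrative, and this is one possible selection rule rather than necessarily the one used in this post.

```r
# Values copied from the printcp output above
cp.table = data.frame(
  CP     = c(0.388601, 0.207254, 0.062176, 0.051813, 0.031088, 0.015544, 0.010000),
  nsplit = 0:6,
  xerror = c(1.00000, 0.61658, 0.45596, 0.38342, 0.36269, 0.30570, 0.31088),
  xstd   = c(0.046959, 0.045423, 0.041758, 0.039359, 0.038571, 0.036136, 0.036375)
)

# Row with the smallest cross-validated error
best = which.min(cp.table$xerror)

# One standard error above that minimum
threshold = cp.table$xerror[best] + cp.table$xstd[best]

# Simplest tree (first row) whose xerror is within the threshold
chosen = min(which(cp.table$xerror <= threshold))
cp.table[chosen, ]
```

With these numbers the rule selects row 6, the tree with 5 splits. On a fitted object the same table is available as ecoli.rpart1$cptable, and because xerror comes from random cross-validation the selected row may differ between runs.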

The prune function is used to simplify the tree based on a cp threshold identified from the graph or printed output.

```
> ecoli.rpart2 = prune(ecoli.rpart1, cp = 0.02)
```

The classification tree can be visualised with the plot function and then the text function adds labels to the graph:

```
> plot(ecoli.rpart2, uniform = TRUE)
> text(ecoli.rpart2, use.n = TRUE, cex = 0.75)
```
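As a complement to the plot, a pruned tree can also be checked numerically by comparing its predicted classes with the observed ones. The sketch below is a minimal illustration that again uses iris as a stand-in for the ecoli data; the object names and the cp value of 0.02 are assumptions chosen to mirror the calls above. The accuracy it reports is computed on the same data used to fit the tree, so it will tend to be optimistic.

```r
# Stand-in example: iris replaces the ecoli data
library(rpart)

set.seed(1)
iris.rpart  = rpart(Species ~ ., data = iris, method = "class")
iris.pruned = prune(iris.rpart, cp = 0.02)

# type = "class" returns the predicted class label for each row
pred = predict(iris.pruned, newdata = iris, type = "class")

# Cross-tabulate observed against predicted classes
confusion = table(observed = iris$Species, predicted = pred)
confusion

# Proportion of rows classified correctly (resubstitution accuracy)
accuracy = sum(diag(confusion)) / sum(confusion)
accuracy
```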

Other useful resources are provided on the Supplementary Material page.
