(This article was first published on **Software for Exploratory Data Analysis and Statistical Modelling**, and kindly contributed to R-bloggers)

In a previous post on classification trees we considered using the **tree** package to fit a classification tree to data divided into known classes. In this post we will look at the alternative function **rpart**, from the package of the same name, which ships with the standard **R** distribution as a recommended package.


A classification tree can be fitted with the **rpart** function, whose syntax is similar to that of the **tree** function. For the ecoli data set discussed in the previous post we would first load the package and read in the data:

> require(rpart)
> ecoli.df = read.csv("ecoli.txt")

followed by

> ecoli.rpart1 = rpart(class ~ mcv + gvh + lip + chg + aac + alm1 + alm2, data = ecoli.df)
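For readers who do not have the ecoli.txt file to hand, the same call can be sketched on a data set that is bundled with the package. This is only an illustration: the kyphosis data frame is a stand-in rather than the data used in this post, and the object name kyph.rpart is made up for the example. The method = "class" argument is optional when the response is a factor, but stating it makes the intent explicit.

```r
# Minimal sketch: fit a classification tree to the kyphosis data
# that ships with rpart (a stand-in for the ecoli data in the post).
library(rpart)

# Kyphosis is a factor (absent/present), so this is a classification problem.
kyph.rpart <- rpart(Kyphosis ~ Age + Number + Start,
                    data = kyphosis,
                    method = "class")

# Print the splits, node sizes and predicted class at each node.
print(kyph.rpart)
```

The printed summary lists one line per node, which is a quick way to check that the formula and data were picked up as expected before looking at pruning.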

We would then consider whether the tree could be simplified by pruning, using the **plotcp** function to plot the cross-validated error against the complexity parameter:

> plotcp(ecoli.rpart1)

The amount of pruning can be determined from this graph or by looking at the output from the **printcp** function:

> printcp(ecoli.rpart1)

Classification tree:
rpart(formula = class ~ mcv + gvh + lip + chg + aac + alm1 + alm2, data = ecoli.df)

Variables actually used in tree construction:
[1] aac  alm1 gvh  mcv

Root node error: 193/336 = 0.5744

n= 336

        CP nsplit rel error  xerror     xstd
1 0.388601      0   1.00000 1.00000 0.046959
2 0.207254      1   0.61140 0.61658 0.045423
3 0.062176      2   0.40415 0.45596 0.041758
4 0.051813      3   0.34197 0.38342 0.039359
5 0.031088      4   0.29016 0.36269 0.038571
6 0.015544      5   0.25907 0.30570 0.036136
7 0.010000      6   0.24352 0.31088 0.036375
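As a worked reading of the table above, the sketch below re-enters the printed numbers by hand and applies two common checks: converting the relative cross-validated error to an absolute rate by multiplying by the root node error, and the "one standard error" heuristic for picking a tree size. The one standard error rule is a widely used convention rather than something this post prescribes, and the names cp.table, threshold and so on are made up for the example.

```r
# Values copied from the printcp output above.
cp.table <- data.frame(
  CP     = c(0.388601, 0.207254, 0.062176, 0.051813, 0.031088, 0.015544, 0.010000),
  nsplit = 0:6,
  xerror = c(1.00000, 0.61658, 0.45596, 0.38342, 0.36269, 0.30570, 0.31088),
  xstd   = c(0.046959, 0.045423, 0.041758, 0.039359, 0.038571, 0.036136, 0.036375)
)

# xerror is relative to the root node error, so rescale to an absolute rate.
root.error <- 193 / 336
cp.table$abs.xerror <- cp.table$xerror * root.error

# Row with the smallest cross-validated error.
best.row <- which.min(cp.table$xerror)

# One standard error heuristic (an assumption, not the post's method):
# take the smallest tree whose xerror is within one xstd of the minimum.
threshold  <- cp.table$xerror[best.row] + cp.table$xstd[best.row]
one.se.row <- min(which(cp.table$xerror <= threshold))

cp.table[one.se.row, c("CP", "nsplit", "xerror", "abs.xerror")]
```

For these numbers both checks point at the same row, the tree with five splits.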

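The **prune** function is used to simplify the tree based on a *cp* threshold identified from the graph or the printed output. In the table above, a value of 0.02 falls between the CP values of rows 5 and 6 (0.031088 and 0.015544), so pruning at this value keeps the tree with five splits, which has the lowest cross-validated error (xerror = 0.30570).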

> ecoli.rpart2 = prune(ecoli.rpart1, cp = 0.02)
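Rather than reading the threshold off by eye, the value can also be taken from the cptable component of the fitted object. The sketch below does this on the bundled kyphosis data, used here only as a stand-in for the ecoli data; the object names are made up for the example, and choosing the row with the smallest xerror is one possible rule rather than the only sensible one. Because xerror comes from random cross-validation folds, a seed is set so that the result is repeatable.

```r
library(rpart)

# Cross-validation inside rpart is random, so fix the seed first.
set.seed(1)
kyph.full <- rpart(Kyphosis ~ Age + Number + Start,
                   data = kyphosis,
                   method = "class")

# cptable is a matrix with columns CP, nsplit, rel error, xerror and xstd.
cpt <- kyph.full$cptable

# Pick the CP value of the row with the smallest cross-validated error.
best.cp <- cpt[which.min(cpt[, "xerror"]), "CP"]

# Prune back to the subtree corresponding to that CP value.
kyph.pruned <- prune(kyph.full, cp = best.cp)
print(kyph.pruned)
```

The pruned object is an ordinary rpart fit, so it can be passed to the same printing and plotting functions as the original tree.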

The classification tree can be visualised with the plot function and then the text function adds labels to the graph:

> plot(ecoli.rpart2, uniform = TRUE)
> text(ecoli.rpart2, use.n = TRUE, cex = 0.75)
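The two plotting calls can be tried end to end with the sketch below, which again assumes the bundled kyphosis data as a stand-in and writes the figure to a temporary PDF file (the file name and object names are made up for the example). The margin argument of the rpart plot method adds white space around the tree, which often helps when node labels would otherwise be clipped at the edge of the figure.

```r
library(rpart)

kyph.tree <- rpart(Kyphosis ~ Age + Number + Start,
                   data = kyphosis,
                   method = "class")

# Write the figure to a temporary file so the example runs without a screen device.
out.file <- file.path(tempdir(), "kyphosis-tree.pdf")
pdf(out.file, width = 7, height = 5)

# uniform = TRUE spaces the levels evenly; margin leaves room for the labels.
plot(kyph.tree, uniform = TRUE, margin = 0.1)

# use.n = TRUE adds the class counts at each leaf to the labels.
text(kyph.tree, use.n = TRUE, cex = 0.75)

dev.off()
```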

Other useful resources are provided on the Supplementary Material page.
