# Growing some Trees

March 18, 2015

(This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers)

Consider here the dataset used in a previous post, about visualising a classification (with more than 2 features),

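```> MYOCARDE=read.table(
+ "http://freakonometrics.free.fr/saporta.csv",
+ header=TRUE,sep=";") # call was cut off in the source; a header row and a ";" separator are assumed```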

The default classification tree is

```> library(rpart)       # classification trees
> library(rpart.plot)  # tree plots
> arbre = rpart(factor(PRONO)~.,data=MYOCARDE)
> rpart.plot(arbre,type=4,extra=6)```

We can change the options here, such as the minimum number of observations a node must contain before a split is attempted (minsplit, 20 by default):

```> arbre = rpart(factor(PRONO)~.,data=MYOCARDE,
+       control=rpart.control(minsplit=10))
> rpart.plot(arbre,type=4,extra=6)```

or

```> arbre = rpart(factor(PRONO)~.,data=MYOCARDE,
+        control=rpart.control(minsplit=5))
> rpart.plot(arbre,type=4,extra=6)```
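
To get a feel for what minsplit does, one can count the leaves of trees grown with different values. The sketch below is only an illustration: it uses the kyphosis data shipped with rpart as a stand-in for MYOCARDE, and count_leaves is a helper name introduced here, not something from the original post.

```r
library(rpart)

# kyphosis ships with rpart; it is a stand-in for MYOCARDE (assumption)
data(kyphosis)

# hypothetical helper: grow a tree with a given minsplit and count its leaves
count_leaves = function(minsplit) {
  tree = rpart(Kyphosis ~ ., data = kyphosis,
               control = rpart.control(minsplit = minsplit))
  sum(tree$frame$var == "<leaf>")   # terminal nodes are labelled "<leaf>"
}

sizes = sapply(c(20, 10, 5), count_leaves)
names(sizes) = c("minsplit=20", "minsplit=10", "minsplit=5")
print(sizes)
```

Unless minbucket is set explicitly, rpart takes it to be round(minsplit/3), so lowering minsplit also allows smaller leaves.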

To visualize that classification, we project it on the first two principal components, using the following code:

```> library(FactoMineR) # PCA (on the continuous variables)
> X=MYOCARDE[,1:7]
> acp=PCA(X,ncp=ncol(X))
> M=acp$var$coord
> m=apply(X,2,mean)
> s=apply(X,2,sd)
>
> arbre = rpart(factor(PRONO)~.,data=MYOCARDE)
> pred2=function(d1,d2,Mat,tree){
+   z=Mat %*% c(d1,d2,rep(0,ncol(X)-2))
+   newd=data.frame(t(z*s+m))
+   names(newd)=names(X)
+   predict(tree,newdata=newd,
+           type="prob")[2] }
> Minv=solve(M) # Minv is never defined in this post; the inverse of M is assumed
> p=function(d1,d2) pred2(d1,d2,Minv,arbre)

> Outer <- function(x,y,fun) {
+   mat <- matrix(NA, length(x), length(y))
+   for (i in seq_along(x)) {
+     for (j in seq_along(y))
+       mat[i,j]=fun(x[i],y[j])}
+   return(mat)}

> xgrid=seq(-5,5,length=251)
> ygrid=seq(-5,5,length=251)
> zgrid=Outer(xgrid,ygrid,p)
> bluereds=c(
+   rgb(1,0,0,(10:0)/25),rgb(0,0,1,(0:10)/25))

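> acp2=PCA(MYOCARDE,quali.sup=8,graph=TRUE)
> plot(acp2, habillage = 8,col.hab=c("red","blue"))
> # overlay not shown in the source; assumed to mirror the contour call used for bagging below
> image(xgrid,ygrid,zgrid,add=TRUE,col=bluereds)
> contour(xgrid,ygrid,zgrid,add=TRUE,levels=.5)```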
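
The function pred2 relies on a simple idea: a point (d1,d2) of the factorial plane is completed with zeros on the remaining components, sent back to the standardized variables, and then rescaled with s and m before being passed to predict. Here is a minimal sketch of that back-projection, assuming base R's prcomp and the iris data as stand-ins (the post itself uses FactoMineR and MYOCARDE). The name back is introduced for the illustration, and the two packages scale their axes differently, so this shows the idea rather than the exact computation above.

```r
# stand-in data (assumption): the four numeric columns of iris
X = as.matrix(iris[, 1:4])
m = apply(X, 2, mean)
s = apply(X, 2, sd)

# PCA on standardized variables, using base R instead of FactoMineR
pc = prcomp(X, center = TRUE, scale. = TRUE)

# hypothetical helper: from a point of the first factorial plane
# back to the original units, other components set to zero
back = function(d1, d2) {
  z = pc$rotation %*% c(d1, d2, rep(0, ncol(X) - 2))
  drop(z * s + m)
}

# the origin of the plane maps to the vector of column means
print(back(0, 0))

# with all components kept, the reconstruction is exact up to rounding
full = sweep(sweep(pc$x %*% t(pc$rotation), 2, s, "*"), 2, m, "+")
print(max(abs(full - X)))
```

Keeping only two components is what makes the picture an approximation: every point of the plane stands for one particular vector of the original variables, namely the one with zero coordinates on the discarded axes.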

It is also possible to consider the case where minsplit is lowered to 5,

```> arbre = rpart(factor(PRONO)~.,data=MYOCARDE,
+        control=rpart.control(minsplit=5))```

Finally, one can also grow more trees, obtained by sampling. This is the idea of bagging: we bootstrap our observations, grow a tree on each bootstrap sample, and then aggregate the predicted values. On the grid

```> xgrid=seq(-5,5,length=201)
> ygrid=seq(-5,5,length=201)```

the code is the following,

```> Z = matrix(0,201,201)
> for(i in 1:200){
+ indice = sample(1:nrow(MYOCARDE),
+          size=nrow(MYOCARDE),
+          replace=TRUE)
+ ECHANTILLON=MYOCARDE[indice,]
+ arbre_b = rpart(factor(PRONO)~.,
+   data=ECHANTILLON)
+ p2 = function(d1,d2) pred2(d1,d2, Minv,arbre_b)
+ zgrid2_b = Outer(xgrid,ygrid,p2)
+ Z = Z+zgrid2_b }
> Zgrid = Z/200```
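
To see the three bagging steps (bootstrap, grow, aggregate) in isolation, here is a small sketch that predicts on the observations themselves rather than on a grid. It is only an illustration under stated assumptions: kyphosis (shipped with rpart) replaces MYOCARDE, B = 50 trees and the seed are arbitrary choices, and the object names are introduced here.

```r
library(rpart)

# stand-in data (assumption): kyphosis ships with rpart
data(kyphosis)
set.seed(1)            # arbitrary seed, for reproducibility only

B = 50                 # number of bootstrapped trees (arbitrary)
n = nrow(kyphosis)
prob = matrix(0, n, B) # one column of predicted probabilities per tree

for (b in 1:B) {
  # 1. bootstrap: draw n observations with replacement
  idx = sample(1:n, size = n, replace = TRUE)
  # 2. grow a tree on the bootstrap sample
  tree_b = rpart(Kyphosis ~ ., data = kyphosis[idx, ])
  # 3. predict the probability of the second class for all observations
  prob[, b] = predict(tree_b, newdata = kyphosis, type = "prob")[, 2]
}

# aggregate: average the B predicted probabilities
bagged = rowMeans(prob)
print(head(round(bagged, 2)))
```

Calling predict once per tree on a whole data frame, as done here, is much faster than calling it once per point, which is what the grid loop above does through Outer.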

To visualize it, use

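```> plot(acp2, habillage = 8,
+ col.hab=c("red","blue"))
> image(xgrid,ygrid,Zgrid,add=TRUE, # start of this call was lost in the source; image() is assumed
+ col=bluereds)```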

```> contour(xgrid,ygrid,Zgrid,add=TRUE,
+ levels=.5,lwd=3)```

Last, but not least, it is possible to use a random forest algorithm. The method combines Breiman’s bagging idea (mentioned previously) with a random selection of features at each split.

```> library(randomForest)
> foret = randomForest(factor(PRONO)~.,
+          data=MYOCARDE)
> pF=function(d1,d2) pred2(d1,d2,Minv,foret)
> zgridF=Outer(xgrid,ygrid,pF)

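> acp2=PCA(MYOCARDE,quali.sup=8,graph=TRUE)
> plot(acp2, habillage = 8,col.hab=c("red","blue"))
> image(xgrid,ygrid,zgridF,add=TRUE, # start of this call was lost in the source; image() is assumed
+ col=bluereds)
> contour(xgrid,ygrid,zgridF, # call was cut off in the source; arguments assumed from the bagging plot
+ add=TRUE,levels=.5,lwd=3)```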
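
The two ingredients of the method appear as arguments of randomForest: ntree is the number of bootstrapped trees, and mtry the number of features drawn at random at each split. The sketch below shows them on a stand-in dataset (iris, not MYOCARDE); the values 200 and 2, the seed, and the object names are arbitrary choices made for the illustration.

```r
library(randomForest)

set.seed(1)  # arbitrary seed, for reproducibility only

# stand-in data (assumption): iris instead of MYOCARDE
# ntree = number of bootstrapped trees, mtry = features tried at each split
rf = randomForest(Species ~ ., data = iris, ntree = 200, mtry = 2)

# predicted probabilities are the share of trees voting for each class
pr = predict(rf, newdata = iris[c(1, 51, 101), ], type = "prob")
print(pr)
```

With mtry equal to the number of features, every split considers all variables and the procedure reduces to bagging of (unpruned) trees.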
