Growing some Trees

March 18, 2015

Consider the dataset used in a previous post on visualising a classification with more than two features,

> MYOCARDE=read.table(
+ "http://freakonometrics.free.fr/saporta.csv",
+ header=TRUE,sep=";")
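
Just to check that the seven continuous covariates and the PRONO outcome were read properly, one can have a quick look at the structure,

> str(MYOCARDE)   # seven continuous covariates, plus the PRONO outcome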

The default classification tree is

> library(rpart)
> library(rpart.plot)
> arbre = rpart(factor(PRONO)~.,data=MYOCARDE)
> rpart.plot(arbre,type=4,extra=6)
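
If I am not mistaken, this default tree uses rpart's usual control parameters, in particular minsplit=20 and cp=0.01, so it should be equivalent to the explicit call

> # explicit version of the default call, assuming rpart's usual defaults
> arbre = rpart(factor(PRONO)~.,data=MYOCARDE,
+       control=rpart.control(minsplit=20,cp=0.01))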

We can change the options, such as the minimum number of observations required in a node to attempt a split,

> arbre = rpart(factor(PRONO)~.,data=MYOCARDE,
+       control=rpart.control(minsplit=10))
> rpart.plot(arbre,type=4,extra=6)

or

> arbre = rpart(factor(PRONO)~.,data=MYOCARDE,
+        control=rpart.control(minsplit=5))
> rpart.plot(arbre,type=4,extra=6)

To visualize that classification, use the following code (which gives a projection on the first two principal components)

> library(FactoMineR) # PCA (on the continuous variables)
> X=MYOCARDE[,1:7]
> acp=PCA(X,ncp=ncol(X))
> M=acp$var$coord     # coordinates of the variables
> m=apply(X,2,mean)   # means of the covariates
> s=apply(X,2,sd)     # standard deviations of the covariates
> 
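The matrix Minv, used below to send a point (d1,d2) of the first principal plane back to the (standardized) covariates, is not defined in the code of this post. A possible construction from the PCA output, assuming I got FactoMineR's conventions right, is to rescale the columns of the variable coordinates by the square roots of the eigenvalues,

> # assumed construction of Minv (not given in the code of this post):
> # rescale the columns of acp$var$coord by sqrt(eigenvalues) to recover
> # the eigenvectors of the correlation matrix, so that
> # Minv %*% c(d1,d2,0,...) is a point in the space of the standardized
> # covariates whose projection on the first plane is (d1,d2)
> lambda = acp$eig[,1]
> Minv = sweep(M,2,sqrt(lambda),"/")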
> arbre = rpart(factor(PRONO)~.,data=MYOCARDE)
> pred2=function(d1,d2,Mat,tree){
+   # send the point (d1,d2) of the principal plane back to the
+   # (standardized) covariates, then to the original scale
+   z=Mat %*% c(d1,d2,rep(0,ncol(X)-2))
+   newd=data.frame(t(z*s+m))
+   names(newd)=names(X)
+   # predicted probability of the second class
+   predict(tree,newdata=newd,
+           type="prob")[2] }
> p=function(d1,d2) pred2(d1,d2,Minv,arbre)

> Outer <- function(x,y,fun) {
+   # like outer(), but for a function that is not vectorized
+   mat <- matrix(NA, length(x), length(y))
+   for (i in seq_along(x)) {
+     for (j in seq_along(y))
+       mat[i,j]=fun(x[i],y[j])}
+   return(mat)}

> xgrid=seq(-5,5,length=251)
> ygrid=seq(-5,5,length=251)
> zgrid=Outer(xgrid,ygrid,p)
> bluereds=c(
+   rgb(1,0,0,(10:0)/25),rgb(0,0,1,(0:10)/25))

> acp2=PCA(MYOCARDE,quali.sup=8,graph=TRUE)
> plot(acp2, habillage = 8,col.hab=c("red","blue"))
> image(xgrid,ygrid,zgrid,add=TRUE,col=bluereds)
> contour(xgrid,ygrid,zgrid,add=TRUE,levels=.5)

It is also possible to consider the same visualisation for a deeper tree, grown with

> arbre = rpart(factor(PRONO)~.,data=MYOCARDE,
+        control=rpart.control(minsplit=5))

Finally, one can also grow several trees, obtained by resampling. This is the idea of bagging: we bootstrap the observations, grow a tree on each sample, and then aggregate the predicted values. On the grid

> xgrid=seq(-5,5,length=201)
> ygrid=seq(-5,5,length=201)

the code is the following,

> Z = matrix(0,201,201)
> for(i in 1:200){
+   # bootstrap sample of the observations
+   indice = sample(1:nrow(MYOCARDE),
+            size=nrow(MYOCARDE),
+            replace=TRUE)
+   ECHANTILLON=MYOCARDE[indice,]
+   # tree grown on the bootstrap sample
+   arbre_b = rpart(factor(PRONO)~.,
+     data=ECHANTILLON)
+   p2 = function(d1,d2) pred2(d1,d2, Minv,arbre_b)
+   zgrid2_b = Outer(xgrid,ygrid,p2)
+   # accumulate the predicted surfaces
+   Z = Z+zgrid2_b }
> Zgrid = Z/200   # average over the 200 trees

To visualize it, use

> plot(acp2, habillage = 8,
+ col.hab=c("red","blue"))
> image(xgrid,ygrid,Zgrid,add=TRUE,
+ col=bluereds)

> contour(xgrid,ygrid,Zgrid,add=TRUE,
+ levels=.5,lwd=3)
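
The same aggregation can be used to get bagged predictions for the patients themselves, and not only on the grid of the principal plane. A minimal sketch, in the same spirit (the names P, idx and p_bag are mine, for the illustration),

> P = matrix(0,nrow(MYOCARDE),200)
> for(b in 1:200){
+   # bootstrap sample, tree, and predicted probability of the second class
+   idx = sample(1:nrow(MYOCARDE),size=nrow(MYOCARDE),replace=TRUE)
+   arbre_b = rpart(factor(PRONO)~.,data=MYOCARDE[idx,])
+   P[,b] = predict(arbre_b,newdata=MYOCARDE,type="prob")[,2]
+ }
> p_bag = rowMeans(P)   # bagged probability, for each patient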

Last, but not least, it is possible to use a random forest algorithm. The method combines Breiman’s bagging idea (mentioned previously) with the random selection of features at each split.

> library(randomForest)
> foret = randomForest(factor(PRONO)~.,
+          data=MYOCARDE)
> pF=function(d1,d2) pred2(d1,d2,Minv,foret)
> zgridF=Outer(xgrid,ygrid,pF)
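
The random selection of features is controlled by the mtry argument of randomForest; for classification, the default is, if I recall correctly, the square root of the number of covariates (here 2 out of 7), with 500 trees, so the call above should be essentially equivalent to

> # explicit version of the call above, assuming the usual defaults
> foret = randomForest(factor(PRONO)~.,
+          data=MYOCARDE,mtry=2,ntree=500)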
 
> acp2=PCA(MYOCARDE,quali.sup=8,graph=TRUE)
> plot(acp2, habillage = 8,col.hab=c("red","blue"))
> image(xgrid,ygrid,zgridF,add=TRUE,
+ col=bluereds)
> contour(xgrid,ygrid,zgridF,
+ add=TRUE,levels=.5,lwd=3)
