# Using the gbm package to boost decision trees

This post was kindly contributed by 数据科学与R语言 (Data Science and R) - go there to comment and to read the full post.

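Key parameters of `gbm()`:

• Form of the loss function (distribution)
• Number of iterations, i.e. trees (n.trees)
• Learning rate (shrinkage)
• Subsampling fraction (bag.fraction)
• Tree depth (interaction.depth)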
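The sketch below is a minimal from-scratch version of gradient boosting in R. It shows where each of these five parameters acts. It assumes squared-error loss (the analogue of `distribution = 'gaussian'`), so each tree is fitted to plain residuals, and it uses `rpart` trees as the base learner. It illustrates the idea only and is not how gbm is implemented internally. The helper names `boost_sketch` and `predict_sketch` and the simulated data are hypothetical.

```r
# Minimal gradient-boosting sketch with squared-error loss (the analogue of
# distribution = "gaussian"). Illustration only, NOT gbm's implementation;
# boost_sketch() and predict_sketch() are hypothetical helpers.
library(rpart)

boost_sketch <- function(x, y, n.trees = 100, shrinkage = 0.1,
                         bag.fraction = 0.5, interaction.depth = 1) {
  f0 <- mean(y)                          # start from a constant prediction
  f <- rep(f0, length(y))                # current fitted values
  trees <- vector("list", n.trees)
  for (m in seq_len(n.trees)) {          # n.trees: number of boosting iterations
    r <- y - f                           # negative gradient of (y - f)^2 / 2
    # bag.fraction: fit each tree on a random subsample drawn without replacement
    idx <- sample(length(y), floor(bag.fraction * length(y)))
    d <- data.frame(x[idx, , drop = FALSE], r = r[idx])
    # interaction.depth is approximated here by rpart's maxdepth
    trees[[m]] <- rpart(r ~ ., data = d,
                        control = rpart.control(maxdepth = interaction.depth,
                                                cp = 0, minsplit = 10, xval = 0))
    # shrinkage: add only a fraction of each new tree's prediction
    f <- f + shrinkage * predict(trees[[m]], newdata = x)
  }
  list(f0 = f0, trees = trees, shrinkage = shrinkage)
}

predict_sketch <- function(fit, newx) {
  pred <- rep(fit$f0, nrow(newx))
  for (tr in fit$trees) pred <- pred + fit$shrinkage * predict(tr, newdata = newx)
  pred
}

# Usage on simulated (hypothetical) data
set.seed(1)
n <- 500
x <- data.frame(x1 = runif(n), x2 = runif(n))
y <- sin(2 * pi * x$x1) + rnorm(n, sd = 0.3)
fit <- boost_sketch(x, y, n.trees = 200, shrinkage = 0.1)
mse <- mean((y - predict_sketch(fit, x))^2)
print(c(var_y = var(y), training_mse = mse))
```

A smaller `shrinkage` makes each step more cautious, so more trees are usually needed to reach a similar fit. This is why the `gbm()` call that follows pairs `shrinkage = 0.01` with a large `n.trees` and lets cross-validation decide where to stop.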

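```r
# Load packages and data
library(gbm)
data(PimaIndiansDiabetes2, package = 'mlbench')
# Convert the response to 0/1: factor levels neg/pos become 1/2, then subtract 1
data <- PimaIndiansDiabetes2
data$diabetes <- as.numeric(data$diabetes)
data <- transform(data, diabetes = diabetes - 1)
# Fit the model with gbm
model <- gbm(diabetes ~ ., data = data, shrinkage = 0.01,
             distribution = 'bernoulli', cv.folds = 5,
             n.trees = 3000, verbose = FALSE)
# Use cross-validation to choose the best number of iterations
best.iter <- gbm.perf(model, method = 'cv')
```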
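With `method = 'cv'`, `gbm.perf()` returns the iteration at which the cross-validated loss is smallest. The sketch below checks this on simulated data: it compares the result with `which.min()` applied to the `cv.error` component of the fitted object. The `sim` data set and its coefficients are made up for illustration. `plot.it = FALSE` only suppresses the diagnostic plot, and `n.cores = 1` just keeps the cross-validation to a single worker.

```r
# Sketch: gbm.perf(method = "cv") picks the iteration minimizing cv.error.
# The simulated data below are hypothetical.
library(gbm)

set.seed(42)
n <- 500
sim <- data.frame(x1 = runif(n), x2 = runif(n))
# Binary outcome driven mainly by x1
sim$y <- rbinom(n, 1, plogis(4 * (sim$x1 - 0.5)))

fit <- gbm(y ~ ., data = sim, distribution = "bernoulli",
           n.trees = 500, shrinkage = 0.05, cv.folds = 5,
           verbose = FALSE, n.cores = 1)
best <- gbm.perf(fit, method = "cv", plot.it = FALSE)
manual <- which.min(fit$cv.error)   # iteration with the lowest CV loss
print(c(gbm.perf = best, which.min = manual))
```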
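```r
# Relative importance of each predictor
# (n.trees is named because the second positional argument of summary.gbm is cBars)
summary(model, n.trees = best.iter)
```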
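With `plotit = FALSE`, calling `summary()` on a gbm object skips the bar chart. It returns the relative influences as a data frame with columns `var` and `rel.inf`, sorted in decreasing order and normalized to sum to 100. A quick check on hypothetical simulated data, where only `x1` affects the outcome:

```r
# Relative influence returned as a data frame instead of plotted.
# The simulated data below are hypothetical; only x1 drives the outcome.
library(gbm)

set.seed(7)
n <- 500
sim <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
sim$y <- rbinom(n, 1, plogis(6 * (sim$x1 - 0.5)))

fit <- gbm(y ~ ., data = sim, distribution = "bernoulli",
           n.trees = 300, shrinkage = 0.05, verbose = FALSE)
imp <- summary(fit, n.trees = 300, plotit = FALSE)
print(imp)
```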
```r
# Marginal effect (partial dependence) of the first predictor
plot(model, i.var = 1, n.trees = best.iter)
```
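This marginal-effect plot is a partial dependence plot. It shows the model's prediction as a function of one variable, averaged over the values of the others. The brute-force sketch below implements that definition directly on hypothetical simulated data. It fixes `x1` at each grid value for every row, predicts on the log-odds (link) scale, and averages. gbm itself computes the curve more efficiently from the fitted trees. Passing `return.grid = TRUE` makes `plot()` return gbm's own grid instead of drawing it.

```r
# Brute-force partial dependence of x1 (a sketch of the definition, not gbm's
# own algorithm). The simulated data below are hypothetical.
library(gbm)

set.seed(7)
n <- 500
sim <- data.frame(x1 = runif(n), x2 = runif(n))
sim$y <- rbinom(n, 1, plogis(6 * (sim$x1 - 0.5)))
fit <- gbm(y ~ ., data = sim, distribution = "bernoulli",
           n.trees = 300, shrinkage = 0.05, verbose = FALSE)

grid <- seq(0.05, 0.95, length.out = 10)
pd <- sapply(grid, function(v) {
  tmp <- sim
  tmp$x1 <- v                         # set x1 to v for every row
  mean(predict(fit, newdata = tmp, n.trees = 300, type = "link"))
})
print(round(pd, 2))

# gbm's own partial dependence grid for the same variable
pd_gbm <- plot(fit, i.var = 1, n.trees = 300, return.grid = TRUE)
```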
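```r
# Use the caret package to estimate predictive accuracy
library(caret)
# Reload the data so the response is the original factor (neg/pos), as classification needs
data <- PimaIndiansDiabetes2
fitControl <- trainControl(method = "cv", number = 5, returnResamp = "all")
# Current caret expects tuneGrid names without leading dots, plus n.minobsinnode;
# na.action = na.pass keeps rows with NAs, which gbm handles itself
model2 <- train(diabetes ~ ., data = data, method = 'gbm',
                distribution = 'bernoulli', trControl = fitControl,
                verbose = FALSE, na.action = na.pass,
                tuneGrid = data.frame(n.trees = best.iter, shrinkage = 0.01,
                                      interaction.depth = 1, n.minobsinnode = 10))
model2
```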

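Cross-validated performance of `model2` (5 folds):

| Accuracy | Kappa | Accuracy SD | Kappa SD |
|----------|-------|-------------|----------|
| 0.78     | 0.504 | 0.0357      | 0.0702   |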
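caret reports Accuracy and Kappa averaged over the cross-validation folds, and the SD columns give their standard deviation across folds. For a single confusion matrix, Accuracy is the observed agreement `p_o`. Cohen's Kappa is `(p_o - p_e) / (1 - p_e)`, where `p_e` is the agreement expected by chance from the row and column totals. The helper below, `acc_kappa`, is hypothetical and applies these formulas to made-up counts.

```r
# Accuracy and Cohen's kappa from one confusion matrix.
# acc_kappa() is a hypothetical helper; the counts are made up.
acc_kappa <- function(cm) {
  n  <- sum(cm)
  po <- sum(diag(cm)) / n                     # observed agreement (= accuracy)
  pe <- sum(rowSums(cm) * colSums(cm)) / n^2  # agreement expected by chance
  c(Accuracy = po, Kappa = (po - pe) / (1 - pe))
}

# rows = predicted class, columns = observed class
cm <- matrix(c(40, 10,
               10, 40), nrow = 2, byrow = TRUE)
res <- acc_kappa(cm)
print(res)   # Accuracy 0.8, Kappa 0.6
```

Here chance agreement is 0.5, so an accuracy of 0.8 corresponds to a Kappa of 0.6.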

For more on the method, see the gbm package vignette: http://cran.r-project.org/web/packages/gbm/vignettes/gbm.pdf