This afternoon I went to Max Kuhn’s tutorial on his caret package. caret stands for classification and regression (something beginning with e) trees. It provides a consistent interface to nearly 150 different models in R, in much the same way as the plyr package provides a consistent interface to the apply functions.

The basic usage of caret is to split your data into training and test sets.

my_data <- split(my_data, runif(nrow(my_data)) > p) #for some value of p
names(my_data) <- c("training", "testing")

Then call `train`

on your training set.

training_model <- train(
response ~ .,
data = my_data$training,
method = "a type of model!")

Then predict it using `predict`

.

predictions <- predict(training_model, my_data$testing)

So the basic usage is very simple. The devil is of course in the statistical details. You still have to choose (at least one) type of model, and there are many options for how those models should be fit.

Max suggested that a good strategy for modelling is to begin with a powerful black box method (boosting, random forests or support vector machines) since they can usually provide excellent fits. The next step is to use a simple, understandable model (some form of regression perhaps) and see how much predictive power you are losing.

I suspect that in order to get the full benefit of caret, I’ll need to read Max’s book: Applied Predictive Modeling.

Tagged: caret, modelling, r, statistics

*Related*

To

**leave a comment** for the author, please follow the link and comment on his blog:

** 4D Pie Charts » R**.

R-bloggers.com offers

**daily e-mail updates** about

R news and

tutorials on topics such as: visualization (

ggplot2,

Boxplots,

maps,

animation), programming (

RStudio,

Sweave,

LaTeX,

SQL,

Eclipse,

git,

hadoop,

Web Scraping) statistics (

regression,

PCA,

time series,

trading) and more...

If you got this far, why not

__subscribe for updates__ from the site? Choose your flavor:

e-mail,

twitter,

RSS, or

facebook...