user2013: The caret tutorial

July 9, 2013

(This article was first published on 4D Pie Charts » R, and kindly contributed to R-bloggers)

This afternoon I went to Max Kuhn’s tutorial on his caret package. caret stands for classification and regression (something beginning with e) trees. It provides a consistent interface to nearly 150 different models in R, in much the same way as the plyr package provides a consistent interface to the apply functions.

The basic usage of caret is to split your data into training and test sets.

my_data <- split(my_data, runif(nrow(my_data)) > p) #for some value of p
names(my_data) <- c("training", "testing")

Then call train on your training set.

training_model <- train(
  response ~ ., 
  data   = my_data$training,
  method = "a type of model!")

Then predict it using predict.

predictions <- predict(training_model, my_data$testing)

So the basic usage is very simple. The devil is of course in the statistical details. You still have to choose (at least one) type of model, and there are many options for how those models should be fit.

Max suggested that a good strategy for modelling is to begin with a powerful black box method (boosting, random forests or support vector machines) since they can usually provide excellent fits. The next step is to use a simple, understandable model (some form of regression perhaps) and see how much predictive power you are losing.

I suspect that in order to get the full benefit of caret, I’ll need to read Max’s book: Applied Predictive Modeling.

Tagged: caret, modelling, r, statistics

To leave a comment for the author, please follow the link and comment on his blog: 4D Pie Charts » R. offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.