The caret package for R provides a variety of error metrics for regression models and 2-class classification models, but only calculates Accuracy and Kappa for multi-class models. Therefore, I wrote a function that allows caret::train to calculate a wide variety of error metrics for multi-class problems.

This function was prompted by a question on Cross Validated asking what the optimal value of k is for a knn model fit to the iris dataset. I wanted to look at statistics besides accuracy and kappa, so I wrote a wrapper function for caret::confusionMatrix and for auc and logLoss from the Metrics package. With it, you can fit a knn model to the iris dataset, aggregate all of the metrics, and save a plot for each metric to a PDF file.
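The original listing is not reproduced here, so what follows is only a minimal sketch of the idea and not the function from the post. It assumes that AUC and logLoss are computed one-vs-rest for each class and then averaged, and that probabilities are clipped away from 0 and 1 so the log loss stays finite (knn often predicts exactly 0 or 1). The names `multiClassSketch` and `knn_metrics.pdf`, the seed, the grid `k = 1:25`, and the resampling settings are all illustrative choices.

```r
library(caret)
library(Metrics)

# Hypothetical summary function: caret passes a data frame with `obs`,
# `pred`, and (when classProbs = TRUE) one probability column per class.
multiClassSketch <- function(data, lev = NULL, model = NULL) {
  if (is.null(lev)) lev <- levels(data$obs)

  # Accuracy and Kappa from the confusion matrix
  cm <- caret::confusionMatrix(data$pred, data$obs)

  # Assumption: one-vs-rest AUC and logLoss per class, then averaged
  eps <- 1e-15
  per_class <- sapply(lev, function(cl) {
    actual <- as.numeric(data$obs == cl)
    prob <- pmin(pmax(data[[cl]], eps), 1 - eps)  # clip to avoid log(0)
    c(AUC = Metrics::auc(actual, prob),
      logLoss = Metrics::logLoss(actual, prob))
  })

  c(cm$overall[c("Accuracy", "Kappa")], rowMeans(per_class))
}

set.seed(42)
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5,
                     classProbs = TRUE,
                     summaryFunction = multiClassSketch)

fit <- train(Species ~ ., data = iris, method = "knn",
             tuneGrid = expand.grid(k = 1:25),
             metric = "Accuracy", trControl = ctrl)

# One page per metric, each plotted against k
metric_names <- c("Accuracy", "Kappa", "AUC", "logLoss")
pdf("knn_metrics.pdf")
for (m in metric_names) {
  plot(fit$results$k, fit$results[[m]], type = "b",
       xlab = "k", ylab = m, main = m)
}
dev.off()
```

`fit$results` then holds one row per value of k with a column for each metric, so the k that looks best can be compared metric by metric. Remember that logLoss is better when lower, unlike the other three.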

This demonstrates that, depending on what metric you use, you will end up with a different model. For example, Accuracy seems to peak around k = 17, while AUC and logLoss seem to reach their best values around k = 6.

You can also increase the number of cross-validation repeats, or use a different method of re-sampling, such as bootstrap re-sampling.
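As a rough illustration, both options are controlled through `caret::trainControl`. The values below (20 repeats, 25 bootstrap samples, the seed, and the grid `k = 1:25`) are arbitrary assumptions rather than settings from the original post, and the fit uses caret's default summary, so it reports only Accuracy and Kappa.

```r
library(caret)

# More repeats of 10-fold cross-validation (the repeat count is arbitrary)
cv_more <- trainControl(method = "repeatedcv", number = 10, repeats = 20)

# Bootstrap re-sampling instead; for "boot", `number` is the number of
# bootstrap samples drawn
boot <- trainControl(method = "boot", number = 25)

set.seed(42)
fit_boot <- train(Species ~ ., data = iris, method = "knn",
                  tuneGrid = expand.grid(k = 1:25),
                  trControl = boot)

# Resampled Accuracy and Kappa for each k
fit_boot$results[, c("k", "Accuracy", "Kappa")]
```

To report additional metrics under either scheme, `classProbs = TRUE` and a custom `summaryFunction` would also need to be passed to `trainControl`.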

To **leave a comment** for the author, please follow the link and comment on their blog: **Modern Toolmaking**.

**Tags:** caret, error metrics, Kaggle, predictive modeling, R