`crossvalidation` on R-universe, plus a classification example
I had to rename my R package `crossval` (generic functions for cross-validation) to `crossvalidation`, because its name clashed with that of an existing CRAN package, also named `crossval`.
Here is how to install `crossvalidation`:

    options(repos = c(
      techtonique = 'https://techtonique.r-universe.dev',
      CRAN = 'https://cloud.r-project.org'))

    install.packages("crossvalidation")
What is the R-universe mentioned in the previous code snippet? It is, IMHO, a promising CRAN-like repository for building, storing and sharing R packages (for Linux, macOS and Windows). If you want to create your own repository on R-universe, see the R-universe documentation.
I’ve been looking for such an infrastructure for some time, and tried `miniCRAN` in particular.
Unfortunately, with `miniCRAN` (which works pretty well for CRAN packages), I haven’t been able, so far, to upload or build local packages, local meaning non-CRAN packages. Maybe I missed something about how `miniCRAN` is used, so if you know how to do that, please reach out to me (even though I’ll continue to follow R-universe’s development)!
Examples of use of `crossvalidation` for regression and univariate time series can be found through the following links (in these posts, you must replace occurrences of `crossval` with `crossvalidation`):
- Grid search cross-validation using crossval
- Linear model, xgboost and randomForest cross-validation using crossval::crossval_ml
- Custom errors for cross-validation using crossval::crossval_ml
- Time series cross-validation using crossval
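For scripts copied from the posts above, the renaming can be applied mechanically. The following is only a sketch: `old_script` holds hypothetical lines standing in for your own code, and the regular expression assumes the package name always appears as a standalone word (as in `library(crossval)` or `crossval::`).

```r
# Hypothetical lines from a script written for the old package name
old_script <- c(
  "library(crossval)",
  "res <- crossval::crossval_ml(x = X, y = y, k = 5, repeats = 3)"
)

# \\b marks a word boundary; "_" counts as a word character, so
# function names such as crossval_ml are left untouched
new_script <- gsub("\\bcrossval\\b", "crossvalidation", old_script, perl = TRUE)

print(new_script)
```

Function names keep their `crossval_` prefix; only the package name changes.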
For classification, an example is presented below.
Example of use of `crossvalidation` for classification
    # Import libraries
    library(crossvalidation)
    library(randomForest)

    # Input data
    # Transforming model response into a factor
    y <- as.factor(as.numeric(iris$Species))

    # Explanatory variables
    X <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width",
                            "Petal.Length", "Petal.Width")])

    # 5-fold cross-validation repeated 3 times
    # default error metric, when y is a factor: accuracy
    crossvalidation::crossval_ml(x = X, y = y, k = 5, repeats = 3,
                                 fit_func = randomForest::randomForest,
                                 predict_func = predict,
                                 fit_params = list(mtry = 2),
                                 packages = "randomForest")

    ## $folds
    ##         repeat_1  repeat_2  repeat_3
    ## fold_1 0.9666667 0.9666667 1.0000000
    ## fold_2 0.9666667 0.9000000 0.9333333
    ## fold_3 1.0000000 0.9666667 0.9333333
    ## fold_4 0.9333333 1.0000000 0.9333333
    ## fold_5 0.9333333 0.9333333 0.9666667
    ##
    ## $mean
    ## [1] 0.9555556
    ##
    ## $sd
    ## [1] 0.02999118
    ##
    ## $median
    ## [1] 0.9666667

    # We can specify custom error metrics for crossvalidation::crossval_ml
    # here, the error rate
    eval_metric <- function (preds, actual)
    {
      stopifnot(length(preds) == length(actual))
      res <- 1 - mean(preds == actual)
      names(res) <- "error rate"
      return(res)
    }

    # specify `eval_metric` argument for measuring the error rate
    # instead of the (default) accuracy
    crossvalidation::crossval_ml(x = X, y = y, k = 5, repeats = 3,
                                 fit_func = randomForest::randomForest,
                                 predict_func = predict,
                                 fit_params = list(mtry = 2),
                                 packages = "randomForest",
                                 eval_metric = eval_metric)

    ## $folds
    ##          repeat_1   repeat_2   repeat_3
    ## fold_1 0.03333333 0.03333333 0.00000000
    ## fold_2 0.03333333 0.10000000 0.06666667
    ## fold_3 0.00000000 0.03333333 0.06666667
    ## fold_4 0.06666667 0.00000000 0.06666667
    ## fold_5 0.06666667 0.06666667 0.03333333
    ##
    ## $mean
    ## [1] 0.04444444
    ##
    ## $sd
    ## [1] 0.02999118
    ##
    ## $median
    ## [1] 0.03333333
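To make the shape of that output less of a black box, here is a rough base-R sketch of repeated k-fold cross-validation. It is not the code used inside `crossvalidation::crossval_ml`: the names `repeated_kfold`, `fit_centroid` and `predict_centroid` are hypothetical, the fold assignment is my own assumption, and a simple nearest-centroid classifier stands in for `randomForest` so that the example runs without extra packages.

```r
# Hypothetical helper: NOT the implementation behind crossval_ml
repeated_kfold <- function(x, y, k = 5, repeats = 3, fit_func, predict_func,
                           eval_metric = function(preds, actual) mean(preds == actual),
                           seed = 123) {
  set.seed(seed)
  n <- nrow(x)
  scores <- matrix(NA_real_, nrow = k, ncol = repeats,
                   dimnames = list(paste0("fold_", seq_len(k)),
                                   paste0("repeat_", seq_len(repeats))))
  for (j in seq_len(repeats)) {
    # Shuffle the observations into k folds of near-equal size
    fold_id <- sample(rep(seq_len(k), length.out = n))
    for (i in seq_len(k)) {
      test_idx <- which(fold_id == i)
      # Fit on k - 1 folds, evaluate on the held-out fold
      fit <- fit_func(x[-test_idx, , drop = FALSE], y[-test_idx])
      preds <- predict_func(fit, x[test_idx, , drop = FALSE])
      scores[i, j] <- eval_metric(preds, y[test_idx])
    }
  }
  list(folds = scores, mean = mean(scores),
       sd = sd(scores), median = median(scores))
}

# Stand-in classifier: one centroid (row of feature means) per class
fit_centroid <- function(x, y) {
  centroids <- apply(x, 2, function(col) tapply(col, y, mean))
  list(centroids = centroids, levels = levels(y))
}

# Predict the class whose centroid is closest in squared Euclidean distance
predict_centroid <- function(fit, newx) {
  d <- sapply(seq_len(nrow(fit$centroids)), function(g) {
    rowSums(sweep(newx, 2, fit$centroids[g, ])^2)
  })
  d <- matrix(d, nrow = nrow(newx))
  factor(fit$levels[max.col(-d, ties.method = "first")], levels = fit$levels)
}

y <- as.factor(as.numeric(iris$Species))
X <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width",
                        "Petal.Length", "Petal.Width")])

# Accuracy (default metric of this sketch), then error rate on the same folds
res_acc <- repeated_kfold(X, y, k = 5, repeats = 3,
                          fit_func = fit_centroid, predict_func = predict_centroid)
res_err <- repeated_kfold(X, y, k = 5, repeats = 3,
                          fit_func = fit_centroid, predict_func = predict_centroid,
                          eval_metric = function(preds, actual) 1 - mean(preds == actual))
print(res_acc$folds)
print(res_err$folds)
```

Because both calls use the same seed, they evaluate the same folds, so each error rate is one minus the corresponding accuracy and the standard deviations coincide, just as in the two outputs above.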