ConfusionTableR package has a new function
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The ConfusionTableR package has a new function: var_impeR. It takes a model trained with caret and produces a tibble of variable importance scores along with a supporting variable importance plot.
How to use the new var_impeR function
The following code shows how to use the new function:
Training a CARET model
The following steps use the stranded_data dataset from the NHSRdatasets package to train a machine learning model:
library(magrittr)
library(dplyr)
library(caret)
library(tibble)
library(ggplot2)
library(forcats)
library(NHSRdatasets)

# Load in the stranded dataset from NHSRdatasets
strand <- NHSRdatasets::stranded_data %>%
  na.omit() %>%
  select(-c('frailty_index', 'admit_date')) %>%
  mutate(stranded_class = make.names(as.factor(stranded.label))) %>%
  select(-stranded.label)
dataset <- strand

# Set the random seed before the split so that both the partition
# and the model fit are reproducible
set.seed(123)

# Perform a simple test / train split on the data
train_split_idx <- caret::createDataPartition(dataset$stranded_class, p = 0.75, list = FALSE)
data_TRAIN <- dataset[train_split_idx, ]
data_TEST <- dataset[-train_split_idx, ]
dim(data_TRAIN)
dim(data_TEST)

# Set the model metric to accuracy and train a random forest model
eval_metric <- "Accuracy"
rf_mod <- caret::train(stranded_class ~ .,
                       data = data_TRAIN,
                       method = "rf",
                       metric = eval_metric)
The code:
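- Loads the stranded_data classification dataset from NHSRdatasets, drops rows with missing values and removes the frailty_index and admit_date columns
- Splits the data into a training set (75%) and a test set (25%)
- Sets accuracy as the evaluation metric
- Trains a random forest model on the training data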
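To make the splitting step more concrete, here is a minimal base R sketch of a stratified 75/25 split, which samples within each class so that the class balance is roughly preserved in both sets. It is a simplified illustration and not caret's own implementation of createDataPartition. The helper name stratified_split_idx is hypothetical, and the built-in iris data stands in for the stranded dataset.

```r
# Hypothetical helper (not part of caret): a simplified stratified split.
# Returns the row indices to use for training.
stratified_split_idx <- function(y, p = 0.75) {
  # Group the row indices by class label
  idx_by_class <- split(seq_along(y), y)
  # Sample a proportion p of the rows within each class
  picked <- lapply(idx_by_class, function(idx) {
    n_take <- ceiling(length(idx) * p)
    idx[sample.int(length(idx), n_take)]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(123) # Make the sampled indices reproducible

# iris is used here only as a stand-in for the stranded dataset
train_idx <- stratified_split_idx(iris$Species, p = 0.75)
iris_train <- iris[train_idx, ]
iris_test <- iris[-train_idx, ]

# Each class keeps the same share of rows in both sets
table(iris_train$Species)
table(iris_test$Species)
```

Sampling within each class, rather than across the whole dataset, matters most when one class is rare, as a purely random split could leave very few examples of it in the test set.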
Time for the Variable Importance with the var_impeR function
Once the model is trained, we simply pass it to the var_impeR function, available in the ConfusionTableR package:
# install.packages("remotes") # if not already installed
remotes::install_github("https://github.com/StatsGary/ConfusionTableR")
library(ConfusionTableR)

# Use the function
ConfusionTableR::var_impeR(rf_mod)
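For comparison, the sketch below shows one way to build a similar tibble and plot by hand with caret::varImp. This is an assumption about what such a workflow might look like, not the internals of var_impeR, which are not shown in this post. To keep the example small and self-contained it fits an rpart decision tree on the built-in iris data rather than the random forest used above, and the names tree_mod, imp_tbl and imp_plot are illustrative only.

```r
library(caret)
library(tibble)
library(ggplot2)
library(forcats)

set.seed(123) # Make the resampling inside train() reproducible

# A small stand-in model: an rpart tree on iris (not the post's random forest)
tree_mod <- caret::train(Species ~ ., data = iris, method = "rpart")

# caret::varImp() returns an object whose $importance element is a data frame
# with the predictors as row names
imp_tbl <- caret::varImp(tree_mod)$importance %>%
  tibble::rownames_to_column("variable") %>%
  tibble::as_tibble()

print(imp_tbl)

# Horizontal bar chart with the most important variable at the top
imp_plot <- ggplot2::ggplot(
  imp_tbl,
  ggplot2::aes(x = Overall, y = forcats::fct_reorder(variable, Overall))
) +
  ggplot2::geom_col() +
  ggplot2::labs(x = "Importance", y = "Variable")

print(imp_plot)
```

The same calls should apply to rf_mod from the training step, although the exact columns returned by caret::varImp can differ between model types.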
The resulting output is shown below:
Variable Importance Tibble
This shows how strongly each variable contributes to the model's prediction of whether a person is a stranded patient.
Variable Importance Plot
The variable importance plot is shown below:
Conclusion
To learn more about the ConfusionTableR package, see the vignette, which also covers flattening confusion matrix table outputs ready for importing into databases.