ConfusionTableR package has a new function

Posted on April 6, 2021 by Gary Hutson in R bloggers

[This article was first published on R Blogs – Hutsons-hacks, and kindly contributed to R-bloggers].

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The ConfusionTableR package has a new function. Welcome to var_impeR, which takes a trained caret model and produces a variable importance tibble and a supporting variable importance plot.

How to use the new var_impeR function

The following code shows how to use the new function:

Training a caret model

The following steps use the stranded_data dataset from the NHSRdatasets package to train a machine learning model:

library(magrittr)
library(dplyr)
library(caret)
library(tibble)
library(ggplot2)
library(forcats)
library(NHSRdatasets)

# Load the stranded dataset from NHSRdatasets
strand <- NHSRdatasets::stranded_data %>%
  na.omit() %>%
  select(-c('frailty_index', 'admit_date')) %>%
  mutate(stranded_class = make.names(as.factor(stranded.label))) %>%
  select(-stranded.label)

dataset <- strand


# Perform a simple test / train split on the data
set.seed(123) # Seed the split as well, so the partition is reproducible
train_split_idx <- caret::createDataPartition(dataset$stranded_class, p = 0.75, list = FALSE)
data_TRAIN <- dataset[train_split_idx, ]
data_TEST <- dataset[-train_split_idx, ]
dim(data_TRAIN)
dim(data_TEST)

# Set the evaluation metric to accuracy and train a random forest model
eval_metric <- "Accuracy"
set.seed(123) # Random seed to make the results reproducible
rf_mod <- caret::train(stranded_class ~ .,
                       data = data_TRAIN,
                       method = "rf",
                       metric = eval_metric)

The code above:

Loads the stranded_data classification dataset from NHSRdatasets
Splits the data into training and test sets
Sets accuracy as the evaluation metric
Trains a random forest model on the classification data
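To make the splitting step more concrete, here is a minimal sketch of how caret::createDataPartition behaves on a class outcome. It uses a small made-up data frame (toy, with hypothetical columns x and outcome) rather than the stranded data, and simply compares class proportions before and after the split. The sampling is done within each class, so the proportions in the training and test sets should stay close to those of the full data.

```r
library(caret)

set.seed(123)

# Hypothetical toy data: an imbalanced two-class outcome (50 vs 150 rows)
toy <- data.frame(
  x = rnorm(200),
  outcome = factor(rep(c("Stranded", "Not.Stranded"), times = c(50, 150)))
)

# Same call pattern as in the post: 75% of each class goes to training
idx <- caret::createDataPartition(toy$outcome, p = 0.75, list = FALSE)
train_toy <- toy[idx, ]
test_toy <- toy[-idx, ]

# Compare class proportions in the full, training and test sets
prop.table(table(toy$outcome))
prop.table(table(train_toy$outcome))
prop.table(table(test_toy$outcome))
```

Checking the proportions this way is a quick sanity test worth running on the real data_TRAIN and data_TEST objects as well.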

Time for the Variable Importance with the var_impeR function

Once the model is trained, we simply pass it to the var_impeR function, available in the ConfusionTableR package:

# install.packages("remotes") # if not already installed
remotes::install_github("StatsGary/ConfusionTableR")
library(ConfusionTableR)

# Use the function
ConfusionTableR::var_impeR(rf_mod)

The resulting output is described below:

Variable Importance Tibble

The tibble shows how strongly each variable contributes to the model's prediction of whether a person is a stranded patient.

Variable Importance Plot

The accompanying plot presents the same variable importance values graphically.
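For readers curious how a tibble and plot of this kind can be assembled, the following is a minimal sketch built on caret::varImp. It is not the ConfusionTableR implementation: var_imp_sketch is a hypothetical helper written for illustration, the toy model is trained on the built-in iris data rather than the stranded data, and the fallback to a row mean when no single Overall column exists is an assumption of this sketch.

```r
library(caret)
library(dplyr)
library(tibble)
library(ggplot2)
library(forcats)

# Hypothetical helper for illustration; NOT the ConfusionTableR implementation
var_imp_sketch <- function(model) {
  # caret::varImp() on a train object returns a list with an $importance data frame
  imp <- caret::varImp(model)$importance

  # Assumption: some models report one column per class instead of "Overall";
  # in that case this sketch averages across the columns
  if (!"Overall" %in% names(imp)) {
    imp$Overall <- rowMeans(imp)
  }

  # Move the variable names out of the row names and sort by importance
  imp_tbl <- imp %>%
    tibble::rownames_to_column("variable") %>%
    tibble::as_tibble() %>%
    dplyr::select(variable, Overall) %>%
    dplyr::arrange(dplyr::desc(Overall))

  # Horizontal bar chart with the most important variable at the top
  imp_plot <- ggplot2::ggplot(
    imp_tbl,
    ggplot2::aes(x = forcats::fct_reorder(variable, Overall), y = Overall)
  ) +
    ggplot2::geom_col() +
    ggplot2::coord_flip() +
    ggplot2::labs(x = "Variable", y = "Importance", title = "Variable importance")

  list(importance = imp_tbl, plot = imp_plot)
}

# Toy usage on the built-in iris data (method = "rf" needs the randomForest package)
set.seed(123)
toy_mod <- caret::train(Species ~ .,
                        data = iris,
                        method = "rf",
                        trControl = caret::trainControl(method = "cv", number = 3))
res <- var_imp_sketch(toy_mod)
res$importance
res$plot
```

The same helper could be called on the rf_mod object trained earlier, although var_impeR itself remains the simpler route.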

Conclusion

To learn more about the ConfusionTableR package, see the vignette, which covers flattening confusion matrix outputs so they are ready to import into databases.
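As a rough idea of what flattening a confusion matrix means, the sketch below turns the statistics from caret::confusionMatrix into a single-row tibble, which is a shape that loads easily into a database table. It does not use ConfusionTableR's own functions (the vignette covers those); the truth and pred vectors are made-up toy labels, and the choice of which statistics to keep is an assumption of this example.

```r
library(caret)   # confusionMatrix() also needs the e1071 package installed
library(tibble)

set.seed(123)

# Hypothetical toy labels: 100 true classes, with 15 predictions flipped
truth <- factor(sample(c("Stranded", "Not.Stranded"), 100, replace = TRUE))
pred <- truth
flip <- sample(seq_along(truth), 15)
pred[flip] <- ifelse(truth[flip] == "Stranded", "Not.Stranded", "Stranded")

# Standard caret confusion matrix for a binary outcome
cm <- caret::confusionMatrix(data = pred, reference = truth, positive = "Stranded")

# Flatten: overall and by-class statistics are named numeric vectors,
# so combining them into a list gives one column per statistic
flat_cm <- tibble::as_tibble(as.list(c(cm$overall, cm$byClass)))
flat_cm
```

Each model run then becomes one row, so successive runs can be appended to the same table and compared over time.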
