
Evaluating Binary Classification Models Using Gains Tables

[This article was first published on R'tichoke, and kindly contributed to R-bloggers.]
< section id="introduction" class="level2">

Introduction

In credit risk modeling and other binary classification applications, analysts use gains tables (also known as KS tables) as a fundamental tool for measuring and quantifying model performance. This tutorial walks through the construction and interpretation of gains tables using R.

Theoretical Foundation: Understanding Gains Tables

A gains table systematically discretizes the population (typically a validation or test dataset) into groups based on the model’s output predictions (probability scores, log odds, or risk scores). Each group conventionally represents 10% of the total population (deciles), though alternative binning strategies may be employed. The output presents summary statistics for each group and analyzes the cumulative distributions of events (defaults) and non-events to quantify the model’s discriminatory performance.
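For intuition, here is a minimal sketch of the binning idea (illustrative only; the tutorial below builds its bins with quantile() and cut()):

# Assign 1,000 hypothetical scores to 10 equal-sized bins with dplyr::ntile()
library(dplyr)
scores <- runif(1000)      # hypothetical model scores
bins <- ntile(scores, 10)  # decile membership: 1 = lowest scores, 10 = highest
table(bins)                # roughly 100 observations per bin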

< section id="package-dependencies" class="level2">

Package Dependencies

# Load required packages
library(dplyr)
library(magrittr)
library(knitr)
library(scales)
< section id="dataset-preparation" class="level2">

Dataset Preparation

This tutorial utilizes a sample from the Lending Club dataset, which contains comprehensive loan information and associated outcomes suitable for credit risk modeling applications.

# Load the sample data
sample <- read.csv("https://bit.ly/42ypcnJ")

# Check dimensions
dim(sample)
[1] 10000   153
< section id="target-definition" class="level2">

Target Definition

The first step is to create a binary target variable for modeling. In this credit risk application, we flag borrowers who defaulted on their loan obligations.

# Check unique loan statuses
unique(sample$loan_status)
[1] "Fully Paid"                                         
[2] "Current"                                            
[3] "Charged Off"                                        
[4] "Late (31-120 days)"                                 
[5] "Late (16-30 days)"                                  
[6] "In Grace Period"                                    
[7] "Does not meet the credit policy. Status:Fully Paid" 
[8] "Does not meet the credit policy. Status:Charged Off"
# Define "bad" loans as those that are charged off
codes <- c("Charged Off", "Does not meet the credit policy. Status:Charged Off")

# Create a binary flag for defaults
sample %<>% mutate(bad_flag = ifelse(loan_status %in% codes, 1, 0))

# Check overall event rates
sample %>% 
  summarise(events = sum(bad_flag == 1), 
            non_events = sum(bad_flag == 0)) %>% 
  mutate(event_rate = events / (events + non_events))
  events non_events event_rate
1   1162       8838     0.1162
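Since the flag is coded 0/1, the same event rate can be read off directly as the mean of the flag:

# Equivalent one-liner: the event rate is the mean of a 0/1 flag
mean(sample$bad_flag)
[1] 0.1162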
< section id="model-development" class="level2">

Model Development

Subsequently, we develop a logistic regression model to generate predictions that will serve as the foundation for gains table construction.

# Replace NA values with a default value
sample[is.na(sample)] <- -1

# Clean the data
sample %<>% 
  # Remove cases where home ownership and payment plan are not reported
  filter(!home_ownership %in% c("", "NONE"),
         pymnt_plan != "") %>% 
  # Convert categorical variables to factors
  mutate(home_ownership = factor(home_ownership), 
         pymnt_plan = factor(pymnt_plan))

# Train-test split (70-30); the split is random, so the exact numbers
# shown below will vary from run to run unless a seed is set beforehand
idx <- sample(1:nrow(sample), size = 0.7 * nrow(sample), replace = FALSE)
train <- sample[idx,]
test <- sample[-idx,]
# Build a logistic regression model
mdl <- glm(
  formula = bad_flag ~ 
    loan_amnt + term + mths_since_last_delinq + total_pymnt + 
    home_ownership + acc_now_delinq + 
    inq_last_6mths + delinq_amnt + 
    mths_since_last_record + mths_since_recent_revol_delinq + 
    mths_since_last_major_derog + mths_since_recent_inq + 
    mths_since_recent_bc + num_accts_ever_120_pd,
  family = "binomial", 
  data = train
)

# Generate predictions on the test set
test$pred <- predict(mdl, newdata = test)
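Note that predict() without a type argument returns scores on the link scale (log odds). Since quantile-based binning depends only on rank order, predicted probabilities would produce identical bins:

# Equivalent input for binning: predicted probabilities share the same
# rank order as the log odds generated above
test$prob <- predict(mdl, newdata = test, type = "response")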
< section id="gains-table-construction" class="level2">

Gains Table Construction

The following section demonstrates the step-by-step construction of a comprehensive gains table through systematic binning and statistical analysis.

< section id="population-discretization-into-bins" class="level3">

Population Discretization into Bins

# Create deciles based on model predictions
q <- quantile(test$pred, probs = seq(0, 1, length.out = 11))

# Add bins to test dataset
test$bins <- cut(test$pred, breaks = q, include.lowest = TRUE, 
                right = TRUE, ordered_result = TRUE)

# Check the bin levels (note they're in increasing order)
levels(test$bins)
 [1] "[-5.11,-3.34]" "(-3.34,-2.89]" "(-2.89,-2.64]" "(-2.64,-2.41]"
 [5] "(-2.41,-2.23]" "(-2.23,-2.03]" "(-2.03,-1.83]" "(-1.83,-1.6]" 
 [9] "(-1.6,-1.26]"  "(-1.26,2.51]" 
< section id="basic-statistical-measures-by-segment" class="level3">

Basic Statistical Measures by Segment

# Create initial gains table with counts
gains_table <- test %>% 
  group_by(bins) %>% 
  summarise(total = n(), 
            events = sum(bad_flag == 1), 
            non_events = sum(bad_flag == 0))

# Add event rate column
gains_table %<>%
  mutate(event_rate = percent(events / total, 0.1, 100))

# Display the table
kable(gains_table)
| bins          | total | events | non_events | event_rate |
|---------------|-------|--------|------------|------------|
| [-5.11,-3.34] | 300   | 3      | 297        | 1.0%       |
| (-3.34,-2.89] | 300   | 8      | 292        | 2.7%       |
| (-2.89,-2.64] | 300   | 10     | 290        | 3.3%       |
| (-2.64,-2.41] | 300   | 14     | 286        | 4.7%       |
| (-2.41,-2.23] | 300   | 26     | 274        | 8.7%       |
| (-2.23,-2.03] | 300   | 38     | 262        | 12.7%      |
| (-2.03,-1.83] | 300   | 46     | 254        | 15.3%      |
| (-1.83,-1.6]  | 300   | 49     | 251        | 16.3%      |
| (-1.6,-1.26]  | 300   | 58     | 242        | 19.3%      |
| (-1.26,2.51]  | 300   | 74     | 226        | 24.7%      |
< section id="cumulative-distribution" class="level3">

Cumulative Distribution

# Add population percentage and cumulative distributions
gains_table %<>%
  mutate(pop_pct = percent(total/sum(total), 0.1, 100), 
         
         # Calculate cumulative percentages
         c.events_pct = cumsum(events) / sum(events),
         c.non_events_pct = cumsum(non_events) / sum(non_events))

# Display the updated table
kable(gains_table)
| bins          | total | events | non_events | event_rate | pop_pct | c.events_pct | c.non_events_pct |
|---------------|-------|--------|------------|------------|---------|--------------|------------------|
| [-5.11,-3.34] | 300   | 3      | 297        | 1.0%       | 10.0%   | 0.0092025    | 0.1110696        |
| (-3.34,-2.89] | 300   | 8      | 292        | 2.7%       | 10.0%   | 0.0337423    | 0.2202693        |
| (-2.89,-2.64] | 300   | 10     | 290        | 3.3%       | 10.0%   | 0.0644172    | 0.3287210        |
| (-2.64,-2.41] | 300   | 14     | 286        | 4.7%       | 10.0%   | 0.1073620    | 0.4356769        |
| (-2.41,-2.23] | 300   | 26     | 274        | 8.7%       | 10.0%   | 0.1871166    | 0.5381451        |
| (-2.23,-2.03] | 300   | 38     | 262        | 12.7%      | 10.0%   | 0.3036810    | 0.6361257        |
| (-2.03,-1.83] | 300   | 46     | 254        | 15.3%      | 10.0%   | 0.4447853    | 0.7311144        |
| (-1.83,-1.6]  | 300   | 49     | 251        | 16.3%      | 10.0%   | 0.5950920    | 0.8249813        |
| (-1.6,-1.26]  | 300   | 58     | 242        | 19.3%      | 10.0%   | 0.7730061    | 0.9154824        |
| (-1.26,2.51]  | 300   | 74     | 226        | 24.7%      | 10.0%   | 1.0000000    | 1.0000000        |
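As an aside (a sketch that goes beyond the original construction), while the two cumulative columns are still numeric they also yield a trapezoidal approximation of the Gini coefficient:

# Approximate Gini from the cumulative distributions (trapezoidal rule);
# must run before these columns are formatted as percentage strings below
x <- c(0, gains_table$c.non_events_pct)
y <- c(0, gains_table$c.events_pct)
auc <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
gini <- abs(2 * auc - 1)  # abs() makes the result independent of sort order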
< section id="performance-metrics" class="level3">

Performance Metrics

# Add KS statistic, capture rate, and cumulative event rate
gains_table %<>%
  mutate(
    # KS statistic (difference between cumulative distributions)
    ks = round(abs(c.events_pct - c.non_events_pct), 2), 
    
    # Capture rate (percentage of total events captured)
    cap_rate = percent(cumsum(events)/sum(events), 1, 100), 
    
    # Cumulative event rate
    c_event_rate = percent(cumsum(events)/cumsum(total), 0.1, 100), 
    
    # Format percentage columns
    c.events_pct = percent(c.events_pct, 0.1, 100),
    c.non_events_pct = percent(c.non_events_pct, 0.1, 100))

# Display the final table
kable(gains_table)
| bins          | total | events | non_events | event_rate | pop_pct | c.events_pct | c.non_events_pct | ks   | cap_rate | c_event_rate |
|---------------|-------|--------|------------|------------|---------|--------------|------------------|------|----------|--------------|
| [-5.11,-3.34] | 300   | 3      | 297        | 1.0%       | 10.0%   | 0.9%         | 11.1%            | 0.10 | 1%       | 1.0%         |
| (-3.34,-2.89] | 300   | 8      | 292        | 2.7%       | 10.0%   | 3.4%         | 22.0%            | 0.19 | 3%       | 1.8%         |
| (-2.89,-2.64] | 300   | 10     | 290        | 3.3%       | 10.0%   | 6.4%         | 32.9%            | 0.26 | 6%       | 2.3%         |
| (-2.64,-2.41] | 300   | 14     | 286        | 4.7%       | 10.0%   | 10.7%        | 43.6%            | 0.33 | 11%      | 2.9%         |
| (-2.41,-2.23] | 300   | 26     | 274        | 8.7%       | 10.0%   | 18.7%        | 53.8%            | 0.35 | 19%      | 4.1%         |
| (-2.23,-2.03] | 300   | 38     | 262        | 12.7%      | 10.0%   | 30.4%        | 63.6%            | 0.33 | 30%      | 5.5%         |
| (-2.03,-1.83] | 300   | 46     | 254        | 15.3%      | 10.0%   | 44.5%        | 73.1%            | 0.29 | 44%      | 6.9%         |
| (-1.83,-1.6]  | 300   | 49     | 251        | 16.3%      | 10.0%   | 59.5%        | 82.5%            | 0.23 | 60%      | 8.1%         |
| (-1.6,-1.26]  | 300   | 58     | 242        | 19.3%      | 10.0%   | 77.3%        | 91.5%            | 0.14 | 77%      | 9.3%         |
| (-1.26,2.51]  | 300   | 74     | 226        | 24.7%      | 10.0%   | 100.0%       | 100.0%           | 0.00 | 100%     | 10.9%        |
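The model's headline KS statistic is simply the maximum of the ks column, which the table above shows peaking in the fifth bin:

# The KS statistic reported for the model is the maximum across bins
max(gains_table$ks)
[1] 0.35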
< section id="reusable-function" class="level2">

Reusable Function

The following implementation wraps the gains table construction in a single reusable function suitable for evaluating any binary classification model:

gains_table <- function(act, pred, increasing = TRUE, nBins = 10) {
  
  # Create bins based on predictions
  q <- quantile(pred, probs = seq(0, 1, length.out = nBins + 1))
  bins <- cut(pred, breaks = q, include.lowest = TRUE, right = TRUE, ordered_result = TRUE)
  
  df <- data.frame(act, pred, bins)
  
  df %>% 
    # Group by bins and calculate statistics
    group_by(bins) %>% 
    summarise(total = n(), 
              events = sum(act == 1), 
              non_events = sum(act == 0)) %>% 
    mutate(event_rate = percent(events / total, 0.1, 100)) %>% 
    
    # Sort the table based on the 'increasing' parameter
    {if (increasing) {
      arrange(., bins)
    } else {
      arrange(., desc(bins))
    }} %>% 
    
    # Add all performance metrics
    mutate(pop_pct = percent(total/sum(total), 0.1, 100), 
           c.events_pct = cumsum(events) / sum(events),
           c.non_events_pct = cumsum(non_events) / sum(non_events), 
           ks = round(abs(c.events_pct - c.non_events_pct), 2), 
           cap_rate = percent(cumsum(events)/sum(events), 1, 100), 
           c_event_rate = percent(cumsum(events)/cumsum(total), 0.1, 100), 
           c.events_pct = percent(c.events_pct, 0.1, 100),
           c.non_events_pct = percent(c.non_events_pct, 0.1, 100))
}
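One practical caveat: when many observations share the same score, quantile() can return duplicated break points and cut() will stop with an error. A minimal guard (at the cost of slightly uneven bins) is to de-duplicate the breaks, so inside the function the first two lines could become:

# Hedged variant of the binning step for heavily tied scores: dropping
# duplicated breaks avoids the cut() error, though bins may then hold
# unequal shares of the population
q <- unique(quantile(pred, probs = seq(0, 1, length.out = nBins + 1)))
bins <- cut(pred, breaks = q, include.lowest = TRUE, right = TRUE, ordered_result = TRUE)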
< section id="function-implementation" class="level3">

Function Implementation

# Generate a gains table with bins in descending order
tab <- gains_table(test$bad_flag, test$pred, FALSE, 10)
kable(tab)
| bins          | total | events | non_events | event_rate | pop_pct | c.events_pct | c.non_events_pct | ks   | cap_rate | c_event_rate |
|---------------|-------|--------|------------|------------|---------|--------------|------------------|------|----------|--------------|
| (-1.26,2.51]  | 300   | 74     | 226        | 24.7%      | 10.0%   | 22.7%        | 8.5%             | 0.14 | 23%      | 24.7%        |
| (-1.6,-1.26]  | 300   | 58     | 242        | 19.3%      | 10.0%   | 40.5%        | 17.5%            | 0.23 | 40%      | 22.0%        |
| (-1.83,-1.6]  | 300   | 49     | 251        | 16.3%      | 10.0%   | 55.5%        | 26.9%            | 0.29 | 56%      | 20.1%        |
| (-2.03,-1.83] | 300   | 46     | 254        | 15.3%      | 10.0%   | 69.6%        | 36.4%            | 0.33 | 70%      | 18.9%        |
| (-2.23,-2.03] | 300   | 38     | 262        | 12.7%      | 10.0%   | 81.3%        | 46.2%            | 0.35 | 81%      | 17.7%        |
| (-2.41,-2.23] | 300   | 26     | 274        | 8.7%       | 10.0%   | 89.3%        | 56.4%            | 0.33 | 89%      | 16.2%        |
| (-2.64,-2.41] | 300   | 14     | 286        | 4.7%       | 10.0%   | 93.6%        | 67.1%            | 0.26 | 94%      | 14.5%        |
| (-2.89,-2.64] | 300   | 10     | 290        | 3.3%       | 10.0%   | 96.6%        | 78.0%            | 0.19 | 97%      | 13.1%        |
| (-3.34,-2.89] | 300   | 8      | 292        | 2.7%       | 10.0%   | 99.1%        | 88.9%            | 0.10 | 99%      | 12.0%        |
| [-5.11,-3.34] | 300   | 3      | 297        | 1.0%       | 10.0%   | 100.0%       | 100.0%           | 0.00 | 100%     | 10.9%        |
< section id="interpretation" class="level2">

Interpretation

A properly constructed gains table provides multiple critical insights into model performance characteristics:

  1. Monotonicity Assessment: Event rates should increase (or decrease) consistently across bins, confirming that the model rank-orders risk effectively (see the sketch after this list).

  2. Population Distribution: Consistent bin sizes (ideally ~10% each) indicate appropriate score distribution. Inconsistent sizes suggest score clustering, which may complicate threshold determination.

  3. Kolmogorov-Smirnov (KS) Statistic: The maximum KS value represents the model’s discriminatory power. Higher values (approaching 1.0) indicate superior separation between positive and negative cases.

  4. Capture Rate: Demonstrates the percentage of total events captured at each threshold, essential for operational decision-making.

  5. Cumulative Event Rate: Indicates the event rate among all cases up to each bin, facilitating approval threshold establishment.
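The first point can be verified programmatically. A minimal sketch, using the function defined above with increasing = TRUE:

# Monotonicity check: do raw event rates rise steadily across bins?
tab_inc <- gains_table(test$bad_flag, test$pred, increasing = TRUE, nBins = 10)
rates <- tab_inc$events / tab_inc$total
all(diff(rates) >= 0)  # TRUE if the model rank-orders risk cleanly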

< section id="applications-in-credit-risk-analytics" class="level2">

Applications in Credit Risk Analytics

Gains tables serve multiple critical functions in credit risk management environments:

  1. Threshold Optimization: Identification of appropriate score thresholds for automated approval or rejection decisions.

  2. Tiered Strategy Development: Construction of multi-tier decision strategies (approve, manual review, decline) based on quantified risk levels.

  3. Model Performance Monitoring: Longitudinal tracking of model performance through comparison of actual versus expected distributions.

  4. Comparative Model Evaluation: Systematic comparison of alternative models through KS statistics and capture rate analysis.
