TLDR: Pass the output of the isoreg function to as.stepfun to turn an isotonic regression model into a black-box object that takes in uncalibrated predictions and outputs calibrated ones.
Isotonic regression is a method for obtaining a monotonic fit for 1-dimensional data. Let's say we have data $(x_1, y_1), \dots, (x_n, y_n) \in \mathbb{R}^2$ such that $x_1 < x_2 < \dots < x_n$. (We assume no ties among the $x_i$'s for simplicity.) Informally, isotonic regression looks for $\beta_1, \dots, \beta_n \in \mathbb{R}$ such that the $\beta_i$'s approximate the $y_i$'s well while being monotonically non-decreasing. (See this previous post for more technical details.)
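One common way to make "approximate well" precise is least squares. The formulation below is the standard textbook one and is an assumption here, since this post only describes the problem informally; $y_i$ denotes the response at the $i$-th smallest $x$ value and $\beta_i$ the corresponding fitted value.

```latex
\[
\min_{\beta_1, \dots, \beta_n \in \mathbb{R}} \; \sum_{i=1}^{n} (y_i - \beta_i)^2
\quad \text{subject to} \quad \beta_1 \le \beta_2 \le \dots \le \beta_n .
\]
```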
Isotonic regression can be performed easily in R with the isoreg function. Note the slightly unusual syntax when pulling out the fitted values: fit$yf is stored in order of increasing x, so it has to be paired with fit$x[fit$ord] rather than with fit$x (see the function's documentation with ?isoreg for details). The plot shows the original data values as black crosses and the fitted values as blue dots. As expected, the blue dots are monotonically non-decreasing.
# training data
set.seed(1)
x <- sample(2 * 1:15)
y <- 0.2 * x + rnorm(length(x))

# isotonic reg fit and plot
fit <- isoreg(x, y)
plot(x, y, pch = 4)
points(fit$x[fit$ord], fit$yf, pch = 16, col = "blue")
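The ordering of the fitted values is easy to get wrong, so here is a small sketch that checks it on the same training data. The variable names sorted_x, is_monotone and yf_orig are made up for illustration; the sketch relies on x being unsorted, which is what makes isoreg store the ord component.

```r
# Same training data as in the post
set.seed(1)
x <- sample(2 * 1:15)
y <- 0.2 * x + rnorm(length(x))
fit <- isoreg(x, y)

# fit$x keeps the original (shuffled) order; fit$ord is the ordering of x
sorted_x <- fit$x[fit$ord]

# fit$yf lines up with sorted_x, so it should be non-decreasing
is_monotone <- !is.unsorted(fit$yf)

# Put the fitted values back into the original order of x
yf_orig <- numeric(length(x))
yf_orig[fit$ord] <- fit$yf

# Side-by-side view: each x with its response and its fitted value
head(data.frame(x = x, y = y, fitted = yf_orig))
```

The same reordering should also be available through fitted(fit), which returns the fitted values in the original order of x.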
Isotonic regression is one commonly used method for calibration: see this previous post for background on calibration and this link for more details with Python code. In this setting, we want the isotonic regression model to be a black box: we hand it uncalibrated predictions as input, and it returns calibrated predictions.
If you inspect the return value of the isoreg function, you will find that it cannot be applied directly to new test data. Imagine that we have some new test data that we want to calibrate:
set.seed(2)
test_x <- sample(2 * 1:15 - 1)
test_y <- 0.2 * test_x + rnorm(length(test_x))
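To see what the isoreg object actually contains, one can list its components and the S3 methods registered for its class. This is only an inspection sketch: the exact list of methods depends on the R version and on which packages are loaded, but in a plain session I would not expect a predict method among them.

```r
# Same training data as in the post
set.seed(1)
x <- sample(2 * 1:15)
y <- 0.2 * x + rnorm(length(x))
fit <- isoreg(x, y)

class(fit)                 # "isoreg"
names(fit)                 # stored components, e.g. x, y, yf, iKnots
methods(class = "isoreg")  # S3 methods available for this class
```

The object stores the training points and their fitted values, but nothing in it is a function of a new input.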
A naive and WRONG way to calibrate test_y would be to run isotonic regression on just the test data. The plot shows the fits for the training data as blue dots and the fits for the test data as red squares: the overall fit is not monotonic.
# WRONG isotonic reg fit and plot
fit2 <- isoreg(test_x, test_y)
plot(x, y, pch = 4)
points(test_x, test_y, pch = 4)
points(fit$x[fit$ord], fit$yf, pch = 16, col = "blue")
points(fit2$x[fit2$ord], fit2$yf, pch = 15, col = "red")
A second WRONG way to calibrate test_y would be to run isotonic regression on the combined training/test data. The plot shows that while the overall fit is monotonic, the predictions for the training data have shifted, i.e. you have changed the black box.
# WRONG isotonic reg fit and plot (v2)
all_x <- c(x, test_x)
all_y <- c(y, test_y)
fit3 <- isoreg(all_x, all_y)
plot(all_x, all_y, pch = 4)
points(fit$x[fit$ord], fit$yf, pch = 16, col = "blue")
points(fit3$x[fit3$ord], fit3$yf, pch = 15, col = "red")
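Both failure modes can also be checked numerically rather than read off the plots. The sketch below is one way to do that, with made-up names (stacked, not_monotone, max_shift). It assumes that fitted() returns the fitted values in the original order of the inputs, and it deliberately states no expected output, since the numbers depend on the random draws.

```r
# Same data as in the post
set.seed(1)
x <- sample(2 * 1:15)
y <- 0.2 * x + rnorm(length(x))
set.seed(2)
test_x <- sample(2 * 1:15 - 1)
test_y <- 0.2 * test_x + rnorm(length(test_x))

fit  <- isoreg(x, y)                        # training data only
fit2 <- isoreg(test_x, test_y)              # WRONG: test data only
fit3 <- isoreg(c(x, test_x), c(y, test_y))  # WRONG: combined data

# Approach 1: stack both sets of fitted values and sort them by x.
# TRUE here means the joint fit is not monotone.
all_x <- c(x, test_x)
stacked <- c(fitted(fit), fitted(fit2))[order(all_x)]
not_monotone <- is.unsorted(stacked)

# Approach 2: how far did the training predictions move after refitting?
train_idx <- seq_along(x)
max_shift <- max(abs(fitted(fit3)[train_idx] - fitted(fit)))

not_monotone
max_shift
```

A non-zero max_shift means the refit changed the calibrated values of points the original model had already seen.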
The CORRECT way to make the isotonic regression model into a black box is to pass the output of isoreg to the as.stepfun function, like so:
isofit <- as.stepfun(isoreg(x, y))
isofit is the black box we seek: a function that we give uncalibrated predictions to get calibrated predictions in return. As the plot below shows, the overall fit is still monotonic, and the calibrated predictions for the training data do not change.
plot(all_x, all_y, pch = 4)
points(all_x, isofit(all_x), pch = 15, col = "red")
points(fit$x[fit$ord], fit$yf, pch = 16, col = "blue")
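The two properties claimed for isofit can be verified without a plot. The sketch below is a minimal check under the same training data; the names same_on_train, grid and monotone_on_grid are hypothetical. The last part reflects my reading of how a step function behaves beyond its outermost knots, namely that it stays flat at the smallest and largest fitted values.

```r
# Same training data as in the post
set.seed(1)
x <- sample(2 * 1:15)
y <- 0.2 * x + rnorm(length(x))
fit <- isoreg(x, y)
isofit <- as.stepfun(fit)

# 1. Training inputs map to the same fitted values as before
same_on_train <- isTRUE(all.equal(isofit(x), fitted(fit)))

# 2. Predictions on a fine grid, including values outside the
#    training range, are non-decreasing
grid <- seq(-10, 50, by = 0.5)
monotone_on_grid <- !is.unsorted(isofit(grid))

# 3. Outside the training range the step function is flat
isofit(c(-10, 50))
range(fit$yf)

same_on_train
monotone_on_grid
```

The flat extrapolation is worth keeping in mind when test predictions fall far outside the range seen in training: they are all mapped to the same calibrated value.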