
Getting predictions from an isotonic regression model

[This article was first published on R – Statistical Odds & Ends, and kindly contributed to R-bloggers].

TLDR: Pass the output of the isoreg function to as.stepfun to make an isotonic regression model into a black box object that takes in uncalibrated predictions and outputs calibrated ones.

Isotonic regression is a method for obtaining a monotonic fit for 1-dimensional data. Let’s say we have data $(x_1, y_1), \dots, (x_n, y_n)$ such that $x_1 < x_2 < \dots < x_n$. (We assume no ties among the $x_i$‘s for simplicity.) Informally, isotonic regression looks for $\hat{y}_1, \dots, \hat{y}_n$ such that the $\hat{y}_i$‘s approximate the $y_i$‘s well while being monotonically non-decreasing. (See this previous post for more technical details.)
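Written out, the informal description above corresponds to a least-squares problem with ordering constraints. The notation here ($y_i$ for the responses sorted by $x$, $\hat{y}_i$ for the fitted values) is assumed for this sketch:

```latex
\min_{\hat{y}_1, \dots, \hat{y}_n} \; \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
\quad \text{subject to} \quad
\hat{y}_1 \le \hat{y}_2 \le \dots \le \hat{y}_n .
```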

Isotonic regression can be performed easily in R with the stats package’s isoreg function. Pulling out the fitted values takes slightly unusual syntax: fit$yf is stored in increasing order of x, so the x values have to be reordered with fit$ord to match (see the function’s documentation with ?isoreg for details). The plot shows the original data values as black crosses and the fitted values as blue dots. As expected, the blue dots are monotonically non-decreasing.

# training data
set.seed(1)
x <- sample(2 * 1:15)
y <- 0.2 * x + rnorm(length(x))

# isotonic reg fit and plot
fit <- isoreg(x, y)
plot(x, y, pch = 4)
points(fit$x[fit$ord], fit$yf, pch = 16, col = "blue")
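The indexing in the last line works because isoreg stores yf in increasing order of x, while fit$x keeps the original order. Here is a small sketch that checks monotonicity and maps the fitted values back to the original order. The name yf_orig is made up for illustration, and the snippet assumes x was not already sorted, since fit$ord is only stored in that case.

```r
# Recreate the fit from above so that this snippet runs on its own
set.seed(1)
x <- sample(2 * 1:15)
y <- 0.2 * x + rnorm(length(x))
fit <- isoreg(x, y)

# yf is ordered by increasing x, so it should never decrease
is_monotone <- all(diff(fit$yf) >= 0)

# Put the fitted values back in the order of the original x
# (yf_orig is a hypothetical name used only in this sketch)
yf_orig <- numeric(length(x))
yf_orig[fit$ord] <- fit$yf

# fitted() should give the same vector without manual indexing
same_as_fitted <- isTRUE(all.equal(yf_orig, fitted(fit)))
```

If the original order is all you need, fitted(fit) is the shorter route; the manual version is shown only to make the role of fit$ord explicit.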

Isotonic regression is one commonly used method for calibration: see this previous post for background on calibration and this link for more details with Python code. In this setting, we want the isotonic regression model to be a black box: we hand it uncalibrated predictions as input, and it returns calibrated predictions.

If you inspect the return value of the isoreg function, you will find that it only stores the training data and the fitted values, with no way to evaluate the fit at new test data. Imagine that we have some new test data that we want to calibrate:

set.seed(2)
test_x <- sample(2 * 1:15 - 1)
test_y <- 0.2 * test_x + rnorm(length(test_x))
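For reference, the components stored in an isoreg fit can be listed directly. The sketch below rebuilds the training fit so that it runs on its own; the names fit_names and has_function_component are illustrative. None of the stored components is a function that accepts new x values.

```r
# Recreate the training fit
set.seed(1)
x <- sample(2 * 1:15)
y <- 0.2 * x + rnorm(length(x))
fit <- isoreg(x, y)

# Components of the fitted object: training data, fitted values, bookkeeping
fit_names <- names(fit)
print(fit_names)

# S3 methods registered for class "isoreg" in the current session
print(methods(class = "isoreg"))

# TRUE only if some component of the fit is itself a function
has_function_component <- any(vapply(fit, is.function, logical(1)))
```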

A naive and WRONG way to calibrate test_y would be to run isotonic regression on just the test data. The plot shows the fits for the training data as blue dots and the fits for the testing data as red squares: the overall fit is not monotonic.

# WRONG isotonic reg fit and plot
fit2 <- isoreg(test_x, test_y)

plot(x, y, pch = 4)
points(test_x, test_y, pch = 4)
points(fit$x[fit$ord], fit$yf, pch = 16, col = "blue")
points(fit2$x[fit2$ord], fit2$yf, pch = 15, col = "red")
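The same comparison can be made without a plot. This sketch pools the two sets of fitted values, sorts them by x, and tests whether the pooled sequence ever decreases. The names pooled_x, pooled_fit, and has_decrease are made up for illustration.

```r
# Recreate the training and test data
set.seed(1)
x <- sample(2 * 1:15)
y <- 0.2 * x + rnorm(length(x))
set.seed(2)
test_x <- sample(2 * 1:15 - 1)
test_y <- 0.2 * test_x + rnorm(length(test_x))

# Two separate isotonic fits, one per data set
fit  <- isoreg(x, y)
fit2 <- isoreg(test_x, test_y)

# Pool inputs and fitted values (fitted() returns values in input order)
pooled_x   <- c(x, test_x)
pooled_fit <- c(fitted(fit), fitted(fit2))

# TRUE means the pooled fitted values go down somewhere as x increases
has_decrease <- any(diff(pooled_fit[order(pooled_x)]) < 0)
print(has_decrease)
```

Each fit is monotone on its own data; the check is about what happens when the two are interleaved.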

A second WRONG way to calibrate test_y would be to run isotonic regression on the combined training/test data. The plot shows that while the overall fit is monotonic, the predictions for the training data have shifted, i.e. you have changed the black box.

# WRONG isotonic reg fit and plot (v2)
all_x <- c(x, test_x)
all_y <- c(y, test_y)
fit3 <- isoreg(all_x, all_y)

plot(all_x, all_y, pch = 4)
points(fit$x[fit$ord], fit$yf, pch = 16, col = "blue")
points(fit3$x[fit3$ord], fit3$yf, pch = 15, col = "red")
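To put a number on the shift, the training-point fits from the two models can be compared directly. In this sketch, max_shift and train_idx are illustrative names. It relies on fitted() returning values in the order the data were supplied, so the first length(x) entries of fitted(fit3) belong to the training points.

```r
# Recreate the training and test data
set.seed(1)
x <- sample(2 * 1:15)
y <- 0.2 * x + rnorm(length(x))
set.seed(2)
test_x <- sample(2 * 1:15 - 1)
test_y <- 0.2 * test_x + rnorm(length(test_x))

# Fit on the training data alone and on the combined data
fit  <- isoreg(x, y)
fit3 <- isoreg(c(x, test_x), c(y, test_y))

# Largest change in any training-point prediction between the two fits
train_idx <- seq_along(x)
max_shift <- max(abs(fitted(fit3)[train_idx] - fitted(fit)))
print(max_shift)
```

A value of zero would mean the training predictions were untouched; anything larger means the combined fit has moved them.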

The CORRECT way to make the isotonic regression model into a black box is to pass the output of isoreg to the as.stepfun function, like so:

isofit <- as.stepfun(isoreg(x, y))

isofit is the black box we seek: a function that we give uncalibrated predictions to get calibrated predictions in return. As the plot below shows, the overall fit is still monotonic, and the calibrated predictions for the training data do not change.

plot(all_x, all_y, pch = 4)
points(all_x, isofit(all_x), pch = 15, col = "red")
points(fit$x[fit$ord], fit$yf, pch = 16, col = "blue")
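Both claims can also be checked numerically. The sketch below rebuilds the data, then verifies that isofit reproduces the training fits and is non-decreasing over the pooled x values. The names train_unchanged, test_calibrated, and overall_monotone are illustrative.

```r
# Recreate the training and test inputs
set.seed(1)
x <- sample(2 * 1:15)
y <- 0.2 * x + rnorm(length(x))
set.seed(2)
test_x <- sample(2 * 1:15 - 1)

# Fit once on the training data and turn the fit into a step function
fit    <- isoreg(x, y)
isofit <- as.stepfun(fit)

# The step function returns the original fitted values at the training x
train_unchanged <- isTRUE(all.equal(isofit(x), fitted(fit)))

# Calibrated values for the new test inputs
test_calibrated <- isofit(test_x)

# Predictions over all inputs never decrease as x increases
all_x <- c(x, test_x)
overall_monotone <- all(diff(isofit(sort(all_x))) >= 0)
```

Inputs below min(x) or above max(x) receive the smallest or largest fitted value, since a step function is constant beyond its outermost knots.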
