[This article was first published on R – Statistical Odds & Ends, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Let’s say we have data $(x_1, y_1), \dots, (x_n, y_n) \in \mathbb{R}^2$ such that $x_1 < x_2 < \dots < x_n$. (We assume no ties among the $x_i$‘s for simplicity.) Isotonic regression gives us a monotonic fit $\beta_1 \leq \dots \leq \beta_n$ for the $y_i$‘s by solving the problem

\begin{aligned} \text{minimize}_{\beta_1, \dots, \beta_n} \quad& \sum_{i=1}^n (y_i - \beta_i)^2 \\ \text{subject to} \quad& \beta_1 \leq \dots \leq \beta_n. \end{aligned}

(See this previous post for more details.) Nearly-isotonic regression, introduced by Tibshirani et al. (2009) (Reference 1), generalizes isotonic regression by solving the problem

\begin{aligned} \text{minimize}_{\beta_1, \dots, \beta_n} \quad \frac{1}{2}\sum_{i=1}^n (y_i - \beta_i)^2 + \lambda \sum_{i=1}^{n-1} (\beta_i - \beta_{i+1})_+, \end{aligned}

where $x_+ = \max (0, x)$ and $\lambda \geq 0$ is a user-specified hyperparameter.

It turns out that, due to properties of the optimization problem, the nearly-isotonic regression fit can be computed for all $\lambda$ values in $O(n \log n)$ time, making it a practical method for use. See Section 3 and Algorithm 1 of Reference 1 for details. (More accurately, we can determine the nearly-isotonic regression fit for a critical set of $\lambda$ values: the fit for any other other $\lambda$ value will be a linear interpolation of fits from this critical set.)

How is nearly-isotonic regression a generalization of isotonic regression? The term $(\beta_i - \beta_{i+1})_+$ is positive if and only if

Why might you want to use nearly-isotonic regression? One possible reason is to check if the assumption monotonicity is reasonable for your data. To do so, run nearly-isotonic regression with cross-validation on $\lambda$ and compute the CV error for each $\lambda$ value. If the CV error achieved by the isotonic regression fit (i.e. largest $\lambda$ value) is close to or statistically indistinguishable from the minimum, that gives assurance that monotonicity is a reasonable assumption for your data.

You can perform nearly-isotonic regression in R with the neariso package. The neariso() function returns fits for an entire path of $\lambda$ values. The animation below shows how the fit changes as $\lambda$ gets larger and larger (code available here).

Note 1: The formulation for nearly-isotonic regression above assumes that the points $x_1, \dots, x_n$ are equally spaced. If they are not, one should replace the penalty with

\begin{aligned} \lambda \sum_{i=1}^{n-1} \frac{(\beta_i - \beta_{i+1})_+}{x_{i+1} - x_i} \end{aligned}

to account for the different-sized gaps. The neariso package only seems to handle the case where the $x_i$‘s are equally spaced.

Note 2: The animation above was created by generating separate .png files for each value of $\lambda$, then stitching them together using the magick package. My initial hope was to create an animation that would transition smoothly between the different fits using the gganimate package but the transitions weren’t as smooth as I would have imagined them to be:

Does anyone know how this issue could be fixed? Code for the animation is below, full code available here.

p <- ggplot(df, aes(x = x, y = beta)) +
geom_path(col = "blue") +
geom_point(data = truth_df, aes(x = x, y = y), shape = 4) +
labs(title = "Nearly isotonic regression fits",
subtitle = paste("Lambda = ", "{lambda[as.integer(closest_state)]}")) +
transition_states(iter, transition_length = 1, state_length = 2) +
theme_bw() +
theme(plot.title = element_text(size = rel(1.5), face = "bold"))
animate(p, fps = 5)


References:

1. Tibshirani, R. J., Hoefling, H., and Tibshirani, R. (2011). Nearly-isotonic regression.