Convex Regression Model

[This article was first published on R-english – Freakonometrics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This morning during the lecture on nonlinear regression, I mentioned (very) briefly the case of convex regression. Since I forgot to mention the codes in R, I will publish them here. Assume that [latex]y_i=m(\mathbf{x}_i)+\varepsilon_i[/latex] where [latex]m:\mathbb{R}^d\rightarrow \mathbb{R}[/latex] is some convex function.

Then [latex]m[/latex] is convex if and only if [latex]\forall\mathbf{x}_1,\mathbf{x}_2\in\mathbb{R}^d[/latex], [latex]\forall t\in[0,1][/latex], [latex display=”true”]m(t\mathbf{x}_1+[1-t]\mathbf{x}_2) \leq tm(\mathbf{x}_1)+[1-t]m(\mathbf{x}_2)[/latex]Hidreth (1954) proved that if[latex display=”true”] m^\star=\underset{m \text{ convex}}{\text{argmin}}\left\lbrace\sum_{i=1}^n \big(y_i-m(\mathbf{x_i})\big)^2\right\rbrace[/latex]then [latex]\mathbf{\theta}^\star=(m^\star(\mathbf{x_1}),\cdots,m^\star(\mathbf{x_n}))[/latex] is unique.

Let [latex]\mathbf{y}=\mathbf{\theta}+\mathbf{\varepsilon}[/latex], then [latex display=”true”]\mathbf{\theta}^\star=\underset{\mathbf{\theta}\in \mathcal{K}}{\text{argmin}}\left\lbrace\sum_{i=1}^n \big(y_i-\theta_i)\big)^2\right\rbrace[/latex]where[latex display=”true”]\mathcal{K}=\{\mathbf{\theta}\in\mathbb{R}^n:\exists m\text{ convex },m(\mathbf{x}_i)=\theta_i\}[/latex]. I.e. [latex]\mathbf{\theta}^\star[/latex] is the projection of [latex]\mathbf{y}[/latex] onto the (closed) convex cone [latex]\mathcal{K}[/latex]. The projection theorem gives existence and unicity.

For convenience, in the application, we will consider the real-valued case, [latex]m:\mathbb{R}\rightarrow \mathbb{R}[/latex], i.e. [latex]y_i=m(x_i)+\varepsilon_i[/latex]. Assume that observations are ordered [latex]x_1\leq x_2\leq\cdots \leq x_n[/latex]. Here [latex display=”true”]\mathcal{K}=\left\lbrace\mathbf{\theta}\in\mathbb{R}^n:\frac{\theta_2-\theta_1}{x_2-x_1}\leq \frac{\theta_3-\theta_2}{x_3-x_2}\leq \cdots \leq \frac{\theta_n-\theta_{n-1}}{x_n-x_{n-1}}\right\rbrace[/latex]

Hence, quadratic program with [latex]n-2[/latex] linear constraints.

[latex]m^\star[/latex] is a piecewise linear function (interpolation of consecutive pairs [latex](x_i,\theta_i^\star)[/latex]).

If [latex]m[/latex] is differentiable, [latex]m[/latex] is convex if [latex display=”true”]m(\mathbf{x})+ \nabla m(\mathbf{x})^{\text{T}}\cdot[\mathbf{y}-\mathbf{x}] \leq m(\mathbf{y})[/latex]

More generally, if [latex]m[/latex] is convex, then there exists [latex]\xi_{\mathbf{x}}\in\mathbb{R}^n[/latex] such that [latex display=”true”]m(\mathbf{x})+ \xi_{\mathbf{x}}^{\text{ T}}\cdot[\mathbf{y}-\mathbf{x}] \leq m(\mathbf{y})[/latex]
[latex]\xi_{\mathbf{x}}[/latex] is a subgradient of [latex]m[/latex] at [latex]{\mathbf{x}}[/latex]. And then [latex display=”true”]\partial m(\mathbf{x})=\big\lbrace m(\mathbf{x})+ \xi^{\text{ T}}\cdot[\mathbf{y}-\mathbf{x}] \leq m(\mathbf{y}),\forall \mathbf{y}\in\mathbb{R}^n\big\rbrace[/latex]

Hence, [latex]\mathbf{\theta}^\star[/latex] is solution of [latex display=”true”]\text{argmin}\big\lbrace\|\mathbf{y}-\mathbf{\theta}\|^2\big\rbrace[/latex][latex display=”true”]\text{subject to }\theta_i+\xi_i^{\text{ T}}[\mathbf{x}_j-\mathbf{x}_i]\leq\mathbf{\theta}_j,~\forall i,j[/latex] and [latex]\xi_1,\cdots,\xi_n\in\mathbb{R}^n[/latex]. Now, to do it for real, use cobs package for constrained (b)splines regression,


To get a convex regression, use

x = cars$speed
y = cars$dist
rc = conreg(x,y,convex=TRUE)
lines(rc, col = 2)

Here we can get the values of the knots


Call:  conreg(x = x, y = y, convex = TRUE) 
Convex regression: From 19 separated x-values, using 5 inner knots,
     7,    8,    9,   20,   23.
RSS =  1356; R^2 = 0.8766;
 needed (5,0) iterations

and actually, if we use them in a linear-spline regression, we get the same output here

reg = lm(dist~bs(speed,degree=1,knots=c(4,7,8,9,,20,23,25)),data=cars)
u = seq(4,25,by=.1)
v = predict(reg,newdata=data.frame(speed=u))

Let us add vertical lines for the knots


To leave a comment for the author, please follow the link and comment on their blog: R-english – Freakonometrics. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)