Convex Regression Model

July 5, 2018
By

(This article was first published on R-english – Freakonometrics, and kindly contributed to R-bloggers)

This morning during the lecture on nonlinear regression, I mentioned (very) briefly the case of convex regression. Since I forgot to mention the codes in R, I will publish them here. Assume that y_i=m(\mathbf{x}_i)+\varepsilon_i where m:\mathbb{R}^d\rightarrow \mathbb{R} is some convex function.

Then m is convex if and only if \forall\mathbf{x}_1,\mathbf{x}_2\in\mathbb{R}^d, \forall t\in[0,1], m(t\mathbf{x}_1+[1-t]\mathbf{x}_2) \leq tm(\mathbf{x}_1)+[1-t]m(\mathbf{x}_2)Hidreth (1954) proved that if m^\star=\underset{m \text{ convex}}{\text{argmin}}\left\lbrace\sum_{i=1}^n \big(y_i-m(\mathbf{x_i})\big)^2\right\rbracethen \mathbf{\theta}^\star=(m^\star(\mathbf{x_1}),\cdots,m^\star(\mathbf{x_n})) is unique.

Let \mathbf{y}=\mathbf{\theta}+\mathbf{\varepsilon}, then \mathbf{\theta}^\star=\underset{\mathbf{\theta}\in \mathcal{K}}{\text{argmin}}\left\lbrace\sum_{i=1}^n \big(y_i-\theta_i)\big)^2\right\rbracewhere\mathcal{K}=\{\mathbf{\theta}\in\mathbb{R}^n:\exists m\text{ convex },m(\mathbf{x}_i)=\theta_i\}. I.e. \mathbf{\theta}^\star is the projection of \mathbf{y} onto the (closed) convex cone \mathcal{K}. The projection theorem gives existence and unicity.

For convenience, in the application, we will consider the real-valued case, m:\mathbb{R}\rightarrow \mathbb{R}, i.e. y_i=m(x_i)+\varepsilon_i. Assume that observations are ordered x_1\leq x_2\leq\cdots \leq x_n. Here \mathcal{K}=\left\lbrace\mathbf{\theta}\in\mathbb{R}^n:\frac{\theta_2-\theta_1}{x_2-x_1}\leq \frac{\theta_3-\theta_2}{x_3-x_2}\leq \cdots \leq \frac{\theta_n-\theta_{n-1}}{x_n-x_{n-1}}\right\rbrace

Hence, quadratic program with n-2 linear constraints.

m^\star is a piecewise linear function (interpolation of consecutive pairs (x_i,\theta_i^\star)).

If m is differentiable, m is convex if m(\mathbf{x})+ \nabla m(\mathbf{x})^{\text{T}}\cdot[\mathbf{y}-\mathbf{x}] \leq m(\mathbf{y})

More generally, if m is convex, then there exists \xi_{\mathbf{x}}\in\mathbb{R}^n such that m(\mathbf{x})+ \xi_{\mathbf{x}}^{\text{ T}}\cdot[\mathbf{y}-\mathbf{x}] \leq m(\mathbf{y})
\xi_{\mathbf{x}} is a subgradient of m at {\mathbf{x}}. And then \partial m(\mathbf{x})=\big\lbrace m(\mathbf{x})+ \xi^{\text{ T}}\cdot[\mathbf{y}-\mathbf{x}] \leq m(\mathbf{y}),\forall \mathbf{y}\in\mathbb{R}^n\big\rbrace

Hence, \mathbf{\theta}^\star is solution of \text{argmin}\big\lbrace\|\mathbf{y}-\mathbf{\theta}\|^2\big\rbrace\text{subject to }\theta_i+\xi_i^{\text{ T}}[\mathbf{x}_j-\mathbf{x}_i]\leq\mathbf{\theta}_j,~\forall i,j and \xi_1,\cdots,\xi_n\in\mathbb{R}^n. Now, to do it for real, use cobs package for constrained (b)splines regression,

1
library(cobs)

To get a convex regression, use

1
2
3
4
5
plot(cars)
x = cars$speed
y = cars$dist
rc = conreg(x,y,convex=TRUE)
lines(rc, col = 2)


Here we can get the values of the knots

1
2
3
4
5
6
7
rc
 
Call:  conreg(x = x, y = y, convex = TRUE) 
Convex regression: From 19 separated x-values, using 5 inner knots,
     7,    8,    9,   20,   23.
RSS =  1356; R^2 = 0.8766;
 needed (5,0) iterations

and actually, if we use them in a linear-spline regression, we get the same output here

1
2
3
4
reg = lm(dist~bs(speed,degree=1,knots=c(4,7,8,9,,20,23,25)),data=cars)
u = seq(4,25,by=.1)
v = predict(reg,newdata=data.frame(speed=u))
lines(u,v,col="green")

Let us add vertical lines for the knots

1
abline(v=c(4,7,8,9,20,23,25),col="grey",lty=2)

To leave a comment for the author, please follow the link and comment on their blog: R-english – Freakonometrics.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)