Some heuristics about spline smoothing

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Let us continue our discussion on smoothing techniques in regression. Assume that . where is some unkown function, but assumed to be sufficently smooth. For instance, assume that  is continuous, that exists, and is continuous, that  exists and is also continuous, etc. If  is smooth enough, Taylor’s expansion can be used. Hence, for http://latex.codecogs.com/gif.latex?x\in(\alpha,\beta)

which can also be writen as

for some http://latex.codecogs.com/gif.latex?a_k‘s. The first part is simply a polynomial.

The second part, is some integral. Using Riemann integral, observe that

for some http://latex.codecogs.com/gif.latex?b_i‘s, and some

Thus,

Nice! We have our linear regression model. A natural idea is then to consider a regression of http://latex.codecogs.com/gif.latex?Y on http://latex.codecogs.com/gif.latex?\boldsymbol{X} where

http://latex.codecogs.com/gif.latex?\boldsymbol{X}%20=%20(1,X,X^2,\cdots,X^d,(X-x_1)_+^d,\cdots,(X-x_k)_+^d%20)

given some knots http://latex.codecogs.com/gif.latex?\{x_1,\cdots,x_k\}. To make things easier to understand, let us work with our previous dataset,

plot(db)

https://i0.wp.com/f.hypotheses.org/wp-content/blogs.dir/253/files/2013/10/Selection_146.png?w=578

If we consider one knot, and an expansion of order 1,

attach(db)
library(splines)
B=bs(xr,knots=c(3),Boundary.knots=c(0,10),degre=1)
reg=lm(yr~B)
lines(xr[xr<=3],predict(reg)[xr<=3],col="red")
lines(xr[xr>=3],predict(reg)[xr>=3],col="blue")

The prediction obtained with this spline can be compared with regressions on subsets (the doted lines)

reg=lm(yr~xr,subset=xr<=3)
lines(xr[xr<=3],predict(reg)[xr<=3],col="red",lty=2)
reg=lm(yr~xr,subset=xr>=3)
lines(xr[xr>=3],predict(reg),col="blue",lty=2)

https://i2.wp.com/f.hypotheses.org/wp-content/blogs.dir/253/files/2013/10/Selection_160.png?w=578

It is different, since we have here three parameters (and not four, as for the regressions on the two subsets). One degree of freedom is lost, when asking for a continuous model. Observe that it is possible to write, equivalently

reg=lm(yr~bs(xr,knots=c(3),Boundary.knots=c(0,10),degre=1),data=db)

So, what happened here?

B=bs(xr,knots=c(2,5),Boundary.knots=c(0,10),degre=1)
matplot(xr,B,type="l")
abline(v=c(0,2,5,10),lty=2)

Here, the functions that appear in the regression are the following

https://i1.wp.com/f.hypotheses.org/wp-content/blogs.dir/253/files/2013/10/Selection_161.png?w=578

Now, if we run the regression on those two components, we get

B=bs(xr,knots=c(2,5),Boundary.knots=c(0,10),degre=1)
matplot(xr,B,type="l")
abline(v=c(0,2,5,10),lty=2)

If we add one knot, we get

https://i2.wp.com/f.hypotheses.org/wp-content/blogs.dir/253/files/2013/10/Selection_162.png?w=578

the prediction is

reg=lm(yr~B)
lines(xr,predict(reg),col="red")

https://i0.wp.com/f.hypotheses.org/wp-content/blogs.dir/253/files/2013/10/Selection_147.png?w=578

Of course, we can choose much more knots,

B=bs(xr,knots=1:9,Boundary.knots=c(0,10),degre=1)
reg=lm(yr~B)
lines(xr,predict(reg),col="red")

https://i2.wp.com/f.hypotheses.org/wp-content/blogs.dir/253/files/2013/10/Selection_148.png?w=578

We can even get a confidence interval

reg=lm(yr~B)
P=predict(reg,interval="confidence")
plot(db,col="white")
polygon(c(xr,rev(xr)),c(P[,2],rev(P[,3])),col="light blue",border=NA)
points(db)
reg=lm(yr~B)
lines(xr,P[,1],col="red")
abline(v=c(0,2,5,10),lty=2)

https://i2.wp.com/f.hypotheses.org/wp-content/blogs.dir/253/files/2013/10/Selection_149.png?w=578

And if we keep the  two knots we chose previously, but consider Taylor’s expansion of order 2, we get

B=bs(xr,knots=c(2,5),Boundary.knots=c(0,10),degre=2)
matplot(xr,B,type="l")
abline(v=c(0,2,5,10),lty=2)

https://i0.wp.com/f.hypotheses.org/wp-content/blogs.dir/253/files/2013/10/Selection_163.png?w=578

So, what’s going on? If we consider the constant, and the first component of the spline based matrix, we get

k=2
plot(db)
B=cbind(1,B)
lines(xr,B[,1:k]%*%coefficients(reg)[1:k],col=k-1,lty=k-1)

https://i0.wp.com/f.hypotheses.org/wp-content/blogs.dir/253/files/2013/10/Selection_164.png?w=578

If we add the constant term, the first term and the second term, we get the part on the left, before the first knot,

k=3
lines(xr,B[,1:k]%*%coefficients(reg)[1:k],col=k-1,lty=k-1)

https://i1.wp.com/f.hypotheses.org/wp-content/blogs.dir/253/files/2013/10/Selection_165.png?w=578

and with three terms from the spline based matrix, we can get the part between the two knots,

k=4
lines(xr,B[,1:k]%*%coefficients(reg)[1:k],col=k-1,lty=k-1)

https://i2.wp.com/f.hypotheses.org/wp-content/blogs.dir/253/files/2013/10/Selection_166.png?w=578

and finallty, when we sum all the terms, we get this time the part on the right, after the last knot,

k=5
lines(xr,B[,1:k]%*%coefficients(reg)[1:k],col=k-1,lty=k-1)

https://i0.wp.com/f.hypotheses.org/wp-content/blogs.dir/253/files/2013/10/Selection_167.png?w=578

This is what we get using a spline regression, quadratic, with two (fixed) knots. And can can even get confidence intervals, as before

reg=lm(yr~B)
P=predict(reg,interval="confidence")
plot(db,col="white")
polygon(c(xr,rev(xr)),c(P[,2],rev(P[,3])),col="light blue",border=NA)
points(db)
reg=lm(yr~B)
lines(xr,P[,1],col="red")
abline(v=c(0,2,5,10),lty=2)

https://i2.wp.com/f.hypotheses.org/wp-content/blogs.dir/253/files/2013/10/Selection_168.png?w=578

The great idea here is to use functions http://latex.codecogs.com/gif.latex?(x-x_i)_+, that will insure continuity at point http://latex.codecogs.com/gif.latex?x_i.

Of course, we can use those splines on our Dexter application,

https://i0.wp.com/f.hypotheses.org/wp-content/blogs.dir/253/files/2013/10/Selection_170.png?w=578

Here again, using linear spline function, it is possible to impose a continuity constraint,

plot(data$no,data$mu,ylim=c(6,10))
abline(v=12*(0:8)+.5,lty=2)
reg=lm(mu~bs(no,knots=c(12*(1:7)+.5),Boundary.knots=c(0,97),
degre=1),data=db)
lines(c(1:94,96),predict(reg),col="red")

https://i0.wp.com/f.hypotheses.org/wp-content/blogs.dir/253/files/2013/10/Selection_172.png?w=578

But we can also consider some quadratic splines,

plot(data$no,data$mu,ylim=c(6,10))
abline(v=12*(0:8)+.5,lty=2)
reg=lm(mu~bs(no,knots=c(12*(1:7)+.5),Boundary.knots=c(0,97),
degre=2),data=db)
lines(c(1:94,96),predict(reg),col="red")

https://i2.wp.com/f.hypotheses.org/wp-content/blogs.dir/253/files/2013/10/Selection_171.png?w=578

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics » R-english.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)