Site icon R-bloggers

Some heuristics about spline smoothing

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Let us continue our discussion on smoothing techniques in regression. Assume that . where is some unkown function, but assumed to be sufficently smooth. For instance, assume that  is continuous, that exists, and is continuous, that  exists and is also continuous, etc. If  is smooth enough, Taylor’s expansion can be used. Hence, for

which can also be writen as

for some ‘s. The first part is simply a polynomial.

The second part, is some integral. Using Riemann integral, observe that

for some ‘s, and some

Thus,

Nice! We have our linear regression model. A natural idea is then to consider a regression of  on  where

given some knots . To make things easier to understand, let us work with our previous dataset,

plot(db)

If we consider one knot, and an expansion of order 1,

attach(db)
library(splines)
B=bs(xr,knots=c(3),Boundary.knots=c(0,10),degre=1)
reg=lm(yr~B)
lines(xr[xr<=3],predict(reg)[xr<=3],col="red")
lines(xr[xr>=3],predict(reg)[xr>=3],col="blue")

The prediction obtained with this spline can be compared with regressions on subsets (the doted lines)

reg=lm(yr~xr,subset=xr<=3)
lines(xr[xr<=3],predict(reg)[xr<=3],col="red",lty=2)
reg=lm(yr~xr,subset=xr>=3)
lines(xr[xr>=3],predict(reg),col="blue",lty=2)

It is different, since we have here three parameters (and not four, as for the regressions on the two subsets). One degree of freedom is lost, when asking for a continuous model. Observe that it is possible to write, equivalently

reg=lm(yr~bs(xr,knots=c(3),Boundary.knots=c(0,10),degre=1),data=db)

So, what happened here?

B=bs(xr,knots=c(2,5),Boundary.knots=c(0,10),degre=1)
matplot(xr,B,type="l")
abline(v=c(0,2,5,10),lty=2)

Here, the functions that appear in the regression are the following

Now, if we run the regression on those two components, we get

B=bs(xr,knots=c(2,5),Boundary.knots=c(0,10),degre=1)
matplot(xr,B,type="l")
abline(v=c(0,2,5,10),lty=2)

If we add one knot, we get

the prediction is

reg=lm(yr~B)
lines(xr,predict(reg),col="red")

Of course, we can choose much more knots,

B=bs(xr,knots=1:9,Boundary.knots=c(0,10),degre=1)
reg=lm(yr~B)
lines(xr,predict(reg),col="red")

We can even get a confidence interval

reg=lm(yr~B)
P=predict(reg,interval="confidence")
plot(db,col="white")
polygon(c(xr,rev(xr)),c(P[,2],rev(P[,3])),col="light blue",border=NA)
points(db)
reg=lm(yr~B)
lines(xr,P[,1],col="red")
abline(v=c(0,2,5,10),lty=2)

And if we keep the  two knots we chose previously, but consider Taylor’s expansion of order 2, we get

B=bs(xr,knots=c(2,5),Boundary.knots=c(0,10),degre=2)
matplot(xr,B,type="l")
abline(v=c(0,2,5,10),lty=2)

So, what’s going on? If we consider the constant, and the first component of the spline based matrix, we get

k=2
plot(db)
B=cbind(1,B)
lines(xr,B[,1:k]%*%coefficients(reg)[1:k],col=k-1,lty=k-1)

If we add the constant term, the first term and the second term, we get the part on the left, before the first knot,

k=3
lines(xr,B[,1:k]%*%coefficients(reg)[1:k],col=k-1,lty=k-1)

and with three terms from the spline based matrix, we can get the part between the two knots,

k=4
lines(xr,B[,1:k]%*%coefficients(reg)[1:k],col=k-1,lty=k-1)

and finallty, when we sum all the terms, we get this time the part on the right, after the last knot,

k=5
lines(xr,B[,1:k]%*%coefficients(reg)[1:k],col=k-1,lty=k-1)

This is what we get using a spline regression, quadratic, with two (fixed) knots. And can can even get confidence intervals, as before

reg=lm(yr~B)
P=predict(reg,interval="confidence")
plot(db,col="white")
polygon(c(xr,rev(xr)),c(P[,2],rev(P[,3])),col="light blue",border=NA)
points(db)
reg=lm(yr~B)
lines(xr,P[,1],col="red")
abline(v=c(0,2,5,10),lty=2)

The great idea here is to use functions , that will insure continuity at point .

Of course, we can use those splines on our Dexter application,

Here again, using linear spline function, it is possible to impose a continuity constraint,

plot(data$no,data$mu,ylim=c(6,10))
abline(v=12*(0:8)+.5,lty=2)
reg=lm(mu~bs(no,knots=c(12*(1:7)+.5),Boundary.knots=c(0,97),
degre=1),data=db)
lines(c(1:94,96),predict(reg),col="red")

But we can also consider some quadratic splines,

plot(data$no,data$mu,ylim=c(6,10))
abline(v=12*(0:8)+.5,lty=2)
reg=lm(mu~bs(no,knots=c(12*(1:7)+.5),Boundary.knots=c(0,97),
degre=2),data=db)
lines(c(1:94,96),predict(reg),col="red")

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics » R-english.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.