(This article was first published on **Freakonometrics » R-english**, and kindly contributed to R-bloggers)

In my previous post, *An Attempt to Understand Boosting Algorithm(s)*, I was puzzled by the convergence of boosting when I was using spline functions (more specifically, continuous piecewise linear regression functions). I was using

```r
library(splines)
fit=lm(y~bs(x,degree=1,df=3),data=df)
```
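With `df=3`, `degree=1` and no `knots` argument, `bs()` chooses `df - degree = 2` interior knots by itself, at the 1/3 and 2/3 quantiles of `x`. Those knots depend on `x` only, never on the response, so refitting the same call on residuals reuses exactly the same knots. A quick sketch to check this, regenerating `x` with the same recipe as the simulated dataset used in this post:

```r
library(splines)

# Same x as the simulated dataset used in this post
set.seed(1)
x <- sort(runif(300) * 2 * pi)

# Piecewise linear B-spline basis with df = 3 and no explicit knots
B <- bs(x, degree = 1, df = 3)

# bs() stores the interior knots it chose as an attribute of the basis
knots_used <- attr(B, "knots")
print(knots_used)

# They coincide with the 1/3 and 2/3 sample quantiles of x
all.equal(unname(knots_used), unname(quantile(x, c(1, 2) / 3)))
```

Since nothing in that call looks at `y`, every boosting step built on it works with the same two knots.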

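The problem with that spline basis is that the knots are fixed: with `df=3` and `degree=1`, `bs()` places two interior knots at the 1/3 and 2/3 quantiles of `x`, whatever the response. The iterative boosting algorithm is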

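- start with some regression model $y=h_1(x)$
- compute the residuals, including some shrinkage parameter $\nu$, $\varepsilon_1=y-\nu h_1(x)$

then the strategy is to model those residuals

- at step $k$, consider the regression $\varepsilon_k=h_{k+1}(x)$
- update the residuals $\varepsilon_{k+1}=\varepsilon_k-\nu h_{k+1}(x)$

and to loop. Then set

$$\widehat{m}_K(x)=\nu\sum_{k=1}^{K}h_k(x)$$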

I thought that boosting would work well if, at step $k$, it were possible to change the knots. But the output was quite disappointing: boosting does not improve the prediction here, and it looks like the knots do not change. Actually, if we select the *best* knots, the output is much better. The dataset is still

```r
n=300
set.seed(1)
u=sort(runif(n)*2*pi)
y=sin(u)+rnorm(n)/4
df=data.frame(x=u,y=y)
```
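It is worth seeing why fixed knots cannot help. With a fixed basis, each step is a least-squares projection $S$ onto the same space, so $h_k(x)=(1-\nu)^{k-1}Sy$ and the boosted fit after $K$ steps is $(1-(1-\nu)^K)\,Sy$: just a shrunken copy of the single spline fit. The sketch below is my own reconstruction of such a fixed-knot loop, assuming `bs(x,degree=1,df=3)` is used at every step; the names `boost_fit`, `ols_fit` and `K` are illustrative.

```r
library(splines)

# Simulated data, as in the post
n <- 300
set.seed(1)
u <- sort(runif(n) * 2 * pi)
y <- sin(u) + rnorm(n) / 4
df <- data.frame(x = u, y = y)

v <- 0.05   # shrinkage parameter (nu)
K <- 100    # number of boosting steps

# Boosting with a fixed-knot piecewise linear base learner
df$yr <- df$y           # current residuals, starting from y itself
boost_fit <- rep(0, n)  # running sum of nu * h_k(x)
for (k in 1:K) {
  fit <- lm(yr ~ bs(x, degree = 1, df = 3), data = df)
  h <- fitted(fit)              # h_k(x), fitted on the current residuals
  boost_fit <- boost_fit + v * h
  df$yr <- df$yr - v * h        # epsilon_{k} = epsilon_{k-1} - nu * h_k(x)
}

# Single least-squares fit on the same (fixed-knot) basis
ols_fit <- fitted(lm(y ~ bs(x, degree = 1, df = 3), data = df))

# Closed form: the boosted fit is (1 - (1 - v)^K) times the single fit
all.equal(unname(boost_fit), unname((1 - (1 - v)^K) * ols_fit))
```

With $\nu=0.05$ and $K=100$ the factor is about 0.994, so the boosted curve is essentially the ordinary spline fit with the same two knots, which is why no improvement can be expected.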

For an optimal choice of knot locations, we can use

```r
library(freeknotsplines)
xy.freekt=freelsgen(df$x, df$y, degree = 1, numknot = 2, 555)
```
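`freelsgen()` performs a stochastic search for the knot locations (the fifth argument, `555`, appears to be its random seed). If the package is not at hand, a crude substitute is an exhaustive search over pairs of knots on a regular grid, keeping the pair with the smallest residual sum of squares. This is a swapped-in technique, not what `freelsgen()` does internally, and the helpers `rss_knots()` and `grid_knots()` are hypothetical names used only for illustration.

```r
library(splines)

# Simulated data, as in the post
n <- 300
set.seed(1)
u <- sort(runif(n) * 2 * pi)
y <- sin(u) + rnorm(n) / 4

# Residual sum of squares of a continuous piecewise linear fit with given knots
rss_knots <- function(knots, x, y) {
  fit <- lm(y ~ bs(x, degree = 1, knots = knots))
  sum(residuals(fit)^2)
}

# Exhaustive search over ordered pairs of candidate knots (a substitute,
# not the genetic/stochastic search used by freelsgen)
grid_knots <- function(x, y, ngrid = 30) {
  # candidate knots strictly inside the range of x
  cand <- seq(min(x), max(x), length.out = ngrid + 2)[-c(1, ngrid + 2)]
  best <- list(knots = cand[1:2], rss = Inf, cand = cand)
  for (i in 1:(ngrid - 1)) {
    for (j in (i + 1):ngrid) {
      r <- rss_knots(cand[c(i, j)], x, y)
      if (r < best$rss) {
        best$knots <- cand[c(i, j)]
        best$rss <- r
      }
    }
  }
  best
}

best <- grid_knots(u, y)
best$knots
```

The grid is coarse (30 candidates, 435 fits), so the knots are only located up to the grid spacing; inside a boosting loop the same function would be called on the residuals instead of `y`.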

The code of the previous post can simply be updated

```r
v=.05  # shrinkage parameter
library(splines)
library(freeknotsplines)
# first step: optimal knots on y, then a shrunken spline fit
xy.freekt=freelsgen(df$x, df$y, degree = 1, numknot = 2, 555)
fit=lm(y~bs(x,degree=1,knots=xy.freekt@optknot),data=df)
yp=predict(fit,newdata=df)
df$yr=df$y - v*yp   # residuals
YP=v*yp             # column k stores nu*h_k(x)
for(t in 1:200){
  # re-optimize the knot locations on the current residuals
  xy.freekt=freelsgen(df$x, df$yr, degree = 1, numknot = 2, 555)
  fit=lm(yr~bs(x,degree=1,knots=xy.freekt@optknot),data=df)
  yp=predict(fit,newdata=df)
  df$yr=df$yr - v*yp
  YP=cbind(YP,v*yp)
}
nd=data.frame(x=seq(0,2*pi,by=.01))
viz=function(M){
  # boosted prediction after M steps (red)
  if(M==1) y=YP[,1]
  if(M>1) y=apply(YP[,1:M],1,sum)
  plot(df$x,df$y,ylab="",xlab="")
  lines(df$x,y,type="l",col="red",lwd=3)
  # single fixed-knot spline fit on the original data, for comparison (blue)
  fit=lm(y~bs(x,degree=1,df=3),data=df)
  yp=predict(fit,newdata=nd)
  lines(nd$x,yp,type="l",col="blue",lwd=3)
  # true regression function sin(x) (dashed)
  lines(nd$x,sin(nd$x),lty=2)
}
viz(100)
```

I like that graph. I had the intuition that boosting with (simple) splines could work, and indeed, we get a very smooth prediction.
