Tukey and Mosteller’s Bulging Rule (and Ladder of Powers)


When discussing transformations in regression models, I usually briefly introduce the Box-Cox transform (see e.g. an old post on that topic) and I also mention local regressions and nonparametric estimators (see e.g. another post). But while I was working on my ACT6420 course (on predictive modeling, which is a VEE for the SOA), I read something about a “Ladder of Powers Rule”, also called “Tukey and Mosteller’s Bulging Rule”. To be honest, I had never heard of this rule before. But it is not the first time I have learned something while working on my notes for a course!

The point here is that, in a standard linear regression model, we have

Y = β₀ + β₁ X + ε

But sometimes, a linear relationship is not appropriate. One idea can be to transform the variable we would like to model, Y, and to consider

h(Y) = β₀ + β₁ X + ε

This is what we usually do with the Box-Cox transform. Another idea can be to transform the explanatory variable, X, and now to consider

Y = β₀ + β₁ h(X) + ε

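As a small sketch of those two strategies in R (the log transform, and the cars dataset that we will use again below, are only for illustration here),

> # transform the response (the log being a limiting case of a power transform)
> reg_y=lm(log(dist)~speed,data=cars)
> # or transform the explanatory variable instead
> reg_x=lm(dist~log(speed),data=cars)
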
For instance, this year in the course, we considered, at some point, a continuous piecewise linear function,

Y = β₀ + β₁ X + β₂ max(X - s, 0) + ε

for some knot s.

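Such a model can be fitted directly with lm(), adding the positive part as an extra regressor; a small sketch on the cars dataset (the knot at speed = 15 is an arbitrary choice, for illustration only),

> # continuous piecewise linear regression, with one (arbitrary) knot at speed=15
> reg_pw=lm(dist~speed+pmax(speed-15,0),data=cars)
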
It is also possible to consider a polynomial regression. The “Tukey and Mosteller’s Bulging Rule” is based on the following figure.

and the idea is that it might be interesting to transform X and Y at the same time, using power functions. To be more specific, we will consider a linear model of the form

Y^q = β₀ + β₁ X^p + ε

for some (positive) parameters p and q. Depending on the shape of the regression function (the four curves mentioned on the graph above, in the four quadrants), different powers will be considered.

To illustrate, let us generate samples from different models, and look at the associated scatterplots,

> fakedataMT=function(p=1,q=1,n=99,s=.1){
+ # simulate data where Y^q is linear in X^p, up to Gaussian noise
+ set.seed(1)
+ X=seq(1/(n+1),1-1/(n+1),length=n)
+ Y=(5+2*X^p+rnorm(n,sd=s))^(1/q)
+ return(data.frame(x=X,y=Y))}
> par(mfrow=c(2,2))
> plot(fakedataMT(p=.5,q=2),main="(p=1/2,q=2)")
> plot(fakedataMT(p=3,q=-5),main="(p=3,q=-5)")
> plot(fakedataMT(p=.5,q=-1),main="(p=1/2,q=-1)")
> plot(fakedataMT(p=3,q=5),main="(p=3,q=5)")

If we consider the South-West part of the graph, to get such a pattern, we can consider

Y = (β₀ + β₁ X^p)^(1/q)

or, more generally,

Y^q = β₀ + β₁ X^p + ε

where p and q are both larger than 1. And the larger p and/or q, the more convex the regression curve.
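
As a quick sanity check (a small sketch, using the fakedataMT function defined above with, for instance, p = 3 and q = 5), we can regress Y^q on X^p and look at the transformed scatterplot, which should be roughly linear,

> df=fakedataMT(p=3,q=5)
> regpq=lm(I(y^5)~I(x^3),data=df)
> plot(df$x^3,df$y^5,xlab="X^3",ylab="Y^5")
> abline(regpq,col="blue")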

Let us visualize that double transformation on a dataset, say the cars dataset.

> base=cars
> names(base)=c("x","y")
> MostellerTukey=function(p=1,q=1){
+ # linear regression of Y^q on X^p
+ regpq=lm(I(y^q)~I(x^p),data=base)
+ u=seq(min(min(base$x)-2,.1),max(base$x)+2,length=501)
+ par(mfrow=c(1,2))
+ # left panel: original scale, back-transformed fit and prediction tube
+ plot(base$x,base$y,xlab="X",ylab="Y",col="white")
+ vic=predict(regpq,newdata=data.frame(x=u),interval="prediction")
+ vic[vic<=0]=.1
+ polygon(c(u,rev(u)),c(vic[,2],rev(vic[,3]))^(1/q),col="light blue",density=40,border=NA)
+ lines(u,vic[,2]^(1/q),col="blue")
+ lines(u,vic[,3]^(1/q),col="blue")
+ v=predict(regpq,newdata=data.frame(x=u))^(1/q)
+ lines(u,v,col="blue")
+ points(base$x,base$y)
+ 
+ # right panel: transformed scale, where the regression is linear
+ plot(base$x^p,base$y^q,xlab=paste("X^",p,sep=""),ylab=paste("Y^",q,sep=""),col="white")
+ polygon(c(u,rev(u))^p,c(vic[,2],rev(vic[,3])),col="light blue",density=40,border=NA)
+ lines(u^p,vic[,2],col="blue")
+ lines(u^p,vic[,3],col="blue")
+ abline(regpq,col="blue")
+ points(base$x^p,base$y^q)
+ }

For instance, if we call

> MostellerTukey(2,1)

we get the following graph,

On the left, we have the original dataset, (X, Y), and on the right, the transformed one, (X^p, Y^q). Here, we only considered the square of the speed of the car (so only one component was transformed, since q = 1). On that transformed dataset, we run a standard linear regression, and we add a prediction tube. Then we apply the inverse transformation to the prediction, and this line is plotted on the left. The problem is that it should not be considered as our optimal prediction, since it is clearly biased: E[Z^(1/q)] ≠ (E[Z])^(1/q), where Z denotes the prediction on the transformed scale. But the quantiles associated with a monotone transformation are the transformed quantiles, so the tube on the left can still be interpreted as a prediction tube.
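
To see where the bias comes from, here is a tiny simulation (purely illustrative: Z has an arbitrary Gaussian distribution, and q = 2); by Jensen's inequality, the two quantities below differ,

> set.seed(1)
> Z=rnorm(1e6,mean=100,sd=10)
> mean(Z^(1/2))   # E[Z^(1/q)], slightly below 10
> mean(Z)^(1/2)   # (E[Z])^(1/q), essentially 10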

Note that, here, it would also have been possible to consider another transformation, with a similar shape, but quite different,

> MostellerTukey(1,.5)

Of course, there is no reason to restrict ourselves to simple power functions, and the Box-Cox transform can also be used; the interesting point is that the logarithm is obtained as a limiting case. Furthermore, it is also possible to seek an optimal transformation, seen here as a pair of parameters (p, q). Consider, using the boxcox function from the MASS package,

> library(MASS)
> p=.1
> bc=boxcox(y~I(x^p),data=base,lambda=seq(.1,3,by=.1))$y
> for(p in seq(.2,3,by=.1)) bc=cbind(bc,boxcox(y~I(x^p),data=base,lambda=seq(.1,3,by=.1))$y)
> vp=boxcox(y~I(x^p),data=base,lambda=seq(.1,3,by=.1))$x
> vq=seq(.1,3,by=.1)
> library(RColorBrewer)
> blues=colorRampPalette(brewer.pal(9,"Blues"))(100)
> # note: the horizontal axis (vp) is the Box-Cox parameter on y, i.e. q; the vertical axis (vq) is the power on x, i.e. p
> image(vp,vq,bc,col=blues)
> contour(vp,vq,bc,levels=seq(-60,-40,by=1),col="white",add=TRUE)

The darker, the better (what is plotted is the profile log-likelihood). The optimal pair can be obtained numerically,

> bc=function(a){p=a[1];q=a[2]; as.numeric(-boxcox(y~I(x^p),data=base,lambda=q)$y[50])}
> optim(c(1,1), bc,method="L-BFGS-B",lower=c(0,0),upper=c(3,3))
$par
[1] 0.5758362 0.3541601

$value
[1] 47.27395

and indeed, the model we get is not bad.
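
For instance, plugging (rounded values of) this optimal pair into the MostellerTukey function defined above gives a picture of the fit,

> MostellerTukey(.58,.35)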

Fun, ins’t it?
