Studying joint effects in a regression

October 7, 2010
(This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers)

We’ve seen in the previous post how important the * (Cartesian product)
operator is for modelling joint effects in a regression. Consider the case of
two explanatory variates, one continuous ($X_1$, the age of the driver) and one qualitative ($X_2$, gasoline versus diesel).

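Assume here that

$$N \mid X_1, X_2 \sim \mathcal{P}\big(E \cdot \exp[\beta_0 + s(X_1) + \beta_2 \mathbf{1}(X_2 = \text{gasoline})]\big)$$

where $s$ is a smooth (spline) function of the age. Then, given $E$ (the exposure, assumed to be constant) and $X_1 = x$,

$$\mathbb{E}(N \mid X_1 = x, X_2 = \text{gasoline}) = e^{\beta_2} \cdot \mathbb{E}(N \mid X_1 = x, X_2 = \text{diesel})$$

Thus, there is a multiplicative effect of the qualitative variate.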
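> library(splines)  # provides bs()
> # with a log link, the exposure enters as offset(log(exposition))
> reg=glm(nbre~bs(ageconducteur)+carburant+offset(log(exposition)),
+     data=sinistres,family="poisson")
> ageD=data.frame(ageconducteur=seq(17,90),carburant="D",exposition=1)
> ageE=data.frame(ageconducteur=seq(17,90),carburant="E",exposition=1)
> yD=predict(reg,newdata=ageD,type="response")
> yE=predict(reg,newdata=ageE,type="response")
> # lines() draws on an open plot, so set up an empty one first
> plot(ageD$ageconducteur,yD,type="n",ylim=range(yD,yE))
> lines(ageD$ageconducteur,yD,col="blue",lwd=2)
> lines(ageE$ageconducteur,yE,col="red",lwd=2)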

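On the graph below, we can see that the ratio

$$\frac{\mathbb{E}(N \mid X_1 = x, X_2 = \text{diesel})}{\mathbb{E}(N \mid X_1 = x, X_2 = \text{gasoline})}$$

is constant (and independent of the age $x$).
> plot(ageD$ageconducteur,yD/yE)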
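
Since the `sinistres` dataset is not reproduced here, the constant ratio can be checked on simulated data. The sketch below is only an illustration: the data frame `sim`, its size, and the coefficients used to generate the claim counts are all made up, and only the model formula mirrors the one above. It assumes the exposure enters on the log scale.

```r
library(splines)

set.seed(1)
n <- 20000
# Hypothetical portfolio: ages, fuel types and exposures are simulated
sim <- data.frame(
  ageconducteur = sample(17:90, n, replace = TRUE),
  carburant     = factor(sample(c("D", "E"), n, replace = TRUE)),
  exposition    = runif(n, 0.1, 1)
)
# Made-up true frequency: U-shaped in age, lower for gasoline ("E")
lambda <- with(sim, exp(-2 + 0.0005 * (ageconducteur - 50)^2 -
                        0.2 * (carburant == "E")))
sim$nbre <- rpois(n, sim$exposition * lambda)

# Additive model: spline in age plus a fuel effect
reg <- glm(nbre ~ bs(ageconducteur) + carburant + offset(log(exposition)),
           data = sim, family = "poisson")

ages <- seq(17, 90)
newD <- data.frame(ageconducteur = ages, carburant = "D", exposition = 1)
newE <- data.frame(ageconducteur = ages, carburant = "E", exposition = 1)
yD <- predict(reg, newdata = newD, type = "response")
yE <- predict(reg, newdata = newE, type = "response")

# The diesel / gasoline ratio is the same at every age and equals
# exp(-beta), where beta is the fitted coefficient of level "E"
ratio <- yD / yE
print(range(ratio))
print(exp(-coef(reg)["carburantE"]))
```

Both printed values should agree up to rounding, whatever the age, because the fuel effect appears only as an additive term on the log scale.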

In order to take into account a more complex (non-constant) interaction
between the two explanatory variates, consider the following product
model,
> # with a log link, the exposure enters as offset(log(exposition))
> reg=glm(nbre~bs(ageconducteur)*carburant+offset(log(exposition)),
+     data=sinistres,family="poisson")
> ageD=data.frame(ageconducteur=seq(17,90),carburant="D",exposition=1)
> ageE=data.frame(ageconducteur=seq(17,90),carburant="E",exposition=1)
> yD=predict(reg,newdata=ageD,type="response")
> yE=predict(reg,newdata=ageE,type="response")
> # lines() draws on an open plot, so set up an empty one first
> plot(ageD$ageconducteur,yD,type="n",ylim=range(yD,yE))
> lines(ageD$ageconducteur,yD,col="blue",lwd=2)
> lines(ageE$ageconducteur,yE,col="red",lwd=2)

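Here, the ratio

$$\frac{\mathbb{E}(N \mid X_1 = x, X_2 = \text{diesel})}{\mathbb{E}(N \mid X_1 = x, X_2 = \text{gasoline})}$$

is no longer constant: it now depends on the age $x$.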
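
To see the difference between the two specifications side by side, one can fit both on simulated data and compare them. This is a sketch under stated assumptions: the data frame `sim` and the generating coefficients are invented, the names `reg_add` and `reg_prod` are hypothetical, and the likelihood ratio test via `anova()` is an addition that the post itself does not use.

```r
library(splines)

set.seed(2)
n <- 20000
# Hypothetical portfolio: everything below is simulated
sim <- data.frame(
  ageconducteur = sample(17:90, n, replace = TRUE),
  carburant     = factor(sample(c("D", "E"), n, replace = TRUE)),
  exposition    = runif(n, 0.1, 1)
)
# Made-up frequency in which the fuel effect grows with age
lambda <- with(sim, exp(-2 + 0.0005 * (ageconducteur - 50)^2 -
                        0.01 * (ageconducteur - 17) * (carburant == "E")))
sim$nbre <- rpois(n, sim$exposition * lambda)

# Additive model (+) versus product model (*)
reg_add  <- glm(nbre ~ bs(ageconducteur) + carburant + offset(log(exposition)),
                data = sim, family = "poisson")
reg_prod <- glm(nbre ~ bs(ageconducteur) * carburant + offset(log(exposition)),
                data = sim, family = "poisson")

ages <- seq(17, 90)
newD <- data.frame(ageconducteur = ages, carburant = "D", exposition = 1)
newE <- data.frame(ageconducteur = ages, carburant = "E", exposition = 1)
ratio_add  <- predict(reg_add,  newD, type = "response") /
              predict(reg_add,  newE, type = "response")
ratio_prod <- predict(reg_prod, newD, type = "response") /
              predict(reg_prod, newE, type = "response")

# Diesel / gasoline ratio at a few ages: flat for +, age-dependent for *
sel <- ages %in% c(20, 50, 80)
print(rbind(additive = ratio_add[sel], product = ratio_prod[sel]))

# Likelihood ratio test of the additive model against the product model
print(anova(reg_add, reg_prod, test = "Chisq"))
```

The product model adds one interaction coefficient per spline basis column (three with the default `bs()` settings), which is what lets the ratio move with age.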

It is also possible to consider a model in between: we believe that there
is no interaction for young people (say), while there is for older
ones. Assume that the break occurs at age 50,
> # with a log link, the exposure enters as offset(log(exposition))
> reg=glm(nbre~bs(ageconducteur*(ageconducteur<50))+
+     bs(ageconducteur*(ageconducteur>=50))*carburant+offset(log(exposition)),
+     data=sinistres,family="poisson")
> ageD=data.frame(ageconducteur=seq(17,90,by=.1),carburant="D",exposition=1)
> ageE=data.frame(ageconducteur=seq(17,90,by=.1),carburant="E",exposition=1)
> yD=predict(reg,newdata=ageD,type="response")
> yE=predict(reg,newdata=ageE,type="response")
> # lines() draws on an open plot, so set up an empty one first
> plot(ageD$ageconducteur,yD,type="n",ylim=range(yD,yE))
> lines(ageD$ageconducteur,yD,col="blue",lwd=2)
> lines(ageE$ageconducteur,yE,col="red",lwd=2)

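Here, the ratio

$$\frac{\mathbb{E}(N \mid X_1 = x, X_2 = \text{diesel})}{\mathbb{E}(N \mid X_1 = x, X_2 = \text{gasoline})}$$

is constant for young people (below age 50), while it changes with age for older ones.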
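
The behaviour of this in-between model can also be checked on simulated data. Again this is a sketch with invented ingredients: the data frame `sim` and the generating coefficients are hypothetical, and only the formula follows the one above. The interaction is attached only to the spline that is active from age 50 onwards, so below 50 the fuel effect reduces to a single coefficient.

```r
library(splines)

set.seed(3)
n <- 20000
# Hypothetical portfolio: everything below is simulated
sim <- data.frame(
  ageconducteur = sample(17:90, n, replace = TRUE),
  carburant     = factor(sample(c("D", "E"), n, replace = TRUE)),
  exposition    = runif(n, 0.1, 1)
)
# Made-up frequency: constant fuel effect below 50, widening gap afterwards
lambda <- with(sim, exp(-2 + 0.0005 * (ageconducteur - 50)^2 -
                        (0.2 + 0.02 * pmax(ageconducteur - 50, 0)) *
                        (carburant == "E")))
sim$nbre <- rpois(n, sim$exposition * lambda)

# One spline for ages below 50 (no interaction), one spline for ages
# from 50 onwards, interacted with the fuel type
reg <- glm(nbre ~ bs(ageconducteur * (ageconducteur < 50)) +
             bs(ageconducteur * (ageconducteur >= 50)) * carburant +
             offset(log(exposition)),
           data = sim, family = "poisson")

ages <- seq(17, 90)
newD <- data.frame(ageconducteur = ages, carburant = "D", exposition = 1)
newE <- data.frame(ageconducteur = ages, carburant = "E", exposition = 1)
ratio <- predict(reg, newD, type = "response") /
         predict(reg, newE, type = "response")

# Constant below 50, varying from 50 onwards
print(range(ratio[ages < 50]))
print(range(ratio[ages >= 50]))
```

Note that nothing in this formula forces the fitted curves to join smoothly at age 50, so a jump at the break is possible.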
