Confidence interval for predictions with GLMs


Consider a (simple) Poisson regression $Y\mid X\sim\mathcal{P}(\lambda(X))$. Given a sample $(Y_i,X_i)$ where $Y_i\mid X_i\sim\mathcal{P}(\lambda_i)$ with $\lambda_i=\exp(\beta_0+\beta_1 X_i)$, the goal is to derive a 95% confidence interval for $\lambda(x)=\exp(\beta_0+\beta_1 x)$ given $X=x$, where $\lambda(x)$ is the prediction. Hence, we want to derive a confidence interval for the prediction, not the potential observation, i.e. the dot on the graph below
> reg=glm(dist~speed,data=cars,family=poisson)
> P=predict(reg,type="response",
+ newdata=data.frame(speed=seq(-1,35,by=.2)))
> plot(cars,xlim=c(0,31),ylim=c(0,170))
> abline(v=30,lty=2)
> lines(seq(-1,35,by=.2),P,lwd=2,col="red")
> P0=predict(reg,type="response",se.fit=TRUE,
+ newdata=data.frame(speed=30))
> points(30,P0$fit,pch=4,lwd=3)
[Figure: scatterplot of the cars dataset, the fitted Poisson regression curve (red), and the prediction at speed=30 (cross)]

Let $\widehat{\boldsymbol{\beta}}=(\widehat{\beta}_0,\widehat{\beta}_1)$ denote the maximum likelihood estimator of $\boldsymbol{\beta}=(\beta_0,\beta_1)$. Then, asymptotically,
$$\widehat{\boldsymbol{\beta}}\overset{\text{approx.}}{\sim}\mathcal{N}\left(\boldsymbol{\beta},I(\boldsymbol{\beta})^{-1}\right)$$

where $I(\boldsymbol{\beta})$ is the Fisher information matrix (from standard maximum likelihood theory). Recall that
$$I(\boldsymbol{\beta})=-\mathbb{E}\left[\frac{\partial^2\log\mathcal{L}(\boldsymbol{\beta})}{\partial\boldsymbol{\beta}\,\partial\boldsymbol{\beta}'}\right]$$

where the computation of those values is based on the following calculations: writing $\mathbf{x}_i=(1,X_i)'$ and $\log\lambda_i=\boldsymbol{\beta}'\mathbf{x}_i$, the log-likelihood and the score are
$$\log\mathcal{L}(\boldsymbol{\beta})=\sum_{i=1}^n\left[Y_i\log\lambda_i-\lambda_i-\log(Y_i!)\right],\qquad\frac{\partial\log\mathcal{L}(\boldsymbol{\beta})}{\partial\boldsymbol{\beta}}=\sum_{i=1}^n(Y_i-\lambda_i)\,\mathbf{x}_i$$

In the case of the log-Poisson regression,
$$I(\boldsymbol{\beta})=\sum_{i=1}^n\lambda_i\,\mathbf{x}_i\mathbf{x}_i'=\sum_{i=1}^n\exp(\beta_0+\beta_1X_i)\begin{pmatrix}1 & X_i\\ X_i & X_i^2\end{pmatrix}$$
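In matrix form, this can be checked numerically from the model matrix of the regression fitted above (a small side check of mine, not part of the original derivation; the entry-by-entry computation appears below):
> X=model.matrix(reg)     # columns (1, speed_i)
> lambda=fitted(reg)      # estimated lambda_i
> t(X)%*%(lambda*X)       # sum_i lambda_i x_i x_i', the information evaluated at beta-hat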

Let us get back to our initial problem.
  • confidence interval for the linear combination
A first idea to get a confidence interval for $\exp(\beta_0+\beta_1 x)$ is to get a confidence interval for the linear predictor $\beta_0+\beta_1 x$ (and then take the exponential of its bounds, since the exponential is a monotone function). Asymptotically, we know that
$$\widehat{\boldsymbol{\beta}}\overset{\text{approx.}}{\sim}\mathcal{N}\left(\boldsymbol{\beta},I(\boldsymbol{\beta})^{-1}\right)$$
thus, an approximation of the variance matrix of $\widehat{\boldsymbol{\beta}}$ will be based on $I(\widehat{\boldsymbol{\beta}})^{-1}$, obtained by plugging the estimators of the parameters into the Fisher information.
Then, since $\widehat{\boldsymbol{\beta}}$ has an asymptotic multivariate normal distribution, any linear combination of the parameters will also be (asymptotically) normal, i.e.
$\widehat{\beta}_0+\widehat{\beta}_1 x$ has a normal distribution, centered on $\beta_0+\beta_1 x$, with variance $(1,x)\,\mathrm{Var}(\widehat{\boldsymbol{\beta}})\,(1,x)'$, where $\mathrm{Var}(\widehat{\boldsymbol{\beta}})$ is the variance matrix of $\widehat{\boldsymbol{\beta}}$. All those quantities can easily be computed. First, we can get the variance of the estimators
> i1=sum(predict(reg,type="response"))              # sum of lambda_i
> i2=sum(cars$speed*predict(reg,type="response"))   # sum of lambda_i * X_i
> i3=sum(cars$speed^2*predict(reg,type="response")) # sum of lambda_i * X_i^2
> I=matrix(c(i1,i2,i2,i3),2,2)                      # Fisher information matrix
> V=solve(I)                                        # approximate variance of the estimators
Hence, if we compare with the output of the regression,
> summary(reg)$cov.unscaled
              (Intercept)         speed
(Intercept)  0.0066870446 -3.474479e-04
speed       -0.0003474479  1.940302e-05
> V
              [,1]          [,2]
[1,]  0.0066871228 -3.474515e-04
[2,] -0.0003474515  1.940318e-05
Based on those values, it is easy to derive the standard deviation for the linear combination,
> x=30
> P2=predict(reg,type="link",se.fit=TRUE,
+ newdata=data.frame(speed=x))
> P2
$fit
1
5.046034
 
$se.fit
[1] 0.05747075
 
$residual.scale
[1] 1
 
>
> sqrt(V[1,1]+2*x*V[2,1]+x^2*V[2,2])
[1] 0.05747084
> sqrt(t(c(1,x))%*%V%*%c(1,x))
[,1]
[1,] 0.05747084
And once we have the standard deviation, and normality (at least asymptotically), confidence intervals can be derived; then, taking the exponential of the bounds, we get the confidence interval for the prediction,
> segments(30,exp(P2$fit-1.96*P2$se.fit),
+ 30,exp(P2$fit+1.96*P2$se.fit),col="blue",lwd=3)
Based on that technique, confidence intervals are not centered on the prediction, but who cares?

  • delta method
Actually, those who like to use “more or less” expressions for confidence intervals will not like non-centered intervals. So, an alternative is to use the delta method. Instead of writing (again) something on the theory, we can use a package that implements that method,
> estmean=t(c(1,x))%*%coef(reg)
> var=t(c(1,x))%*%summary(reg)$cov.unscaled%*%c(1,x)
> library(msm)
> deltamethod (~ exp(x1), estmean, var)
[1] 8.931232
> P1=predict(reg,type="response",se.fit=TRUE,
+ newdata=data.frame(speed=30))
> P1
$fit
1
155.4048
 
$se.fit
1
8.931232
 
$residual.scale
[1] 1
The delta method gives us (asymptotic) normality, so once we have a standard deviation, we get the confidence interval.
> segments(30,P1$fit-1.96*P1$se.fit,30,
+ P1$fit+1.96*P1$se.fit,col="blue",lwd=3)
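With a log link the delta method has a simple closed form here: since the derivative of $\exp(\eta)$ is $\exp(\eta)$, the standard error on the response scale is just the fitted value times the standard error on the link scale. A quick check (my addition), using the objects computed above:
> exp(P2$fit)*P2$se.fit   # exp(linear predictor) times the se on the link scale
which gives 155.4048 * 0.05747075, i.e. (approximately) the 8.931232 returned above as se.fit on the response scale.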

Note that those quantities – obtained with two different approaches – are rather close here

> exp(P2$fit-1.96*P2$se.fit)
1
138.8495
> P1$fit-1.96*P1$se.fit
1
137.8996
> exp(P2$fit+1.96*P2$se.fit)
1
173.9341
> P1$fit+1.96*P1$se.fit
1
172.9101 
  • bootstrap techniques
And a third method (but far from what I expect to teach in that course) is to use bootstrap techniques, to avoid relying only on those results based on asymptotic normality (we have only 50 observations). The idea is to sample, with replacement, from our dataset, to run a log-Poisson regression on each of those new samples, and to repeat that a large number of times,
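A minimal sketch of such a bootstrap (an illustration under the assumptions above; the variable names are mine, not necessarily those of the original post): resample the observations, refit the model, store the prediction at speed=30, and look at the empirical quantiles,
> set.seed(1)
> B=1000
> pred=rep(NA,B)
> for(b in 1:B){
+ idx=sample(1:nrow(cars),replace=TRUE)           # resample observations with replacement
+ regb=glm(dist~speed,data=cars[idx,],family=poisson)
+ pred[b]=predict(regb,type="response",newdata=data.frame(speed=30))
+ }
> quantile(pred,c(.025,.975))                      # bootstrap 95% confidence interval for the prediction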
