The Uncertainty of Predictions

October 2, 2013
By

(This article was first published on Statistical Research » R, and kindly contributed to R-bloggers)

There are many kinds of intervals in statistics.  To name a few of the common intervals: confidence intervals, prediction intervals, credible intervals, and tolerance intervals. Each are useful and serve their own purpose.

I’ve been recently working on a couple of projects that involve making predictions from a regression model and I’ve been doing some work on estimation and the variability of the estimates. Recently, in my case, I need to predict an individual outcome drawn from the distribution (this differs from estimating the mean of that distribution). Often I end up estimating several simultaneous individual outcomes. It follows that we need to assume the underlying regression model still applies and is appropriate for this new individual observation.

Prediction vs. Confidence Intervals

Prediction Intervals and Confidence Intervals

Prediction intervals are useful in that it gives us an understanding of likely values that an individual observation may be.  There is a subtle, but big, difference between prediction intervals and confidence intervals.  With a confidence interval we are trying to obtain an upper and lower limit for the parameter.  For a prediction interval we are interested in the upper and lower limits on an individual observation.  One key item to note is that prediction intervals will always be wider than the corresponding confidence intervals.  The reason for this is that when making a new prediction on an observation ( \hat{y_{i}} ) we encounter the variability in the observation estimate as well as variation within the probability distribution of y.

These intervals often get confused with confidence intervals (which those alone have many nuances). But they really need to be kept separate because calling a prediction interval a confidence interval (or vice versa) will provide a false understanding of the variability.

Doing Many Prediction Intervals

Sometimes it is necessary to estimate several new observations.  In this case we need an overall family confidence level of (1-\alpha). To compensate for simultaneously estimating many observations there are a couple of difference procedures: Scheffé and Bonferroni.  It seems, based on various conferences that I’ve attended, that Bonferroni is most common, probably because it is not only conservative in it’s application but because it is so simple to implement.

Bayesian Posterior Predictive Distribution

It’s worth mentioning the posterior predictive distribution using a Bayesian approach.  In Bayesian statistics the parameter is treated as random and often unknown (as compared to frequentists where it’s fixed and often unknown).   So here we can use the posterior predictive distribution for new data. It also happens to be a good way to see how well the model performed.

Posterior Prediction Distribution

Example Code

library(LearnBayes)
alpha = 0.05
y = rnorm(100,0, 1)
x1= y+rnorm(100,1,1)
x2= y+rnorm(100,2,2)

##### Least-squares fit
fit=lm(y~x1+x2,x=TRUE,y=TRUE)

new = data.frame(x1 = seq(-5, 5, 0.5), x2=seq(-3, 7, 0.5))

pred.w.plim = predict(fit, new, interval = "prediction")
pred.w.clim = predict(fit, new, interval = "confidence")
matplot(new$x1, cbind(pred.w.clim, pred.w.plim[,-1]),
        lty = c(1,2,2,3,3), col=c(1,2,2,3,3), type = "l", 
        ylab = "predicted y", xlab= "x1", main="Confidence vs. Prediction Intervals")
legend("topleft", c("Confidence","Prediction"), lwd=2, lty=c(2,3), col=c(2,3), cex=.7)

##### Sampling from posterior
theta.sample=blinreg(fit$y,fit$x,5000)


######### Model checking via posterior predictive distribution
pred.draws=blinregpred(fit$x,theta.sample)
pred.sum=apply(pred.draws,2,quantile,c(alpha/2,1-alpha/2))

par(mfrow=c(1,1))
ind=1:ncol( pred.sum )
ind.odd=seq(1,ncol( pred.sum ), by=2)
ind.even=seq(2,ncol( pred.sum ), by=2)
matplot(rbind(ind,ind),pred.sum,type="l",lty=1,col=1,
        xlab="Precinct Identifier",ylab="Candidate Vote", main="Posterior Predictive Distribution"
        , xaxt='n')
axis(side=1, at=ind.odd, tcl = -1.0, lty = 1, lwd = 0.5, labels=ind.odd, cex.axis=.75)
axis(side=1, at=ind.even, tcl = -0.7, lty = 1, lwd = 0.5, labels=rep("",length(ind.even)), cex.axis=.75)
points(ind,y,pch=19, cex=.4)

out=(y>pred.sum[2,] | y<pred.sum[1,])
text(ind[out], y[out], label=ind[out], pos = 1, cex=1, col=2)

To leave a comment for the author, please follow the link and comment on his blog: Statistical Research » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.