There are many kinds of intervals in statistics. To name a few of the common intervals: confidence intervals, prediction intervals, credible intervals, and tolerance intervals. Each are useful and serve their own purpose.
I’ve been recently working on a couple of projects that involve making predictions from a regression model and I’ve been doing some work on estimation and the variability of the estimates. Recently, in my case, I need to predict an individual outcome drawn from the distribution (this differs from estimating the mean of that distribution). Often I end up estimating several simultaneous individual outcomes. It follows that we need to assume the underlying regression model still applies and is appropriate for this new individual observation.
Prediction Intervals and Confidence Intervals
Prediction intervals are useful in that it gives us an understanding of likely values that an individual observation may be. There is a subtle, but big, difference between prediction intervals and confidence intervals. With a confidence interval we are trying to obtain an upper and lower limit for the parameter. For a prediction interval we are interested in the upper and lower limits on an individual observation. One key item to note is that prediction intervals will always be wider than the corresponding confidence intervals. The reason for this is that when making a new prediction on an observation () we encounter the variability in the observation estimate as well as variation within the probability distribution of .
These intervals often get confused with confidence intervals (which those alone have many nuances). But they really need to be kept separate because calling a prediction interval a confidence interval (or vice versa) will provide a false understanding of the variability.
Doing Many Prediction Intervals
Sometimes it is necessary to estimate several new observations. In this case we need an overall family confidence level of (1-). To compensate for simultaneously estimating many observations there are a couple of difference procedures: Scheffé and Bonferroni. It seems, based on various conferences that I’ve attended, that Bonferroni is most common, probably because it is not only conservative in it’s application but because it is so simple to implement.
Bayesian Posterior Predictive Distribution
It’s worth mentioning the posterior predictive distribution using a Bayesian approach. In Bayesian statistics the parameter is treated as random and often unknown (as compared to frequentists where it’s fixed and often unknown). So here we can use the posterior predictive distribution for new data. It also happens to be a good way to see how well the model performed.
library(LearnBayes) alpha = 0.05 y = rnorm(100,0, 1) x1= y+rnorm(100,1,1) x2= y+rnorm(100,2,2) ##### Least-squares fit fit=lm(y~x1+x2,x=TRUE,y=TRUE) new = data.frame(x1 = seq(-5, 5, 0.5), x2=seq(-3, 7, 0.5)) pred.w.plim = predict(fit, new, interval = "prediction") pred.w.clim = predict(fit, new, interval = "confidence") matplot(new$x1, cbind(pred.w.clim, pred.w.plim[,-1]), lty = c(1,2,2,3,3), col=c(1,2,2,3,3), type = "l", ylab = "predicted y", xlab= "x1", main="Confidence vs. Prediction Intervals") legend("topleft", c("Confidence","Prediction"), lwd=2, lty=c(2,3), col=c(2,3), cex=.7) ##### Sampling from posterior theta.sample=blinreg(fit$y,fit$x,5000) ######### Model checking via posterior predictive distribution pred.draws=blinregpred(fit$x,theta.sample) pred.sum=apply(pred.draws,2,quantile,c(alpha/2,1-alpha/2)) par(mfrow=c(1,1)) ind=1:ncol( pred.sum ) ind.odd=seq(1,ncol( pred.sum ), by=2) ind.even=seq(2,ncol( pred.sum ), by=2) matplot(rbind(ind,ind),pred.sum,type="l",lty=1,col=1, xlab="Precinct Identifier",ylab="Candidate Vote", main="Posterior Predictive Distribution" , xaxt='n') axis(side=1, at=ind.odd, tcl = -1.0, lty = 1, lwd = 0.5, labels=ind.odd, cex.axis=.75) axis(side=1, at=ind.even, tcl = -0.7, lty = 1, lwd = 0.5, labels=rep("",length(ind.even)), cex.axis=.75) points(ind,y,pch=19, cex=.4) out=(y>pred.sum[2,] | y