Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This week my research group discussed Adrian Raftery’s recent paper on “Use and Communication of Probabilistic Forecasts” which provides a fascinating but brief survey of some of his work on modelling and communicating uncertain futures. Coincidentally, today I was also sent a copy of David Spiegelhalter’s paper on “Visualizing Uncertainty About the Future”. Both are well-worth reading.
It made me think about my own efforts to communicate future uncertainty through graphics. Of course, for time series forecasts I normally show prediction intervals. I prefer to use more than one interval at a time because it helps convey a little more information. The default in the forecast package for R is to show both an 80% and a 95% interval like this:
The above three examples are easily produced using the forecast package:
fit <- ets(hsales) plot(forecast(fit),include=120) plot(forecast(fit,level=c(50,95)),include=120) plot(forecast(fit,fan=TRUE),include=120) |
For multi-modal distributions I like to use highest density regions. Here is an example applied to Nicholson’s blowfly data using a threshold model:
The dark region has 50% coverage and the light region has 95% coverage. The forecast distributions become bimodal after the first ten iterations, and so the 50% region is split in two to show that. This graphic was taken from a J. Forecasting paper I wrote in 1996, so these ideas have been around for a while!
It is easy enough to produce forecast HDR with time series. Here is some R code to do it:
#HDR for time series object
# Assumes that forecasts can be computed and futures simulated from object
forecasthdr <- function(object, h = ifelse(object$m > 1, 2 * object$m, 10),
nsim=2000, plot=TRUE, level=c(50,95), xlim=NULL, ylim=NULL, ...)
{
require(hdrcde)
# Compute forecasts
fc <- forecast(object)
ft <- time(fc$mean)
# Simulate future sample paths
sim <- matrix(0,nrow=h,ncol=nsim)
for(i in 1:nsim)
sim[,i] <- simulate(object, nsim=h)
# Compute HDRs
nl <- length(level)
hd <- array(0, c(h,nl,10))
mode <- numeric(h)
for(k in 1:h)
{
hdtmp <- hdr(sim[k,], prob=level)
hd[k,,1:ncol(hdtmp$hdr)] <- hdtmp$hdr
mode[k] <- hdtmp$mode
}
# Remove unnecessary sections of HDRs
nz <- apply(abs(hd),3,sum) > 0
hd <- hd[,,nz]
dimnames(hd)[[1]] <- 1:h
dimnames(hd)[[2]] <- level
# Produce plot if required
if(plot)
{
if(is.null(xlim))
xlim <- range(time(object$x),ft)
if(is.null(ylim))
ylim <- range(object$x, hd)
plot(object$x,xlim=xlim, ylim=ylim, ...)
# Show HDRs
cols <- rev(colorspace::sequential_hcl(52))[level - 49]
for(k in 1:h)
{
for(j in 1:nl)
{
hdtmp <- hd[k,j,]
nint <- length(hdtmp)/2
for(l in 1:nint)
{
polygon(ft[k]+c(-1,-1,1,1)/object$m/2,
c(hdtmp[2*l-1],hdtmp[2*l],hdtmp[2*l],hdtmp[2*l-1]),
col=cols[j], border=FALSE)
}
}
points(ft[k], mode[k], pch=19, col="blue",cex=0.8)
}
#lines(fc$mean,col='blue',lwd=2)
}
# Return HDRs
return(list(hdr=hd,mode=mode,level=level))
} |
We can apply it using the example I started with:
z <- forecasthdr(fit,xlim=c(1986,1998),nsim=5000,
xlab="Year",ylab="US monthly housing sales") |
The dots are modes of the forecast distributions, and the 50% and 95% highest density regions are also shown. In this case, the distributions are unimodal, and so all the regions are intervals.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
