Down and Dirty Forecasting: Part 2

May 24, 2013

(This article was first published on OutLie..R, and kindly contributed to R-bloggers)

This is the second part of the forecasting exercise, where I am looking at a multiple regression. To keep it simple I chose the states that boarder WI and the US unemployment information for the regression. Again this is a down and dirty analysis, I would not call this complete in any sense.

#getting the data for WI, US, IL, MI, MN, IO
#Using Quandl data, great little site
wi<-read.csv('', colClasses=c('Date'='Date'))
us<-read.csv('', colClasses=c('Date'='Date'))
il<-read.csv('', colClasses=c('Date'='Date'))
mi<-read.csv('', colClasses=c('Date'='Date'))
mn<-read.csv('', colClasses=c('Date'='Date'))
io<-read.csv('', colClasses=c('Date'='Date'))
#merging the data into one dataframe
#I started with WI so that I could isolate the rest of the variables better
unemp<-merge(wi, io, by='Date'); colnames(unemp)<-c('Date', 'wi', 'io')
unemp<-merge(unemp, mi, by='Date'); colnames(unemp)<-c('Date', 'wi', 'io', 'mi')
unemp<-merge(unemp, mn, by='Date'); colnames(unemp)<-c('Date', 'wi', 'io', 'mi', 'mn')
unemp<-merge(unemp, us, by='Date'); colnames(unemp)<-c('Date', 'wi', 'io', 'mi', 'mn', 'us')
unemp<-merge(unemp, il, by='Date'); colnames(unemp)<-c('Date', 'wi', 'io', 'mi', 'mn', 'us', 'il')
save(unemp, file='Unemployment WI.RData')
load('Unemployment WI.RData')
plot(unemp$Date, unemp$us, col='black', 
main='Unemployment WI Region', type='l', xlab='Date', 
ylab='Unemp. Rate', ylim=c(2, 17))
lines(unemp$Date, unemp$il, col='blue')
lines(unemp$Date, unemp$io, col='dark green')
lines(unemp$Date, unemp$mi, col='purple')
lines(unemp$Date, unemp$mn, col='gray')
lines(unemp$Date, unemp$wi, col='red')
leg.txt<-c('US', 'IL', 'IO', 'MI', 'MN', "WI")
leg.col<-c('black','blue', 'dark green', 'purple', 'gray', 'red')
legend('topleft', leg.txt, col=leg.col, pch=15)
#correlation matrix
#From the graph and correlation matrix, everything 
#is highly correlated, the lowest r value is 0.7879 
#between Iowa and US.
#plots of each of the data, then each individual, 
#everything looks really good, nice linear trends, 
#although Michigan has quite bit of volatility 
#compared to the other variables.
plot(unemp.b, pch='.')
model<-lm(wi~., data=unemp.b[,-1])
step.m<-step(model, direction='backward')
#testing for outliers
# After conducting the outlier test, looks like it is michigan
#Sept 1980 where unemployment was 13.2 percent, point 57 is
#close at 12.9 those are the two biggest, but the outlierTest
#isolated row 56
dwtest(step.m, alternative='two.sided')
#yep lots of autocorrelation, so a multiple ARAIMA might be in order
mult.f<-predict(step.m, newdata=pred, interval='prediction')
a1<-accuracy(mult.f, unemp.p$wi)
#because the MASE is missing I needed to add the following to compare later
names(a1)<-c("ME", "RMSE", "MAE", "MPE", "MAPE", "MASE")
auto.r<-auto.arima(unemp.b$wi, xreg=unemp.b[,3:7])
auto.f<-forecast(auto.r, xreg=unemp.p[,3:7], h=12)
a2<-accuracy(auto.f, unemp.p$wi)
#Just remember that the 0 is just a place holder
r.table<-rbind(a1, a2)
row.names(r.table)<-c('Multiple Regression', 'Reg. with ARIMA Errors')
At this point I combined the two accuracy tables to see which one was the best.

b.table<-rbind(a.table[,1:6], r.table)

Created by Pretty R at

The winner is: Regression with ARIMA Errors, for most of the analysis the simple mean like predictions dominated, even the multiple regression is considerably low on the list.

To leave a comment for the author, please follow the link and comment on their blog: OutLie..R. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.


Mango solutions

RStudio homepage

Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training


CRC R books series

Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)