Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This morning, in our Time Series course, we’ve been playing with some data I got from google.ca/trends/. Actually, we’ve been playing on some old version, downloaded 18 months ago (discussed in a previous post, in French).

> tail(report)
580 2015-02-08 - 2015-02-14         53
581 2015-02-15 - 2015-02-21         52
582 2015-02-22 - 2015-02-28         51
583 2015-03-01 - 2015-03-07         50
584 2015-03-08 - 2015-03-14         49
585 2015-03-15 - 2015-03-21         49

If we plot that weekly time series, we have

> plot(report[,2],type="l")

Working with weekly series is more complicated (at least to find a simple model, with only a few lags), so let us convert that series into a monthly one,

> source(
+   "http://freakonometrics.blog.free.fr/public/code/H2M.R")

Here, we did three things,

• we considered the logarithm of the series, because of the increasing variability we have on recent years,
• we removed (too) old observations
• we kept recent observations to test our models.
> T=as.numeric(time(X))
> idx=which((T>2006)&(T<2014.3))
> T=T[idx]
> X=as.numeric(X)[idx]
> plot(T,X,type="l")
> df=data.frame(X,T)
> reg=lm(X~T,data=df)
> abline(reg,col="red")
> Y=residuals(reg)

The residuals obtained after fitting that linear model will be our (supposed to be) stationnary time series. Let us consider various models.

• model 1

We obtained our first model be playing with the orders, this morning. Here, the model is slightly more complex,

> fit1=arima(Y,
+   order = c(1, 0, 10),
+   seasonal = list(order = c(1, 1, 0),
+   period = 12))
> fit1

Call:
arima(x = Y, order = c(1, 0, 10), seasonal = list(order = c(1, 1, 0), period = 12))

Coefficients:
ar1     ma1    ma2     ma3    ma4
0.930  -0.033  0.078  -0.101  0.365
s.e.  0.064   0.102  0.132   0.119  0.152
ma5    ma6     ma7    ma8    ma9   ma10
-0.217  0.216  -0.137  0.134  0.048  0.465
0.154  0.187   0.166  0.173  0.153  0.154
sar1
-0.578
0.118

sigma^2 estimated as 0.00114:  log likelihood = 166,  aic = -305

If we look at the residuals, we have some white noise

We have here our first model.

• Model 2

For the second one, I was much more lazy

> library(forecast)
> auto.arima(ts(Y,start=2006,
+ frequency=12),max.p=12,max.q=12)
Series: ts(Y, start = 2006, frequency = 12)
ARIMA(0,1,0)(0,1,1)[12]

Coefficients:
sma1
-0.950
s.e.   0.672

sigma^2 estimated as 0.00124:  log likelihood=155
AIC=-306   AICc=-306   BIC=-301

> fit2=arima(Y,
+            order = c(0, 1, 0),
+            seasonal = list(order = c(0, 1, 1),
+            period = 12))
> plot(acf(residuals(fit2),lag=36),lwd=3)

Just to confirm that we actually have a white noise, here, use

> BP = function(h) Box.test(residuals(fit2),lag=h,
type='Box-Pierce')$p.value > plot(1:24,Vectorize(BP)(1:24),type='b', + ylim=c(0,1),col="red") > abline(h=.05,lty=2,col="blue") One more time, we have a white noise. So this model here is valid. • Model 3 > fit3=arima(Y, + order = c(12, 0, 0), + seasonal = list(order = c(0, 1, 1), + period = 12)) > fit3 Call: arima(x = Y, order = c(12, 0, 0), seasonal = list(order = c(0, 1, 1), period = 12)) Coefficients: ar1 ar2 ar3 ar4 ar5 ar6 0.918 -0.044 0.014 0.253 -0.366 0.347 s.e. 0.112 0.148 0.145 0.135 0.142 0.151 ar7 ar8 ar9 ar10 ar11 -0.205 -0.030 0.395 -0.208 0.116 s.e. 0.152 0.139 0.134 0.162 0.169 ar12 sma1 -0.249 -0.767 s.e. 0.110 0.164 Again, we go a model similar, this morning, simply by playing with the orders. And one more time, we get a white noise, • Model 4 For the last model, we can consider some exponential smoothing model > HW = HoltWinters(log(headphones)) (again, on the logarithm of the series). • Forecasting For the first three models, use > Y1=predict(fit1,n.ahead=36) > Y2=predict(fit2,n.ahead=36) > Y3=predict(fit3,n.ahead=36) To visualise the three forecast, use, e.g. > draw=function(fit=Y1){ + straight_line=predict(reg, + newdata=data.frame(T=Tfutur)) + mth=straight_line+fit$pred
+ sth2=fit$se^2 + Xmy=exp(mth+sth2/2) + qinf=qlnorm(.05,mth,sdlog=sqrt(sth2)) + qsup=qlnorm(.95,mth,sqrt(sth2)) + + plot(headphones,type="l",xlim=c(2006,2016), + ylim=c(25,105)) + polygon(c(Tfutur,rev(Tfutur)), + c(qinf,rev(qsup)), + col="yellow",border=NA) + lines(Tfutur,Xmy,col="red") + return(Xmy) + } Here, we have > draw(Y1) Time Series: Start = 100 End = 135 Frequency = 1 [1] 45.6 46.3 47.6 52.1 52.7 52.9 68.3 [8] 88.8 58.5 51.2 49.5 47.4 47.8 49.4 [15] 51.6 54.6 54.4 55.2 73.1 95.9 64.2 [22] 56.8 54.5 52.3 53.4 54.8 56.9 61.5 [29] 62.0 62.6 81.8 106.9 71.0 62.6 60.4 [35] 57.9 > draw(Y2) > draw(Y3) And finally, for the fourth model we can use > Y4=predict(HW,n.ahead=36, + prediction.interval = TRUE) > sth2=((Y4[,"upr"]-Y4[,"fit"])/1.96)^2 > mth=Y4[,"fit"] > Xmy=exp(mth+sth2/2) > XHW=Xmy > qinf=qlnorm(.05,mth,sdlog=sqrt(sth2)) > qsup=qlnorm(.95,mth,sqrt(sth2)) > plot(headphones,type="l",xlim=c(2006,2016), + ylim=c(25,105)) > polygon(c(Tfutur,rev(Tfutur)), + c(qinf,rev(qsup)), + col="yellow",border=NA) > lines(Tfutur,Xmy,col="red") • Model comparison Here, we can compare the forecast with actual values > X=headphones > T=as.numeric(time(X)) > idx=which((T>=2014.3)) > T=T[idx] > X=as.numeric(X)[idx] > D=data.frame(T=T,X=X,X1=draw(Y1)[1:length(T)],X2=draw(Y2)[1:length(T)],X3=draw(Y3)[1:length(T)],X4=XHW[2:(1+length(T))]) > D T X X1 X2 X3 X4 1 2014 44.8 45.6 43.7 44.3 49.0 2 2014 45.2 46.3 44.3 44.9 49.6 3 2014 46.8 47.6 45.5 45.6 51.2 4 2015 49.3 52.1 47.7 48.8 53.8 5 2015 49.8 52.7 47.9 49.7 53.6 6 2015 50.9 52.9 49.0 50.8 54.4 7 2015 69.8 68.3 62.3 64.9 72.3 8 2015 91.2 88.8 82.5 85.4 92.4 9 2015 57.5 58.5 58.0 59.4 59.8 10 2015 52.5 51.2 51.9 53.6 54.9 11 2015 49.3 49.5 49.3 51.4 52.6 We can plot the errors, > barplot(D$X1-D$X,ylim=c(-7,5)) > barplot(D$X2-D$X,ylim=c(-7,5)) x > barplot(D$X3-D$X,ylim=c(-7,5)) > barplot(D$X4-D\$X,ylim=c(-7,5))

To compare the four model, compute the sum of the squares of those errors,

> L2_dist=function(var){
+   return(sum((D[,"X"]-D[,var])^2))
+ }

> sapply(paste("X",1:4,sep=""),L2_dist)
X1    X2    X3    X4
33.2 145.3  68.0 133.9

Model 2 and Model 4 were obtained using automatic routines. One on ARIMA models, the other one on the class of exponential smoothing models. The interesting point is thatthe we’ve been able to obtain two better models, just by playing a litle bit. From an initial guess, we look at the residuals, try a another one, and again another one, until the residuals series is a white noise… Somehow I like this idea, that I can still beat automatic routines, just using experience I have, now, on time series.