Inference for MA(q) Time Series

January 29, 2014

(This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers)

Yesterday, we’ve seen how inference for time series was possible.  I started  with that one because it is actually the simple case. For instance, we can use ordinary least squares. There might be some possible bias (see e.g. White (1961)), but asymptotically, estimators are fine (consistent, with asymptotic normality). But when the noise is (auto)correlated, then it is more complex. So, consider here some  time series

for some white noise\varepsilon_t).

> theta1=.25
> theta2=.7
> n=1000
> set.seed(1)
> e=rnorm(n)
> Z=rep(0,n)
> for(t in 3:n) Z[t]=e[t]+theta1*e[t-1]+theta2*e[t-2]
> Z=Z[800:1000]
> plot(Z,type="l")

  • Using the empirical autocorrelations

The first idea might be to use the first two (empirical) autocorrelations (the two that are supposed to be – theoretically – non null).

avec  pour . We also have the following relationship on the variance of the process

With those three equations, for three unknown parameters,\theta_1\theta_2 and\sigma, we simply have to solve (numerically) that system of equations,

> v=c(as.numeric(acf(Z)$acf[2:3]),var(Z))
> v
[1] 0.1658760 0.3823053 1.6379498
> library(rootSolve)
> seteq=function(x){
+ F1=v[1]-(x[1]+x[1]*x[2])/(1+x[1]^2+x[2]^2)
+ F2=v[2]-(x[2])/(1+x[1]^2+x[2]^2)
+ F3=v[3]-(1+x[1]^2+x[2]^2)*x[3]^2
+ return(c(F1,F2,F3))}
> multiroot(f=seteq,start=c(.1,.1,1))
[1] 0.1400579 0.4766699 1.1461636

[1]  7.876355e-10  4.188458e-09 -2.839977e-09

[1] 5

[1] 2.605357e-09

We are a bit far away from the true values, used to generate our sample. And if we consider 1,000 sample (instead of only one), we still have the bias, and a large variance for our three estimators,

  • Using least square techniques

We can try something quite different here. The problem we have is that we do not observe the noise\varepsilon_t), we only observe our series But we can try to rebuild that series (call it  since we’re not sure it will be a reconstruction of the noise). As suggested in Box & Jenkins (1967), assume that the first two values are null. And then, use

and then, we can use least square techniques

The code will be

> V=function(p){
+ theta1=p[1]
+ theta2=p[2]
+ u=rep(0,length(Z))
+ for(t in 3:length(Z)) u[t]=Z[t]-theta1*u[t-1]-theta2*u[t-2]
+ return(sum(u^2))
+ }

If we try to minimize the sum of the squares of the residuals, we get

> optim(par=c(.1,.1),V)
[1] 0.2751667 0.6723909

[1] 225.8104

function gradient 
      77       NA 

[1] 0


which is close to the true value. Another good thing is that, if we compare that rebuilt noise with the true one (since we actually have it), then we have the same vector,

> plot(e[800:1000],col="blue",type="l")
> theta1=0.2751667
> theta2=0.6723909
> u=rep(0,length(Z))
> for(t in 3:length(Z)) u[t]=Z[t]-theta1*u[t-1]-theta2*u[t-2]
> lines(1:201,u,col="red")

So far, so good. And if we look at 1,000 samples, we get

It looks like we have some bias here. And since the two estimators should be negatively correlated, one over-estimates, while the other one under-estimates.

  • Using the (global) maximum likelihood technique

And a final method might be to use the maximum likelihood technique (globally). Again, if we assume that we have a Gaussian i.i.d noise, then the vector\boldsymbol{Y}=(Y_1,\cdots,Y_t) is Gaussian, with a simple variance matrix (since a lot of elements will be null),

> library(mnormt)
> GlobalLogLik=function(A,TS){
+ n=length(TS)
+ theta1=A[1];  theta2=A[2]
+ sigma=A[3]
+ SIG=matrix(0,n,n)
+ rho=rep(0,n)
+ rho[1]=1
+ rho[2]=(theta1+theta1*theta2)/(1+theta1^2+theta2^2)
+ rho[3]=(theta2)/(1+theta1^2+theta2^2)
+ for(i in 1:n){for(j in 1:n){
+ SIG[i,j]=rho[abs(i-j)+1]}}
+ gamma0=(1+theta1^2+theta2^2)*sigma^2
+ SIG=gamma0*SIG
+ return(dmnorm(TS,rep(0,n),SIG,log=TRUE))}
> LogL=function(A) -GlobalLogLik(A,TS=Z)
> optim(c(.1,.1,1),LogL)
[1] 0.2584144 0.6826530 1.0669820

[1] 298.8699

function gradient 
      86       NA 

[1] 0


Here, the values that minimize the likelihood are rather close to the ones used to generate our sample. And if we run this algorithm on 1,000 samples, we can see that those estimates are fine,

I could not find other ideas, to estimate those parameters. I guess we can use the partial autocorrelation function, since we have relationships that can be related to Yule-Walker equations for time series.

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics » R-english. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)