# Inference for AR(p) Time Series

**Freakonometrics » R-english**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Consider a (stationary) autoregressive process, say of order 2,

for some white noise with variance . Here is a code to generate such a process,

> phi1=.25 > phi2=.7 > n=1000 > set.seed(1) > e=rnorm(n) > Z=rep(0,n) > for(t in 3:n) Z[t]=phi1*Z[t-1]+phi2*Z[t-2]+e[t] > Z=Z[800:1000] > n=length(Z) > plot(Z,type="l")

Here, we have to estimate two sets of parameters: the autoregressive coefficients, and the variance of the innovation process . Several techniques can be used to estimate those parameters.

- using
**least square regression**

A natural idea is to see here a regression model, since (if we consider a matrix formulation)

Here we can run (conditional) ordinary least squares estimation,

> base=data.frame(Y=Z[3:n],X1=Z[2:(n-1)],X2=Z[1:(n-2)]) > regression=lm(Y~0+X1+X2,data=base) > summary(regression) Call: lm(formula = Y ~ 0 + X1 + X2, data = base) Residuals: Min 1Q Median 3Q Max -3.0268 -0.7063 0.1065 0.6925 3.2566 Coefficients: Estimate Std. Error t value Pr(>|t|) X1 0.23400 0.05463 4.283 2.88e-05 *** X2 0.62863 0.05476 11.479 < 2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 1.062 on 197 degrees of freedom Multiple R-squared: 0.6349, Adjusted R-squared: 0.6312 F-statistic: 171.3 on 2 and 197 DF, p-value: < 2.2e-16

so we get the following estimators, for the autocorrelation coefficients, and the volatility of the noise

> regression$coefficients X1 X2 0.2339959 0.6286321 > summary(regression)$sigma [1] 1.061839

- using
**Yule-Walker equations**

As we’ve seen in class, we can easily get the following equations for the autocovariance functions,

which can also be written (again, using a matrix expression)

So we just have to solve a simple linear system of equations. Note that if we divide by the variance, those equations can be written in terms of the autocorrelation functions

The code is the following

> rho1=cor(Z[1:(n-1)],Z[2:n]) > rho2=cor(Z[1:(n-2)],Z[3:n]) > A=matrix(c(1,rho1,rho1,1),2,2) > b=matrix(c(rho1,rho2),2,1) > (PHI=solve(A,b)) [,1] [1,] 0.2256270 [2,] 0.6315329

Now, we need to extract the estimated innovation process, from this set of parameters

> estWN=base$Y-(PHI[1]*base$X1+PHI[2]*base$X2) > sd(estWN) [1] 1.058558

This estimator is probably not the best one (we can take into account that we’ve lost two degrees of freedom), but as a starting point, let us consider this one.

An alternative could be to include the variance term in Yule-Walker equations, to get a three dimensional linear equation,

It is not much more complicated to solve, actually,

> gamma0=var(Z[1:n]) > gamma1=var(Z[1:(n-1)],Z[2:n]) > gamma2=var(Z[1:(n-2)],Z[3:n]) > A=matrix(c(gamma1,gamma0,gamma1,gamma2,gamma1,gamma0,1,0,0),3,3) > b=matrix(c(gamma0,gamma1,gamma2),3,1) > (PHISIGMA=solve(A,b)) [,1] [1,] 0.2283151 [2,] 0.6283431 [3,] 1.1335501

- using
**(conditional) likelihood estimators**

Finally, we can assume some distribution for the innovation process. The standard model is a Gaussian model, i.e.

has a Gaussian distribution

In that case, the conditional log likelihood (conditional since we set the first two observations here) is

> CondLogLik=function(A,TS){ + phi1=A[1]; phi2=A[2] + sigma=A[3]; L=0 + for(t in 3:length(TS)){ + L=L+dnorm(TS[t],mean=phi1*TS[t-1]+ + phi2*TS[t-2],sd=sigma,log=TRUE)} + return(-L)}

Now, we can run standard optimization procedures,

> LogL=function(A) CondLogLik(A,TS=Z) > optim(c(0,0,1),LogL) $par [1] 0.2339589 0.6285002 1.0565613 $value [1] 293.3042 $counts function gradient 106 NA $convergence [1] 0 $message NULL

It is also possible to consider a global maximum likelihood optimisation problem, since the variance matrix of vector has a know form.

- using
**(unconditional) likelihood estimators**

The variance matrix of is , where autocovariances are not not know, be can easily be computed using a recursive relationship.

> library(mnormt) > GlobalLogLik=function(A,TS){ + n=length(TS) + phi1=A[1]; phi2=A[2] + sigma=A[3] + SIG=matrix(0,n,n) + rho=rep(0,n) + rho[1]=1 + rho[2]=phi1/(1-phi2) + for(h in 3:n) rho[h]=phi1*rho[h-1]+phi2*rho[h-2] + for(i in 1:n){for(j in 1:n){ + SIG[i,j]=rho[abs(i-j)+1]}} + gamma0=(1-phi2)*sigma^2/((1+phi2)*((1-phi2)^2-phi1^2)) + SIG=gamma0*SIG + return(dmnorm(TS,rep(0,n),SIG,log=TRUE))} > LogL=function(A) -GlobalLogLik(A,TS=Z) > optim(c(.1,.1,1),LogL) Error in chol.default(x, pivot = FALSE) : Error in pd.solve(varcov, log.det = TRUE) : x appears to be not positive definite

The problem is that there is a strong constraint on the pair to get a stationary process (we are not far away, here, from the border of the triangle, where the process become non stationary). To be more specific (this was mentioned in a previous post), we should have

i.e. in a standard matrix form

(we can add an additional constraint on the variance parameter, to insure that it will be positive). To run a contrained optimization routine, consider

> U=matrix(c(1,0,0,-1,0,1,0,-1,0,0,1,0),4,3) > C=c(0,0,0,-.99999) > constrOptim(c(.1,.1,1),LogL,grad=NULL,ui=U,ci=C) $par [1] 0.2238892 0.6342850 1.0613388 $value [1] 297.9202 $counts function gradient 108 NA $convergence [1] 0 $message NULL $outer.iterations [1] 2 $barrier.value [1] 0.000189892

(here, to faster, we restrain the parameters so that they will be positive).

- comparing those estimates

Here, our five estimators are rather close. Let us run more samples to see more precisely how they behave. For the first parameter , we get

and for the second one, , we have

The bias we observe is probably coming from the fact that, with this numerical example, we are not far away from the non-stationary case (the sum of the *true* parameters should be less than 1, and it is 0.95). When we estimate the parameters, we force them to be *inside* the triangle, since those parameters can be estimated *only* if the process is stationary.

Observe that the standard-deviation of the innovation process is here, well estimated,

(with clearly some estimators that perform better than others).

**leave a comment**for the author, please follow the link and comment on their blog:

**Freakonometrics » R-english**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.