Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This week, in the MAT8181 Time Series course, we’ve discussed unit root tests. According to Wold’s theorem, if $(Y_t)$ is  (weakly) stationnary then

$Y_{t}=\sum _{{j=0}}^{\infty }\psi_{j}\varepsilon _{{t-j}}+\xi _{t}$

where $(\varepsilon _{{t}})$ is the innovation process, and where $(\xi _{{t}})$ is some deterministic series (just to get a result as general as possible). Observe that

$\sum _{{j=0}}^{{\infty }}|\psi_{{j}}|^{2} < \infty$

as discussed in a previous post. To go one step further, there is also the Beveridge-Nelson decomposition : an integrated of order one process, defined as

$\Delta Y_{t}=(1-L) Y_t=\sum _{{j=0}}^{\infty }\psi_{j}\varepsilon _{{t-j}}+\xi=\Psi(L)\varepsilon _{{t}}+\xi$can be represented as

a linear trend $+$ a random walk $+$ a stationary remaining term

i.e.

$Y_{t}=\underbrace{Y_0 + \xi t }+\underbrace{\Psi(1)\sum_{j=1}^t\varepsilon _{{i}}}+\underbrace{\tilde\Psi(L)\varepilon_0-\tilde\Psi(L)\varepsilon_t}$

where $\tilde\Psi(\cdot)$ is the polynomial with terms $\tilde\psi_j$, where

$\tilde\psi_j =\sum_{i=j+1}^\infty\psi_i$

For unit-root tests, we will use various representation of the process. In order to illustrate the implementation of those tests, consider the following series

> E=rnorm(240)
> X=cumsum(E)
> plot(X,type="l")

• Dickey Fuller (standard)

Here, for the simple version of the Dickey-Fuller test, we assume that

$Y_t=\alpha+\beta t+\varphi Y_{t-1}+\varepsilon_t$

and we would like to test if $\varphi=1$ (or not). We can write the previous representation as

$\Delta Y_t=\alpha+\beta t+[\varphi-1] Y_{t-1}+\varepsilon_t$

so we simply have to test if the regression coefficient in the linear regression is – or not – null. Which can be done with Student’s test. If we consider the previous model without the linear drift, we have to consider the following regression

> lags=0
> z=diff(X)
> n=length(z)
> z.diff=embed(z, lags+1)[,1]
> z.lag.1=X[(lags+1):n]
> summary(lm(z.diff~0+z.lag.1 ))

Call:
lm(formula = z.diff ~ 0 + z.lag.1)

Residuals:
Min       1Q   Median       3Q      Max
-2.84466 -0.55723 -0.00494  0.63816  2.54352

Coefficients:
Estimate Std. Error t value Pr(>|t|)
z.lag.1 -0.005609   0.007319  -0.766    0.444

Residual standard error: 0.963 on 238 degrees of freedom
Multiple R-squared:  0.002461,	Adjusted R-squared:  -0.00173
F-statistic: 0.5873 on 1 and 238 DF,  p-value: 0.4442

Our testing procedure will be based on the Student’s t value,

> summary(lm(z.diff~0+z.lag.1 ))$coefficients[1,3] [1] -0.7663308 which is exactly the value computed using > library(urca) > df=ur.df(X,type="none",lags=0) > df ############################################################### # Augmented Dickey-Fuller Test Unit Root / Cointegration Test # ############################################################### The value of the test statistic is: -0.7663 The interpretation of this value can be done using critical values (99%, 95%, 90%) > qnorm(c(.01,.05,.1)/2) [1] -2.575829 -1.959964 -1.644854 If the statistics exceeds those values, then the series is not stationnary, since we cannot reject the assumption that $\varphi-1=0$. So we might conclude that there is a unit root. Actually, those critical values are obtained using > summary(df) ############################################### # Augmented Dickey-Fuller Test Unit Root Test # ############################################### Test regression none Call: lm(formula = z.diff ~ z.lag.1 - 1) Residuals: Min 1Q Median 3Q Max -2.84466 -0.55723 -0.00494 0.63816 2.54352 Coefficients: Estimate Std. Error t value Pr(>|t|) z.lag.1 -0.005609 0.007319 -0.766 0.444 Residual standard error: 0.963 on 238 degrees of freedom Multiple R-squared: 0.002461, Adjusted R-squared: -0.00173 F-statistic: 0.5873 on 1 and 238 DF, p-value: 0.4442 Value of test-statistic is: -0.7663 Critical values for test statistics: 1pct 5pct 10pct tau1 -2.58 -1.95 -1.62 The problem with R is that there are several packages that can be used for unit root tests. Just to mention another one, > library(tseries) > adf.test(X,k=0) Augmented Dickey-Fuller Test data: X Dickey-Fuller = -2.0433, Lag order = 0, p-value = 0.5576 alternative hypothesis: stationary We do have here also a test where the null hypothesis is that there is a unit root. But the p-value is quite different. What is odd is that we have > 1-adf.test(X,k=0)$p.value
[1] 0.4423705
> [email protected]$coefficients[4] [1] 0.4442389 (but I think it is a coincidence). • Augmented Dickey Fuller It is possible to had some lags in the regression. For instance, we can consider $\Delta Y_t=\alpha+\beta t+[\varphi-1] Y_{t-1}+\psi \Delta Y_{t-1}+\varepsilon_t$ Again, we have to check if one coefficient is null, or not. And this can be done using Student’s t test. > lags=1 > z=diff(X) > n=length(z) > z.diff=embed(z, lags+1)[,1] > z.lag.1=X[(lags+1):n] > k=lags+1 > z.diff.lag = embed(z, lags+1)[, 2:k] > summary(lm(z.diff~0+z.lag.1+z.diff.lag )) Call: lm(formula = z.diff ~ 0 + z.lag.1 + z.diff.lag) Residuals: Min 1Q Median 3Q Max -2.87492 -0.53977 -0.00688 0.64481 2.47556 Coefficients: Estimate Std. Error t value Pr(>|t|) z.lag.1 -0.005394 0.007361 -0.733 0.464 z.diff.lag -0.028972 0.065113 -0.445 0.657 Residual standard error: 0.9666 on 236 degrees of freedom Multiple R-squared: 0.003292, Adjusted R-squared: -0.005155 F-statistic: 0.3898 on 2 and 236 DF, p-value: 0.6777 > summary(lm(z.diff~0+z.lag.1+z.diff.lag ))$coefficients[1,3]
[1] -0.7328138

This value is the one obtained using

> df=ur.df(X,type="none",lags=1)
> summary(df)

###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################

Test regression none

Call:
lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)

Residuals:
Min       1Q   Median       3Q      Max
-2.87492 -0.53977 -0.00688  0.64481  2.47556

Coefficients:
Estimate Std. Error t value Pr(>|t|)
z.lag.1    -0.005394   0.007361  -0.733    0.464
z.diff.lag -0.028972   0.065113  -0.445    0.657

Residual standard error: 0.9666 on 236 degrees of freedom
Multiple R-squared:  0.003292,	Adjusted R-squared:  -0.005155
F-statistic: 0.3898 on 2 and 236 DF,  p-value: 0.6777

Value of test-statistic is: -0.7328

Critical values for test statistics:
1pct  5pct 10pct
tau1 -2.58 -1.95 -1.62

And again, other pckages can be used:

> adf.test(X,k=1)

Augmented Dickey-Fuller Test

data:  X
Dickey-Fuller = -1.9828, Lag order = 1, p-value = 0.5831
alternative hypothesis: stationary

Hopefully, the conclusion is the same (we should reject the assumption that the series is stationary, but I am not sure about the computation of the p-value).

• Augmented Dickey Fuller with trend and drift

So far, we have not included the drift in our model. But this is simple to do (this will be called the augmented version of the previous procedure): we just have to include a constant in the regression,

> summary(lm(z.diff~1+z.lag.1+z.diff.lag ))

Call:
lm(formula = z.diff ~ 1 + z.lag.1 + z.diff.lag)

Residuals:
Min       1Q   Median       3Q      Max
-2.91930 -0.56731 -0.00548  0.62932  2.45178

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.29175    0.13153   2.218   0.0275 *
z.lag.1     -0.03559    0.01545  -2.304   0.0221 *
z.diff.lag  -0.01976    0.06471  -0.305   0.7603
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9586 on 235 degrees of freedom
Multiple R-squared:  0.02313,	Adjusted R-squared:  0.01482
F-statistic: 2.782 on 2 and 235 DF,  p-value: 0.06393

The statistics of interest are obtained here considering some analysis of variance outputs, where this model is compared with the one without the integrated part, and the drift,

> summary(lm(z.diff~1+z.lag.1+z.diff.lag ))$coefficients[2,3] [1] -2.303948 > anova(lm(z.diff ~ z.lag.1 + 1 + z.diff.lag),lm(z.diff ~ 0 + z.diff.lag))$F[2]
[1] 2.732912

Those two values are the ones obtained also with

> df=ur.df(X,type="drift",lags=1)
> summary(df)

###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################

Test regression drift

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)

Residuals:
Min       1Q   Median       3Q      Max
-2.91930 -0.56731 -0.00548  0.62932  2.45178

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.29175    0.13153   2.218   0.0275 *
z.lag.1     -0.03559    0.01545  -2.304   0.0221 *
z.diff.lag  -0.01976    0.06471  -0.305   0.7603
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9586 on 235 degrees of freedom
Multiple R-squared:  0.02313,	Adjusted R-squared:  0.01482
F-statistic: 2.782 on 2 and 235 DF,  p-value: 0.06393

Value of test-statistic is: -2.3039 2.7329

Critical values for test statistics:
1pct  5pct 10pct
tau2 -3.46 -2.88 -2.57
phi1  6.52  4.63  3.81

And we can also include a linear trend,

> temps=(lags+1):n
> summary(lm(z.diff~1+temps+z.lag.1+z.diff.lag ))

Call:
lm(formula = z.diff ~ 1 + temps + z.lag.1 + z.diff.lag)

Residuals:
Min       1Q   Median       3Q      Max
-2.87727 -0.58802 -0.00175  0.60359  2.47789

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.3227245  0.1502083   2.149   0.0327 *
temps       -0.0004194  0.0009767  -0.429   0.6680
z.lag.1     -0.0329780  0.0166319  -1.983   0.0486 *
z.diff.lag  -0.0230547  0.0652767  -0.353   0.7243
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9603 on 234 degrees of freedom
Multiple R-squared:  0.0239,	Adjusted R-squared:  0.01139
F-statistic:  1.91 on 3 and 234 DF,  p-value: 0.1287

> summary(lm(z.diff~1+temps+z.lag.1+z.diff.lag ))$coefficients[3,3] [1] -1.98282 > anova(lm(z.diff ~ z.lag.1 + 1 + temps+ z.diff.lag),lm(z.diff ~ 1+ z.diff.lag))$F[2]
[1] 2.737086

while R function returns

> df=ur.df(X,type="trend",lags=1)
> summary(df)

###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################

Test regression trend

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)

Residuals:
Min       1Q   Median       3Q      Max
-2.87727 -0.58802 -0.00175  0.60359  2.47789

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.3227245  0.1502083   2.149   0.0327 *
z.lag.1     -0.0329780  0.0166319  -1.983   0.0486 *
tt          -0.0004194  0.0009767  -0.429   0.6680
z.diff.lag  -0.0230547  0.0652767  -0.353   0.7243
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9603 on 234 degrees of freedom
Multiple R-squared:  0.0239,	Adjusted R-squared:  0.01139
F-statistic:  1.91 on 3 and 234 DF,  p-value: 0.1287

Value of test-statistic is: -1.9828 1.8771 2.7371

Critical values for test statistics:
1pct  5pct 10pct
tau3 -3.99 -3.43 -3.13
phi2  6.22  4.75  4.07
phi3  8.43  6.49  5.47
• KPSS test

Here, in the KPSS testing procedure, two models can be considerd : with a drift, or with a linear trend. Here, the null hypothesis is that the series is stationnary.

With a drift, the code is

> summary(ur.kpss(X,type="mu"))

#######################
# KPSS Unit Root Test #
#######################

Test is of type: mu with 4 lags.

Value of test-statistic is: 0.972

Critical value for a significance level of:
10pct  5pct 2.5pct  1pct
critical values 0.347 0.463  0.574 0.73

while it will be, in the case there is a trend

> summary(ur.kpss(X,type="tau"))

#######################
# KPSS Unit Root Test #
#######################

Test is of type: tau with 4 lags.

Value of test-statistic is: 0.5057

Critical value for a significance level of:
10pct  5pct 2.5pct  1pct
critical values 0.119 0.146  0.176 0.216

One more time, it is possible to use another package to get the same test (but again, a different output)

> kpss.test(X,"Level")

KPSS Test for Level Stationarity

data:  X
KPSS Level = 1.1997, Truncation lag parameter = 3, p-value = 0.01

> kpss.test(X,"Trend")

KPSS Test for Trend Stationarity

data:  X
KPSS Trend = 0.6234, Truncation lag parameter = 3, p-value = 0.01

At least, there is some kind of consistency, since we keep rejecting the stationnary assumption, for that series.

• Philipps-Perron test

The Philipps-Perron test is based on the ADF procedure. The code is here

> PP.test(X)

Phillips-Perron Unit Root Test

data:  X
Dickey-Fuller = -2.0116, Truncation lag parameter = 4, p-value = 0.571

with again, a possible alternative with the other package

> pp.test(X)

Phillips-Perron Unit Root Test

data:  X
Dickey-Fuller Z(alpha) = -7.7345, Truncation lag parameter = 4, p-value
= 0.6757
alternative hypothesis: stationary
•  Comparison

I will not spend more time comparing the different codes, in R, to run those tests. Let us spend some additional time on a quick comparison of those three procedure. Let us generate some autoregressive processes, with more or less autocorrelation, as well as some random walk, and let us see how those tests perform :

> n=100
> AR=seq(1,.7,by=-.01)
> P=matrix(NA,3,31)
> M1=matrix(NA,1000,length(AR))
> M2=matrix(NA,1000,length(AR))
> M3=matrix(NA,1000,length(AR))

> for(i in 1:(length(AR)+1)){
+ for(s in 1:1000){
+ if(i==1) X=cumsum(rnorm(n))
+ if(i!=1) X=arima.sim(n=n,list(ar=AR[i]))
+ library(urca)
+ M2[s,i]=as.numeric(pp.test(X)$p.value) + M1[s,i]=as.numeric(kpss.test(X)$p.value)
+ }}

Here, we would like to count how many times the p-value of our tests exceed 5%,

> prop05=function(x) mean(x>.05)
+ P[1,]=1-apply(M1,2,prop05)
+ P[2,]=apply(M2,2,prop05)
+ P[3,]=apply(M3,2,prop05)
+ }
> plot(AR,P[1,],type="l",col="red",ylim=c(0,1),ylab="proportion of non-stationnary
+ series",xlab="autocorrelation coefficient")
> lines(AR,P[2,],type="l",col="blue")
> lines(AR,P[3,],type="l",col="green")
> legend(.7,1,c("ADF","KPSS","PP"),col=c("green","red","blue"),lty=1,lwd=1)

We can see here how poorly Dickey-Fuller test behave, since a 50% (at least) of our autoregressive processes are considered as non-stationnary.