Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This post gives a brief introduction to the estimation and forecasting of a Vector Autoregressive Model (VAR) model using R . We use vars and tsDyn R package and compare these two estimated coefficients. We also consider VAR in level and VAR in difference and compare these two forecasts.

# VAR Model

### VAR and VECM model

For a vector times series modeling, a vector autoregressive model (VAR) is used for describing the short-term dynamics. When there are the presence of long-term equilibrium relationships, a vector error correction model (VECM) is used, which consists of a VAR model and error correction equations.

In principle, there are three basic pairs of (model, data) in the context of the vector time series modeling. Given a vector of endogenous variables, $$X_t = (X_{1t}, X_{2t}, …, X_{kt})^{‘}$$,

Firstly, when all variables are stationary in level ($$X_t$$), we use a VAR with stationary level variables ($$X_t$$)
\begin{align} X_t = c + \beta_1 X_{t-1} +\beta_2 X_{t-2} + … + \epsilon_t \end{align}
Secondly, when all variables are non-stationary in level ($$X_t$$), we use a VAR with stationary growth variables ($$\Delta X_t$$)
\begin{align} \Delta X_t = c + \beta_1 \Delta X_{t-1} +\beta_2 \Delta X_{t-2} + … + \epsilon_t \end{align}
Thirdly, when all variables are non-stationary in level ($$X_t$$) and there are some cointegrations in level($$\Pi X_{t}$$), we use a VECM model which consists of a VAR with stationary growth variables ($$\Delta X_t$$) and error correcting equations with non-stationary level variables ($$X_t$$).
\begin{align} \Delta X_t = c + \beta_1 \Delta X_{t-1} +\beta_2 \Delta X_{t-2} + … + \Pi X_{t-1} + \epsilon_t \end{align}
Here, $$\Pi X_{t-1}$$ is the first lag of linear combinations of non-stationary level variables or error correction terms (ECT) which represent long-term relationships among non-stationary level variables.

Since this post deals with VAR model, contents regarding stationarity and cointegration will be discussed in the next post.

For example, a tri-variate VAR(2) with differenced variables can be represented in the following way.

\begin{align} \begin{bmatrix} \Delta x_t \\ \Delta y_t \\ \Delta z_t \end{bmatrix} &=\begin{bmatrix} \gamma_{11}^1 & \gamma_{12}^1 & \gamma_{13}^1 \\ \gamma_{21}^1 & \gamma_{22}^1 & \gamma_{23}^1 \\ \gamma_{31}^1 & \gamma_{32}^1 & \gamma_{33}^1 \end{bmatrix} \begin{bmatrix} \Delta x_{t-1} \\ \Delta y_{t-1} \\ \Delta z_{t-1} \end{bmatrix} \\ &+\begin{bmatrix} \gamma_{11}^2 & \gamma_{12}^2 & \gamma_{13}^2 \\ \gamma_{21}^2 & \gamma_{22}^2 & \gamma_{23}^2 \\ \gamma_{31}^2 & \gamma_{32}^2 & \gamma_{33}^2 \end{bmatrix} \begin{bmatrix} \Delta x_{t-2} \\ \Delta y_{t-2} \\ \Delta z_{t-2} \end{bmatrix} \\ &+\begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \\ c_{31} & c_{32} \end{bmatrix} \begin{bmatrix} 1 \\ d_{t} \end{bmatrix} +\begin{bmatrix} \epsilon_{xt} \\ \epsilon_{yt} \\ \epsilon_{zt} \end{bmatrix} \end{align}

### Lag Length p

Lag length (p) is selected by using several information criteria : AIC, HQ, SC, and so on. Lower these scores are better since these criteria penalize models that use more parameters. This work can be done easily by using VARselect() function with a maximum lag. From the following results, we set lag lengths of VAR in level and VAR in difference to 2 and 1 respectively.

 1234567891011121314 > VARselect(df.lev, lag.max = 4,+           type = “const”, season = 4) $selectionAIC(n) HQ(n) SC(n) FPE(n) 2 1 1 2$criteria                   1             2             3             4AIC(n) -3.499648e+01 -3.515435e+01 -3.500078e+01 -3.486624e+01HQ(n)  -3.453329e+01 -3.445956e+01 -3.407440e+01 -3.370827e+01SC(n)  -3.378435e+01 -3.333616e+01 -3.257652e+01 -3.183593e+01FPE(n)  6.393815e-16  5.601040e-16  6.876842e-16  8.607516e-16 Colored by Color Scripter cs

 1234567891011121314 > VARselect(df.diff, lag.max = 4,+           type = “const”, season = 4) $selectionAIC(n) HQ(n) SC(n) FPE(n) 1 1 1 1$criteria                   1             2             3             4AIC(n) -3.476879e+01 -3.456481e+01 -3.418449e+01 -3.411838e+01HQ(n)  -3.430280e+01 -3.386583e+01 -3.325251e+01 -3.295340e+01SC(n)  -3.354510e+01 -3.272927e+01 -3.173710e+01 -3.105914e+01FPE(n)  8.033856e-16  1.012233e-15  1.564343e-15  1.839676e-15 Colored by Color Scripter cs

### Data

From urca R package, we can load denmark dataset which is used for estimating a money demand function of Denmark in Johansen and Juselius (1990). This dataset consists of logarithm of real money M2 (LRM), logarithm of real income (LRY), logarithm of price deflator (LPY), bond rate (IBO), bank deposit rate (IDE), which covers the period 1974:Q1 – 1987:Q3.

Visual inspections of this data in level show that real income rises, there is a demand for more money to make the transactions. On the other hand, the negative relationship between money and interest rate is found since the interest one gives up is the opportunity cost of holding money. It seems that there is a cointegration among money, income and interest rates.

In particular, when dealing with money demand function, it is important to include seasonal or monthly dummy variables to capture cyclical variations of money demand due simply to different seasonal and monthly patterns of payments or receipts.

### R code

In principle, we need to check for the stationarity of each variables but in this exercise, we use both models of VAR in level and VAR in difference. It is because of the fact that many researchers usually use VAR model in difference but some researchers prefer to use VAR in level irrespective of the stationarity.

The following R code consists of 1) data loading, 2) VAR in level (vars package), 3) VAR in difference (vars package), and 4) VAR in difference (tsDyn package).

Since our model is a quarterly VAR for M2 demand, centered seasonal dummy variables are included as exogenous variables.

 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161 #========================================================## Quantitative ALM, Financial Econometrics & Derivatives # ML/DL using R, Python, Tensorflow by Sang-Heon Lee ## https://kiandlee.blogspot.com#——————————————————–## Vector Autoregressive Model#========================================================# graphics.off()  # clear all graphsrm(list = ls()) # remove all files from your workspace library(urca) # ca.jo, denmarklibrary(vars) # vec2var #========================================================# Data#======================================================== # forecasting horizonnhor <– 12  #——————————————–# quarterly data related with money demand#——————————————–# LRM : logarithm of real money M2 (LRM)# LRY : logarithm of real income (LRY)# LPY : logarithm of price deflator (LPY)# IBO : bond rate (IBO)# IDE : bank deposit rate (IDE)# the period 1974:Q1 – 1987:Q3#——————————————– # selected variablesdata(denmark)df.lev <– denmark[,c(“LRM”,“LRY”,“IBO”,“IDE”)]m.lev  <– as.matrix(df.lev)nr_lev <– nrow(df.lev) # quarterly centered dummy variablesdum_season <– data.frame(yyyymm = denmark$ENTRY)substr.q <– as.numeric(substring(denmark$ENTRY, 6,7))dum_season$Q2 <– (substr.q==2)–1/4dum_season$Q3 <– (substr.q==3)–1/4dum_season$Q4 <– (substr.q==4)–1/4dum_season <– dum_season[,–1] # Draw Graphstr.main <– c( “LRM=ln(real money M2)”, “LRY=ln(real income)”, “IBO=bond rate”, “IDE=bank deposit rate”) x11(width=12, height = 6); par(mfrow=c(2,2), mar=c(5,3,3,3))for(i in 1:4) { matplot(m.lev[,i], axes=FALSE, type=c(“l”), col = c(“blue”), main = str.main[i]) axis(2) # show y axis # show x axis and replace it with # an user defined sting vector axis(1, at=seq_along(1:nrow(df.lev)), labels=denmark$ENTRY, las=2)} #========================================================# VAR model in level#======================================================== # lag lengthVARselect(df.lev, lag.max = 4,          type = “const”, season = 4) # estimationvar.model_lev <– VAR(df.lev, p = 2,                      type = “const”, season = 4) # forecast of lev datavar.pred <– predict(var.model_lev, n.ahead = nhor)x11(); par(mai=rep(0.4, 4)); plot(var.pred)x11(); par(mai=rep(0.4, 4)); fanchart(var.pred)  #========================================================# VAR model in difference using vars#======================================================== # 1st differenced datadf.diff <– diff(as.matrix(df.lev), lag = 1)colnames(df.diff) <– c(“dLRM”, “dLRY”, “dIBO”, “dIDE”)m.diff <– as.matrix(df.diff) # lag lengthVARselect(df.diff, lag.max = 4,          type = “const”, season = 4) # estimationvare_diff <– VAR(df.diff, p = 1,                  type = “const”, season = 4) # forecast of differenced datavarf_diff <– predict(vare_diff, n.ahead = nhor)x11(); par(mai=rep(0.4,4)); plot(varf_diff)x11(); par(mai=rep(0.4,4)); fanchart(varf_diff) # recover lev forecastm.varf_lev_ft <– rbind(m.lev, matrix(NA, nhor, 4))m.ft_df <– do.call(cbind,lapply(varf_diff$fcst, function(x) x[,“fcst”])) # growth to levelfor(h in (nr_lev+1):(nr_lev+nhor)) { hf <– h – nr_lev m.varf_lev_ft[h,] <– m.varf_lev_ft[h–1,] + m.ft_df[hf,]} # Draw Graphx11(width=8, height = 8); par(mfrow=c(4,1), mar=c(2,2,2,2)) for(i in 1:4) { df <– m.varf_lev_ft[,i] matplot(df, type=c(“l”), col = c(“blue”), main = str.main[i]) abline(v=nr_lev, col=“blue”)} #========================================================# VAR model in difference using tsDyn#======================================================== linevare_diff <– lineVar(data = df.lev, lag = 1, include = “const”, model = “VAR”, I = “diff”, beta = NULL, exogen = dum_season) # check if both models (vars & tsDyn) yield same coefficientslinevare_diff do.call(rbind,lapply(vare_diff$varresult,                      function(x) x\$coefficients)) # quarterly centered dummy variables for forecastdumf_season <– rbind(tail(dum_season,4),                     tail(dum_season,4),                     tail(dum_season,4))# forecastlinevarf_diff <– predict(linevare_diff, n.ahead = nhor,                          exoPred = dumf_season) # Draw Graphx11(width=8, height = 8); par(mfrow=c(4,1), mar=c(2,2,2,2)) df <– rbind(df.lev, linevarf_diff) for(i in 1:4) {    matplot(df[,i], type=c(“l”), col = c(“blue”),             main = str.main[i])     abline(v=nr_lev, col=“blue”)} Colored by Color Scripter cs

In VAR in level, we can get the following forecasts by calling VAR() and predict() function.

In VAR in difference using vars R package, we can get the following forecasts by calling VAR() and predict() function.

In particular, VAR() model considers seasonal dummy variables automatically so that we don’t need to set seasonal dummy variables manually but have only to set season = 4

However it does not produce level forecasts automatically when we use differenced data so that we need to convert the forecasts of differenced variables to the forecasts of level variables manually.

In VAR in difference using tsDyn R package, we can get the following forecasts by calling lineVar() and predict() function.

In particular, lineVar() model uses level variables and performs not VAR in level but VAR in difference by setting I = “diff”. It is convenient for us to get the forecasts of level variables not of differenced variables. There is no need for the conversion from differenced variables forecasts to level variables forecasts.

But unlike VAR() function, lineVar() function does not consider seasonal dummy variables automatically so that we need to set seasonal dummy variables manually by setting exogen = dum_season.

In the above R code, VAR() model and lineVar() model deal with the same representation. Therefore we can compare the estimated coefficients of these two models. From the output below, we can easily find that two estimated coefficients are the same.

In summary, we can figure out the differences of two forecasts. In other words, while VAR in level generated somewhat flattened forecasts, VAR in difference generated a little trended forecasts. These forecasts result from the assumption of underlying variables.

### Concluding Remarks

In this post, we’ve implemented R code for VAR estimation and forecasting. This is a reduced form model which is used for forecasting. Since our focus is on the level forecasts, we perform three works: 1) estimation and forecast of a VAR in level, 2) estimation of a VAR in difference and forecasts of level variables by using forecast of differenced variables, 3) estimation of a VAR in difference and forecasts of level variables directly. Next posts will cover the VECM model.

### Reference

Johansen, S. and Juselius, K. (1990), Maximum Likelihood Estimation and Inference on Cointegration – with Applications to the Demand for Money, Oxford Bulletin of Economics and Statistics, 52, 2, 169–210. $$\blacksquare$$