**BMS Add-ons » BMS Blog**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This tutorial illustrates how to use Bayesian Model Averaging (BMA) with panel data using the `R`

package BMS.

## Contents

- Introduction
- Fixed Effects Estimation by Demeaning the Data
- Fixed Effects Estimation with Dummy Variables
- Bibliography
- Downloads

A pdf version of this tutorial is available here.

## Introduction

Methods for estimating econometric models with panel data have been frequently discussed in the literature (see eg Mundlak, 1978).

Two estimators seem to resurface most often: the fixed effects estimator (FE) and the random effects estimator (RE). Both have their separate virtues and underlying assumptions (for an exposition see

Bartels, 2008). Since the FE estimator can be easily cast into the linear regression framework that is used for the BMA package `BMS`

, it will be our focus in this tutorial. For an application of Bayesian Model

Averaging employing the RE estimator please refer to Moral-Benito (2011). Furthermore, a great deal of the literature seems to pivot around the question of how to calculate standard errors (Bartels, 2008). At the current stage, we abstract from the calculation of so-called *clustered * standard errors since we stay in a pure Bayesian framework in `BMS`

(and standard errors are a classical concept).

For the purpose of illustration we will use the data put forward in

Moral-Benito (2011). The data contains `K=35`

variables (including the dependent variable, the growth rate of per capita GDP) for `N=73`

countries and for the period 1960-2000. The appendix lists the variables

together with a short description. The dependent variable, GDP growth, is calculated for five year averages resulting into eight observations per country. Moral-Benito (2011) argues in favor of calculating averages of

flow variables, while stock variables have been measured at the first year of each five-year period. The data can be downloaded at Moralbenito.com (‘download data for the paper

Determinants of Economic Growth: A Bayesian Panel Data Approach and unzip).

library(BMS)

After having started `R`

and loaded the `BMS`

package we can read in the data file:

data.raw=read.table("Dataset.csv",header=TRUE,sep=";") panelDat=data.raw[,-c(1:4)] rownames(panelDat)=paste(data.raw[,"code"],data.raw[,"year"],sep="_") panelDat=panelDat[,-charmatch(c("GRW_WB","GDP_WB"),colnames(panelDat))]

For that purpose I have simply converted the ‘Dataset.xls’ file to a ‘Dataset.csv’ file, replaced all missing values by ‘NA’ in excel and put it online. You can also therefore directly download the data with the command:

load(url("http://bms.zeugner.eu/blog/images/2011-05/paneldat.rda"))

Let us now peek into the data with the following commands:

panelDat=as.matrix(panelDat) head(panelDat)

The rownames of the data are a combination of the country code and the year of the observation. The data is provided in a `data.frame`

consisting of stacked observations per column.

That is, the first column containing the dependent variable consists

*Y _{1} = (y_{1,1},y_{1,T=8}…,y_{N,1},…,y_{N,T})*

This is also the format we can use

later on when calling the `bms`

function.

We will use two approaches to estimate a fixed effects (FE) panel (country / time fixed effects). The first approach makes use of the Frisch-Waugh-Lovell theorem (see eg Lovell, 2008), boiling down to demeaning the data accordingly. That is, in the case of country fixed effects, subtract from each observation (dependent and independent variable) the within country mean.

For the case of time fixed effects, subtract from each observation the mean across countries per time period. We will start with the country

fixed effects first.

For that purpose we will have to re-shape the data frame and put it into the form of a three dimensional array (`T x K x N`

). That can be achieved with the function `panel_unstack`

. Since

`bms`

uses data in its stacked form, we have to make use of `panel_stack`

as well. Both functions are not part of the `BMS`

library and thus have to be copied and pasted into your `R`

console by yourself:

panel_unstack= function(stackeddata, tstep=NULL) { # stackeddata is a stacked data frame/matrix ordered in the following way # Variable1 Variable2 # ctry1_time1 # ctry1_time2 # ... # ctry2_time1 # ctry2_time2 # ... # tstep is the number of time points # panel_unstack produces a 3-dimensional array out of that bigT=nrow(stackeddata);K=ncol(stackeddata); if (is.null(tstep)) tstep=bigT X1=aperm(array(as.vector(t(as.matrix(stackeddata))),dim=c(K,tstep,bigT/tstep)), perm=c(2,1,3)) try(dimnames(X1)[[1]] <- unique(sapply(strsplit(rownames(stackeddata),"_"), function(x) x[[2]])), silent=TRUE) try(dimnames(X1)[[2]] <- colnames(stackeddata), silent=TRUE) try(dimnames(X1)[[3]] <- unique(sapply(strsplit(rownames(stackeddata),"_"), function(x) x[[1]])), silent=TRUE) return(X1) } panel_stack = function(array3d) { x1= apply(array3d,2,rbind) try(rownames(x1) <- as.vector(sapply(dimnames(array3d)[[3]], FUN=function(x) paste(x, dimnames(array3d)[[1]], sep="_"))), silent=TRUE) return(as.data.frame(x1)) }

We can now easily transform the data from its stacked form into the three-dimensional array via:

dat.array=panel_unstack(panelDat, tstep=8)

where we have set `tstep=8`

since we have eight time periods per country. The advantages of the three-dimensional array are that we can easily access each dimension of the data:

dat.array[,,"ZWE"] dat.array["1965",,] dat.array[,"GSH",]

## Fixed Effects Estimation by Demeaning the Data

The function `demean`

(again not part of the `BMS`

library, so copy and paste the following lines into your `R`

console) demeans the data to estimate individual (eg country), time and

individual and time fixed effects. It takes as argument the three dimensional data array we have created above (`dat.array`

) and via `margin`

we can specify over which dimension we want to

demean the data (country / time).

demean = function(x, margin) { #x is an array #margin is the dimension along which should be demeaned if (!is.array(x)) stop("x must be an array/matrix") otherdims=(1:length(dim(x)))[-margin] sweep(x,otherdims,apply(x,otherdims,mean)) }

Demeaning is now easily accomplished by:

timeDat=panel_stack(demean(dat.array,3)) countryDat=panel_stack(demean(dat.array,1))

where we have used `panel_stack`

to re-transform the demeaned data into its stacked form that can be passed to the `bms`

function.

Since in the data frame only the first 12 explanatory variable show variation over time, we will restrict estimation to these variables only.

# only the first 12 variables show time variation, so estimate panel with those regressors only modelCd=bms(countryDat[,1:13],user.int=F) modelTd=bms(timeDat[,1:13],user.int=F)

We will briefly discuss the results in the next section (see Table 1 and Table 2).

Note that demeaning the data yields the same posterior estimates for the coefficients as with incorporating the FE directly, the approach we opt for in the next section. However,

the posterior variance for the coefficient estimates is not identical (though very similar). Also note that demeaning does not save you from the degrees of freedom problem when incorporating

the large set of fixed effects by a set of dummy variables. For an application using BMA with country FEs see for example Crespo Cuaresma et al. (2009).

## Fixed Effects Estimation with Dummy Variables

We will now turn to the second possibility of estimating FEs, which is the dummy variable approach. The advantage of the dummy variable approach is also that it yields estimates for

the FEs which can be important for some applications. For the dummy approach we will make use of the new `bms`

feature of holding variables constant (not sampling) them by the `bms`

argument `fixed.reg`

. Please make sure that you have installed a `BMS`

version >= 0.3. We start now with creating the country dummies:

# country dummies bigT=nrow(panelDat);tstep=8; countryDummies=kronecker(diag(bigT/tstep),rep(1,tstep)); colnames(countryDummies)=dimnames(dat.array)[[3]] # we have to leave one out (first column in this example, but of course is not relevant which column we leave out) countryDummies=countryDummies[,-1]

In a same fashion we can easily create a set of time dummies:

# time series dummies timeDummies=matrix(diag(tstep),bigT,tstep,byrow=T); colnames(timeDummies)=dimnames(dat.array)[[1]] # we have to leave one out (first column in this example, but of course is not relevant which column we leave out) timeDummies=timeDummies[,-1] modelTdummy=bms(cbind(panelDat[,1:13],timeDummies),fixed.reg=colnames(timeDummies),user.int=F)

Running the two regressions (for the first 13 elements of the demeaned data frame only):

modelCdummy=bms(cbind(panelDat[,1:13],countryDummies),fixed.reg=colnames(countryDummies),user.int=F) modelTdummy=bms(cbind(panelDat[,1:13],timeDummies),fixed.reg=colnames(timeDummies),user.int=F)

should yield the same results as with demeaning. Type `coef(modelCd)`

and `coef(modelCdummy)`

to get the results in `R`

. These are summarized in the Table below:

PIP Post Mean Post SD | PIP Post Mean Post SD GDP_PWT 1.000 -0.244 0.025 |1.000 -0.244 0.025 POP 1.000 0.000 0.000 |1.000 0.000 0.000 PGRW 0.286 -0.534 0.994 |0.286 -0.534 0.994 IPR 0.327 0.000 0.000 |0.327 0.000 0.000 OPEM 0.999 0.156 0.032 |0.999 0.156 0.032 CSH 0.999 -0.296 0.066 |0.999 -0.296 0.066 GSH 0.981 -0.458 0.139 |0.980 -0.458 0.139 ISH 0.523 0.133 0.148 |0.522 0.133 0.148 LBF 0.510 0.282 0.321 |0.509 0.281 0.321 LEX 0.129 0.000 0.001 |0.129 0.000 0.001 PDE 0.070 0.000 0.000 |0.070 0.000 0.000 URBP 0.083 -0.006 0.043 |0.083 -0.006 0.043 Table 1: Estimation of country fixed effects: Left panel based on demeaning the data, right panel on the dummy variable estimation approach

As one can see the results are very similar to each other. Since we have used 'full enumeration' no stochastic variability should be expected for the two approaches.

However, when using large data sets and thus the MCMC sampler in turn, please bear in mind that there might be some stochastic variation of results when running differen `bms`

chains.

Posterior coefficients for the model employing country fixed effects are to be interpreted with respect to the within variation: A positive

coefficient on the variable measuring the country's openness (`OPEM`

) means that if openness increases *within* a country GDP growth is incraesing. On the other hand, time fixed effects in

the particular example look at the between variation of the data. That is, if openness *across countries* (at once) increases, does this affect GDP growth? From Table 2 (equivalent to `coef(modelTd); coef(modelTdummy)`

we see that this effect is smaller compared to that for the within transformed data.

PIP Post Mean Post SD PIP Post Mean Post SD GDP_PWT 1.000 -0.071 0.013 1.000 -0.071 0.013 POP 0.433 0.000 0.000 0.432 0.000 0.000 PGRW 0.994 -2.542 0.647 0.994 -2.542 0.647 IPR 0.771 0.000 0.000 0.770 0.000 0.000 OPEM 0.376 0.013 0.019 0.376 0.013 0.019 CSH 0.095 -0.004 0.016 0.095 -0.004 0.016 GSH 0.136 -0.012 0.039 0.136 -0.012 0.039 ISH 0.864 0.212 0.113 0.864 0.212 0.113 LBF 0.060 0.001 0.026 0.060 0.001 0.026 LEX 1.000 0.007 0.001 1.000 0.007 0.001 PDE 0.347 0.000 0.000 0.347 0.000 0.000 URBP 0.103 -0.005 0.021 0.102 -0.005 0.021 Table 2: Estimation of time fixed effects: Left panel based on demeaning the data, right panel on the dummy variable estimation approach

### Bibliography

- Bartels, Brandon (2008). Beyond 'Fixed versus Random Effects': A Framework for

Improving Substantive and Statistical Analysis of Panel, Time-Series Cross-Sectional,

and Multilevel Data. Mimeo, Stony Brook University, New York. - Crespo Cuaresma, J. and Doppelhofer, G. and Feldkircher, M. 2009: The Determinants of Economic Growth in European Regions. CESifo Working Paper Series, No. 2519.
- Lovell, M., 2008. A Simple Proof of the FWL (Frisch,Waugh,Lovell) Theorem. Journal

of Economic Education. - Moral Benito, Enrique (2011). Determinants of Economic Growth: A Bayesian Panel Data Approach. The Review of Economics and Statistics, forthcoming (working paper version)
- Mundlak, Yair, 1978. On the Pooling of Time Series and Cross Section Data. Econometrica, Vol. 46, p. 69-85.

### Downloads

**leave a comment**for the author, please follow the link and comment on their blog:

**BMS Add-ons » BMS Blog**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.