# Bayesian Model Averaging (BMA) with uncertain Spatial Effects

**BMS Add-ons » BMS Add-ons**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This file illustrates the computer code to use spatial filtering in the context of Bayesian Model Averaging (BMA). For more details and in case you use the code please cite Crespo Cuaresma and Feldkircher (2010).

In addition, this tutorial exists as well in PDF form: web_tutorial_spatfilt.pdf

### Installing spatBMS

To get the code started you need to install the R-packages `spdep`

and `BMS`

(Version ≥ 0.2.5). Then install the add-on package `spatBMS`

by downloading one of the following binaries according to your system:

- Mac OS / Linux: spatBMS_0.0.tar.gz
- Windows with R Version 2.10 or greater: spatBMS.zip
- Windows with R Version 2.5 to 2.9: spatBMS.zip

For further information on installing spatBMS manually, consult the corresponding page on installing BMS.

The following text has been tested with `R.2.11`

.

### A Primer to Spatial Filtering and BMA

Consider a cross-sectional regression of the following form:

*y = α ι*+

_{N}*ρ*

*W*y +

*X*+

_{k}χ_{k}*σ*

*ε*

where *y* is an *N*-dimensional column vector of the dependent variable, *α* is the intercept term, *ι _{N}* is an

*N*-dimensional

column vector of ones,

*X*is a matrix whose columns are stacked data for

_{k}= (x_{1},…,x_{k})*k*

explanatory variables and

*χ*is the

_{k}= (χ_{1},…,χ_{k})’*k*-dimensional parameter vector corresponding to

the variables in

*X*. The spatial autocorrelation structure is specified via a spatial weight matrix

_{k}*W*. The coefficient

*ρ*attached to

*W*reflects the degree of spatial autocorrelation.

Equation (1) constitutes a parametric spatial model where the spatial parameter *ρ* is often interpreted as a spillover parameter.

In this setting, on top of the uncertainty regarding the choice of explanatory an extra degree of uncertainty arises:

we do *not* know the actual nature of the spatial interactions which we model through the spatial autoregressive term in equation (1),

that is, if we conduct inference conditional on *W*. Spatial autocorrelation will be observable whenever the phenomenon under study is

a spatial process or omitted variables cause spatial variation in the residuals (Tiefelsdorf and Griffith, 2007). Note that both

arguments typically apply to economic cross-section data, where economic units interact with each other

and omitted variables decrease the level of confidence in econometric analysis. Since inference from the SAR model is conditional on the weight matrix *W*, which has to be exogenously specified, explicitly accounting for this source of model uncertainty is a natural generalization to uncertainty in the nature of *X _{k}* in the framework of BMA. In most applications there is little theoretical guidance on which structure to put on the weight matrix rendering its specification a serious challenge.

#### Spatial Filtering

The spatial filtering literature seeks to remove residual spatial autocorrelation patterns prior to estimation and is in principle

not interested in directly estimating *ρ* in equation (1). The approach put forward by Getis and Griffith (2002) and

Tiefelsdorf and Griffith (2007), is based on an eigenvector decomposition of a transformed *W* matrix, where the transformation

depends on the underlying spatial model. The eigenvectors *e _{i}* are included as additional explanatory variables and the regression equation

(1) becomes:

*y = αι*+

_{N}*∑*+

_{(i=1)}^{E}η_{i}e_{i}*X*+

_{k}χ_{k}*σ*

*ε*

where each eigenvector *e _{i}* spans one of the spatial dimensions. By introducing the eigenvectors into the regression,

we explicitly take care of (remaining) spatial patterns in the residuals. Furthermore spatial commonalities among

the covariates in

*X*are conditioned out. This reduces the degree of multicollinearity and further separates spatial effects

_{k}from the ‘intrinsic’ impact the employed regressors exert on the dependent variable.

#### BMA with uncertain spatial effects

From a Bayesian perspective, the problem of obtaining estimates of the parameter associated with a covariate under uncertainty in both the nature of *W*

and *X _{k}*

can be handled in a straightforward manner using spatial filtering techniques. Let us assume that we are interested in a particular regression coefficient,

*β*.

Denote the set of potential models by

*M={M*

^{1}_{1}, M^{1}_{2},…, M^{1}_{2^K}, … M^{2}_{1},*…, M*,

^{2}_{2^K},…, M^{Z}_{1},…, M^{Z}_{2^K}}where

*K*stands for the number of potential explanatory variables and

*Z*the number of candidate spatial weighting matrices

*W*,

_{z}*z=1,…, Z*each with associated set of eigenvectors

*E*. The cardinality of

_{z}*M*is therefore

*2*.

^{K*Z}A particular model, say

*M*, is characterized by its parameter vector

^{z}_{k}*θ*

^{z}_{k}=(α, χ_{k}, η_{z})corresponding to the intercept term included in all models, the coefficients on the regressors entering the model and the coefficients on the set of

eigenvectors

*E*related to

_{z}*W*.

_{z}In the BMA framework, the posterior distribution of

*β*takes now the form of

*p(β|y)*=

*∑*

_{(2^K)}^{(j=1)}∑_{(z=1)}^{Z}p(β|M^{z}_{j},y)p(M^{z}_{j}|y)with *y* denoting the data and *β* the coefficient of interest. Inference on *β* is based on single inferences under models *j=1,…,2 ^{(Z*K)}* weighted by their respective posterior model probabilities,

*p(M*, which in turn depend on the corresponding matrix of spatial weights. For more technical details please see Crespo Cuaresma and Feldkircher (2010).

^{z}_{j}| y)### An Illustration: The Boston Housing Data

In `R`

we will do spatial filtering with the ‘Boston housing data’ which has been originally published in Harrison and Rubinfeld (1978). The dependent variable (`CMEDV`

) contains the

corrected median value of owner-occupied homes in USD 1000’s for 506 observations. Among the explanatory variables we have per capita crime (`CRIM`

), the pupil-teacher ratio by town

(`PTRATIO`

), the proportion of owner-occupied units built prior to 1940 (`AGE`

), the proportion of non-retail business acres per town (`INDUS`

), a variable that is proportional to

the share of Afro Americans per town (`B`

) and a variable reflecting the nitric oxides concentration (`NOX`

) among others. For more details please see `?dataBoston`

. We start

with loading the data in `R`

:

library(spatBMS) data(dataBoston) data(boston.soi)

As in the earlier analyses of these data, we take logarithms of the variables `CMEDV`

, `DIS`

, `RAD`

and `LSTAT`

and squares of the regressors `RM`

(`RM#RM`

)

and `NOX`

(`NOX#NOX`

) to model potential non-linearities. These transformations have been already carried out and the transformed variables stored in `dataBoston`

We proceed with the construction of several weight matrices. To keep the example simple, we limit the space of candidate *W* matrices to five:

- The matrix boston.soi already comes along with the boston data set (
*W*)._{0} - A perturbation of the boston.soi matrix (
*W*)._{1} - A second perturbation of the boston.soi matrix (
*W*)._{2} - A 4 nearest neighbor matrix (
*K*)._{NN4} - A 6 nearest neighbor matrix (
*K*)._{NN6}

The first matrix is included in the `spdep`

package and constitutes probably a first order contiguity matrix. This class of matrices assigns positive (identical) weights to observations

that share a common border. See Anselin (1988) and Crespo Cuaresma and Feldkircher (2010) for an interpretation of different coding schemes.

For the sake of illustration we will randomly perturb the matrix using the function `jitterW_binary`

which is provided in the corresponding `Sweave`

file to this pdf.

The perturbation randomly adds / drops `N=5`

neighborhood relationships of the `boston.soi`

weight matrix.

The function takes as argument a `matrix`

object, thus we have to transform `boston.soi`

with the function `nb2mat`

into a matrix.

W0 <- boston.soi bostMat = nb2mat(boston.soi, style = "B") W1 = jitterW_binary(bostMat, N = 5) W1 = mat2listw(W1$Wmat) W1 = W1$neighbours W2 = jitterW_binary(bostMat, N = 5) W2 = mat2listw(W2$Wmat) W2 = W2$neighbours

It is necessary to re-transform the perturbed matrices into the `nb`

class since the `SpatialFiltering`

function we will use later on

only allows for objects of this class as inputs. This can be done with

the function `mat2listw`

and extracting the neighborhood object by typing `object$neighbours`

(in the example `W2$neighbours`

).

Finally we set up two k-nearest neighbor matrices.

coords = coordinates(boston.utm) col.knn4 = knearneigh(coords, k = 4) col.knn4 = knn2nb(col.knn4) coords = coordinates(boston.utm) col.knn6 = knearneigh(coords, k = 6) col.knn6 = knn2nb(col.knn6)

As stated above we assume the model follows a SAR process. To free the residuals from spatial correlation we now filter the dependent variable. The `SpatialFiltering`

function provided in the package `spdep`

will extract the eigenvectors. A linear combination of these eigenvectors will allow us to separate spatial correlation from the dependent variable by using the identified eigenvectors as additional regressors in our econometric model.

Depending on the size of your *W* matrix, spatial filtering can take some while. The function takes the neighborhood objects we have defined above and the data to be filtered as main arguments.

For more details see `?SpatialFiltering`

. Note also that we have set `ExactEV`

to `FALSE`

(quicker) which provides an approximation

for our illustration example.

y = as.data.frame(dataM[, 1, drop = F]) yFilt.colGal0 = SpatialFiltering(dataM[, 1] ~ 1, ~-1, data = y, nb = W0, style = "W", ExactEV = FALSE) yFilt.colGal1 = SpatialFiltering(dataM[, 1] ~ 1, ~-1, data = y, nb = W1, style = "W", ExactEV = FALSE) yFilt.colGal2 = SpatialFiltering(dataM[, 1] ~ 1, ~-1, data = y, nb = W1, style = "W", ExactEV = FALSE) yFilt.knn4 = SpatialFiltering(dataM[, 1] ~ 1, ~-1, data = y, nb = col.knn4, style = "W", ExactEV = FALSE) yFilt.knn6 = SpatialFiltering(dataM[, 1] ~ 1, ~-1, data = y, nb = col.knn6, style = "W", ExactEV = FALSE)

Finally we collect the eigenvectors in a list.

WL.boston = list(Col0 = fitted(yFilt.colGal0), Col1 = fitted(yFilt.colGal1), Col2 = fitted(yFilt.colGal1), knn4 = fitted(yFilt.knn4), knn6 = fitted(yFilt.knn6))

Note that you can also access the extracted eigenvectors by typing e.g. `yFilt.colGal0$dataset`

instead of

e.g. `fitted(yFilt.colGal0)`

.

### Applying BMA

Now we can start with the BMA part. All functions and input arguments of the `BMS`

library are applicable.

There are two important exceptions though: First, the empirical Bayes estimation as well as the hyper-g priors are so far not implemented. Thus you can specify the g-prior either by using the benchmark priors (Fernandez et al. 2001) or by any numerical number *g>0*. See Feldkircher and Zeugner (2009) for more details on the influence of *g* on posterior results. Secondly, the enumeration algorithm (`mcmc=enum`

) is in its current version not available for `spatFilt.bms`

. Finally, to speed up

calculations `BMS`

provides the option `force.full.ols=FALSE`

, which is not available in `spatFilt.bms`

.

The additional argument `WList`

of the function `spatFilt.bms`

must be a list object with length corresponding to the number of weight matrices you use

`length(WList)`

must be greater than 1, that is you have to submit at least two sets of eigenvectors in order to use spatial filtering in the context of BMA.

Each element of the list contains a matrix with the extracted eigenvectors, where the matrices do not have to have the same column dimension. In the example

we have collected the eigenvectors in the object `WL.boston`

. To have a quick look at the boston data set we run a short BMA chain with 1 million posterior draws

(`iter=1e06`

) after discarding the first 100,000 draws (`burn=1e05`

). For more information regarding the other function arguments type `?spatFilt.bms`

.

dataM = as.matrix(apply(dataM, 2, as.numeric)) model1 = spatFilt.bms(X.data = dataM, WList = WL.boston, burn = 1e+05, iter = 1e+06, nmodel = 100, mcmc = "bd", g = "bric", mprior = "random", mprior.size =(ncol(dataM)-1)/2)

The object `model1`

is a standard `BMS`

object. It is important, to note that *aall* the reported statistics (Posterior Inclusion Probabilities, Posterior Means, Posterior

Standard deviations, etc.) are *after* having integrated out uncertainty with respect to *W*. To fix ideas, we will look at the disaggregated results to - for example - assess whether avariable receives only posterior support under a particular weight matrix or to look at theposterior inclusion probabilites of the spatial weight matrices, first:

model1$wTopModels[, 1:5] 08beb0 08b6b0 08feb0 09beb0 08bef0 CRIM 1.0000000 1.0000000 1.00000000 1.00000000 1.00000000 ZN 0.0000000 0.0000000 0.00000000 0.00000000 0.00000000 INDUS 0.0000000 0.0000000 0.00000000 0.00000000 0.00000000 CHAS 0.0000000 0.0000000 0.00000000 1.00000000 0.00000000 AGE 1.0000000 1.0000000 1.00000000 1.00000000 1.00000000 DIS 0.0000000 0.0000000 1.00000000 0.00000000 0.00000000 RAD 1.0000000 1.0000000 1.00000000 1.00000000 1.00000000 TAX 1.0000000 1.0000000 1.00000000 1.00000000 1.00000000 PTRATIO 1.0000000 0.0000000 1.00000000 1.00000000 1.00000000 B 1.0000000 1.0000000 1.00000000 1.00000000 1.00000000 LSTAT 1.0000000 1.0000000 1.00000000 1.00000000 1.00000000 NOX 0.0000000 0.0000000 0.00000000 0.00000000 0.00000000 RM 1.0000000 1.0000000 1.00000000 1.00000000 1.00000000 NOX#NOX 0.0000000 0.0000000 0.00000000 0.00000000 1.00000000 RM#RM 1.0000000 1.0000000 1.00000000 1.00000000 1.00000000 W-Index 1.0000000 1.0000000 1.00000000 1.00000000 1.00000000 PMP (Exact) 0.4700282 0.1026632 0.04539982 0.04329733 0.03871103 PMP (MCMC) 0.4676800 0.1037210 0.04752609 0.04262372 0.03858353

In the example we show posterior results for the first 5 models. The line W-Index tells you which *W* has been included in the particular model. In our example

the W-Index indicates that the first weight matrix in `WL.boston`

(i.e. *W _{0}*='boston.soi') has been used in the first 5 regression models. On the contrary:

topmodels.bma(model1)[, 1:3] 45f5 45b5 47f5 CRIM 1.000000 1.0000000 1.00000000 ZN 0.000000 0.0000000 0.00000000 INDUS 0.000000 0.0000000 0.00000000 CHAS 0.000000 0.0000000 0.00000000 AGE 1.000000 1.0000000 1.00000000 DIS 0.000000 0.0000000 1.00000000 RAD 1.000000 1.0000000 1.00000000 TAX 1.000000 1.0000000 1.00000000 PTRATIO 1.000000 0.0000000 1.00000000 B 1.000000 1.0000000 1.00000000 LSTAT 1.000000 1.0000000 1.00000000 NOX 0.000000 0.0000000 0.00000000 RM 1.000000 1.0000000 1.00000000 NOX#NOX 0.000000 0.0000000 0.00000000 RM#RM 1.000000 1.0000000 1.00000000 PMP (Exact) 0.465327 0.1056594 0.04672478 PMP (MCMC) 0.463265 0.0999770 0.04399500

shows the aggregated results: a matrix of containing the best models along with the according posterior model probabilities (exact and frequencies) after integrating out uncertainty with respect to the weight matrices. That is, if the first two models would be the same in terms of explanatory variables but differ regarding the employed weight matrix, the posterior mass of the two models is aggregated.

estimates.bma(model1) PIP Post Mean Post SD Cond.Pos.Sign Idx B 1.000000 5.662607e-04 9.677364e-05 1.00000000 10 LSTAT 1.000000 -1.794610e-01 1.951711e-02 0.00000000 11 CRIM 0.999997 -4.484776e-03 8.450216e-04 0.00000000 1 RM#RM 0.999995 3.755742e-02 6.807775e-03 1.00000000 15 RM 0.997460 -3.609063e-01 8.758833e-02 0.00000501 13 AGE 0.990832 -1.486823e-03 4.082490e-04 0.00000000 5 RAD 0.973028 6.094649e-02 1.926394e-02 1.00000000 7 TAX 0.943864 -2.723645e-04 1.056064e-04 0.00000000 8 PTRATIO 0.832582 -1.114472e-02 6.486644e-03 0.00000000 9 DIS 0.107883 -4.118105e-03 1.944304e-02 0.00549670 6 CHAS 0.089263 -1.408996e-03 8.414781e-03 0.00000000 4 NOX#NOX 0.085315 -8.218984e-03 9.314296e-02 0.02434507 14 INDUS 0.081432 -4.726262e-05 6.037751e-04 0.06234650 3 NOX 0.081259 2.521705e-03 1.183834e-01 0.23021450 12 ZN 0.078215 4.135538e-06 1.114933e-04 0.82154318 2

In the same vein, the posterior inclusion probability (PIP) for the variable `RM`

for example corresponds to the sum of the posterior model probabilities of all regression models including that variable and the posterior mean is calculated as the weighted (by posterior model probabilities) average of posterior means over all weight matrices. Also note that the prior over the *W* space is uniform.

Coming back to the disaggregated (*W*-specific) results. To calculate the posterior inclusion

probabilities of the *W* matrices, you can look a the frequency with which each *W* matrix

has been visited by the sampler and express this in percentages:

model1$Wcount Col0 Col1 Col2 knn4 knn6 992344 3773 3883 0 0 model1$Wcount/sum(model1$Wcount) * 100 Col0 Col1 Col2 knn4 knn6 99.2344 0.3773 0.3883 0.0000 0.0000

In the example, the original 'boston.soi' *W* matrix along with its perturbations receives

overwhelming posterior support, whereas the k-nn matrices cannot explain the spatial patterns present in the data. If you prefer statistics based on the best

models (in the example we have set `nmodel=100`

meaning that results are based on a maximum of 100 models receiving highest posterior support in terms of posterior

model probabilities), you can use the function `pmpW.bma`

:

pmpW.bma(model1) PMP (Exact) PMP (MCMC) Col0 99.5443044 99.5790237 Col1 0.2278478 0.2064531 Col2 0.2278478 0.2145232 knn4 0.0000000 0.0000000 knn6 0.0000000 0.0000000

Usually `model1$Wcount`

gives you a reasonable approximation and is directly accessible from the `model1`

object.

We finally want to check whether there is remaining spatial autocorrelation present in the residuals. For that purpose we can use the 'Moran's I' test calling

`lm.morantest`

for the best models. Although - in order to save time - we could have a look at the 10 best models only (in the example the first 10 models

already account for more than 80% of posterior mass) we carry out the residual test for all models in order to get more accurate results.

The function `mTest`

is wrapper for the function `lm.morantest`

provided in the package `spdep`

. The loop re-computes the `nmodel`

best -

in terms of posterior model probabilities - regressions in order to get an object of class `lm`

and finally applies the `lm.morantest`

.

We do this once for the eigenvector augmented regressions (the spatial filtering approach) and once in a pure OLS fashion by setting the

option `variants="double"`

:

mTest = moranTest.bma(object = model1, variants = "double", W = nb2listw(boston.soi), nmodel = 100)

This allows us for a direct comparison of a non-spatial regression approach and the spatial filtering BMA approach pursued in this article.

If the non-spatial linear regression models do not show any patterns of spatial residual autocorrelation a standard BMA regression

- as with the function `bms`

- dealing solely with uncertainty with respect to the explanatory variables might be preferable.

We now extract the corresponding p-values (remember the null hypothesis corresponds to no spatial autocorrelation):

pvalueMat = cbind(sapply(mTest$moran, function(x) x$p.value), sapply(mTest$moranEV, function(x) x$p.value))

Figure 1 shows the distribution of the p-values of the 'Moran's I' test, once for pure OLS regressions (without any spatial correction, left panel) and once augmented with the eigenvectors identified

by the spatial filtering algorithm (right panel).

To sum up by incorporating the eigenvectors we successfully removed spatial residual autocorrelation from the regressions. The BMA framework helped us to explicitely deal

with uncertainty stemming from the construction of the weight matrices and allowed us to get posterior inclusion probabilities of weighting matrices as well as the usual

BMA statistics dealing with spatial correlation.

Finally, once again please note that all functions of the BMA package `BMS`

are fully applicable to its spatial filtering

variant. The remainder of this illustration shows posterior plots to assess convergence of the MCMC sampler, the posterior distribution of coefficients of interest,

an image plot as well as a plot of the posterior model size. You can also type

`spatBMS.demo`

to get a short demonstration of `spatFilt.bms`

's main functions.

### Bibliography

- Anselin, Luc, 1988. Spatial Econometrics: Methods and Models. Kluwer Academic Publishers.
- Crespo Cuaresma, Jesús and Feldkircher, Martin. 2010. Spatial Filtering, Model Uncertainty and the Speed of Income Convergence in Europe. Working Papers 160, Oesterreichische Nationalbank.
- Feldkircher, Martin and Zeugner, Stefan. 2009. Benchmark Priors Revisited: On Adaptive Shrinkage and the Supermodel Effect in Bayesian Model Averaging.IMF Working Paper, WP/09/202.
- Fernández, Carmen and Ley, Eduardo and Steel, Mark F.J.. 2001. Benchmark Priors for Bayesian Model Averaging Journal of Econometrics, 100, 381-427.
- Getis, Arthur and Griffth, Daniel A. 2002. Comparative Spatial Filtering in Regression Analysis. Geographical Analysis, 34:2, 130-140.
- Harrison, D. and Rubinfeld, D.L. 1978. Hedonic prices and the demand for clean air. Journal of Environmental Economics & Management, Vol.5, 81-102.
- Tiefelsdorf, Michael and Griffith, Daniel 2007. Semiparametric Filtering of spatial auto-correlation: the eigenvector approach. Environment and Planning A, 37, 1193-1221.

### Downloads

**leave a comment**for the author, please follow the link and comment on their blog:

**BMS Add-ons » BMS Add-ons**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.