Bayesian Model Averaging (BMA) with Uncertain Spatial Effects

[This article was first published on BMS Add-ons, and kindly contributed to R-bloggers.]

This tutorial illustrates the computer code needed to use spatial filtering in the context of Bayesian Model Averaging (BMA). For more details, and in case you use the code, please cite Crespo Cuaresma and Feldkircher (2010).
This tutorial is also available in PDF form: web_tutorial_spatfilt.pdf

Installing spatBMS

To get started you need to install the R packages spdep and BMS (version ≥ 0.2.5). Then install the add-on package spatBMS by downloading one of the following binaries according to your system:

For further information on installing spatBMS manually, consult the corresponding page on installing BMS.
The following text has been tested with R 2.11.

A Primer to Spatial Filtering and BMA

Consider a cross-sectional regression of the following form:

Equation 1: y = α ιN + ρ W y + Xk χk + σ ε

where y is an N-dimensional column vector of the dependent variable, α is the intercept term, ιN is an N-dimensional column vector of ones, Xk = (x1,…,xk) is a matrix whose columns are stacked data for k explanatory variables and χk = (χ1,…,χk)’ is the k-dimensional parameter vector corresponding to the variables in Xk. The spatial autocorrelation structure is specified via a spatial weight matrix W. The coefficient ρ attached to W reflects the degree of spatial autocorrelation. Equation (1) constitutes a parametric spatial model where the spatial parameter ρ is often interpreted as a spillover parameter.

In this setting, on top of the uncertainty regarding the choice of explanatory variables, an extra degree of uncertainty arises: we do not know the actual nature of the spatial interactions which we model through the spatial autoregressive term in equation (1) if we conduct inference conditional on W. Spatial autocorrelation will be observable whenever the phenomenon under study is a spatial process or omitted variables cause spatial variation in the residuals (Tiefelsdorf and Griffith, 2007). Note that both arguments typically apply to economic cross-section data, where economic units interact with each other and omitted variables decrease the level of confidence in econometric analysis. Since inference from the SAR model is conditional on the weight matrix W, which has to be exogenously specified, explicitly accounting for this source of model uncertainty is a natural generalization of the uncertainty regarding the nature of Xk in the framework of BMA. In most applications there is little theoretical guidance on which structure to put on the weight matrix, rendering its specification a serious challenge.

Spatial Filtering

The spatial filtering literature seeks to remove residual spatial autocorrelation patterns prior to estimation and is in principle not interested in directly estimating ρ in equation (1). The approach put forward by Getis and Griffith (2002) and Tiefelsdorf and Griffith (2007) is based on an eigenvector decomposition of a transformed W matrix, where the transformation depends on the underlying spatial model. The eigenvectors ei are included as additional explanatory variables and the regression equation (1) becomes:

Equation 2: y = α ιN + Σ_{i=1}^{E} ηi ei + Xk χk + σ ε

where each eigenvector ei spans one of the spatial dimensions. By introducing the eigenvectors into the regression, we explicitly take care of (remaining) spatial patterns in the residuals. Furthermore, spatial commonalities among the covariates in Xk are conditioned out. This reduces the degree of multicollinearity and further separates spatial effects from the 'intrinsic' impact the employed regressors exert on the dependent variable.
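The eigenvector extraction at the heart of this approach can be sketched in a few lines of base R. The toy weight matrix (a ring of 6 units) and all object names below are illustrative only; in practice the full procedure is carried out by SpatialFiltering in spdep:

```r
# Sketch of Griffith-style eigenvector extraction on a toy binary W.
n <- 6
W <- matrix(0, n, n)                         # a ring of 6 spatial units
links <- cbind(c(1, 2, 3, 4, 5, 6), c(2, 3, 4, 5, 6, 1))
W[links] <- 1
W[links[, c(2, 1)]] <- 1                     # keep W symmetric

M <- diag(n) - matrix(1 / n, n, n)           # centering projector I - 11'/n
eig <- eigen(M %*% W %*% M, symmetric = TRUE)

# Candidate filter eigenvectors: those with clearly positive eigenvalues
E <- eig$vectors[, eig$values > 1e-8, drop = FALSE]
ncol(E)          # 2 spatial eigenvectors survive for this toy ring
colSums(E)       # each extracted eigenvector is centered (sums are ~0)
```

Linear combinations of the columns of E are what gets added to the regression in equation (2).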

BMA with uncertain spatial effects

From a Bayesian perspective, the problem of obtaining estimates of the parameter associated with a covariate under uncertainty in both the nature of W and Xk can be handled in a straightforward manner using spatial filtering techniques. Let us assume that we are interested in a particular regression coefficient, β. Denote the set of potential models by M = {M11, M12, …, M1,2^K, M21, …, M2,2^K, …, MZ1, …, MZ,2^K}, where K stands for the number of potential explanatory variables and Z for the number of candidate spatial weighting matrices Wz, z = 1, …, Z, each with an associated set of eigenvectors Ez. The cardinality of M is therefore Z*2^K. A particular model, say Mzk, is characterized by its parameter vector θzk = (α, χk, ηz), corresponding to the intercept term included in all models, the coefficients on the regressors entering the model and the coefficients on the set of eigenvectors Ez related to Wz. In the BMA framework, the posterior distribution of β now takes the form

Equation 3: p(β|y) = Σ_{z=1}^{Z} Σ_{j=1}^{2^K} p(β|Mzj, y) p(Mzj|y)

with y denoting the data and β the coefficient of interest. Inference on β is based on single inferences under the Z*2^K candidate models, weighted by their respective posterior model probabilities, p(Mzj|y), which in turn depend on the corresponding matrix of spatial weights. For more technical details please see Crespo Cuaresma and Feldkircher (2010).

An Illustration: The Boston Housing Data

In R we will do spatial filtering with the 'Boston housing data', originally published in Harrison and Rubinfeld (1978). The dependent variable (CMEDV) contains the corrected median value of owner-occupied homes in USD 1000s for 506 observations. Among the explanatory variables we have per capita crime (CRIM), the pupil-teacher ratio by town (PTRATIO), the proportion of owner-occupied units built prior to 1940 (AGE), the proportion of non-retail business acres per town (INDUS), a variable that is proportional to the share of African-Americans per town (B) and a variable reflecting the nitric oxides concentration (NOX), among others. For more details please see ?dataBoston. We start with loading the data in R:
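The code chunk for this step is missing from the HTML version of the tutorial; a plausible reconstruction (assuming the Boston data ship with spdep and the transformed dataBoston data with spatBMS, as the surrounding text suggests) is:

```r
library(spdep)      # weight matrices, SpatialFiltering, Boston data
library(spatBMS)    # and the dataBoston data set
data(boston)        # provides boston.c, boston.soi, boston.utm
data(dataBoston)    # transformed Boston housing variables
dataM <- as.matrix(dataBoston)
```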


As in earlier analyses of these data, we take logarithms of the variables CMEDV, DIS, RAD and LSTAT and squares of the regressors RM (RM#RM) and NOX (NOX#NOX) to model potential non-linearities. These transformations have already been carried out and the transformed variables stored in dataBoston.

We proceed with the construction of several weight matrices. To keep the example simple, we limit the space of candidate W matrices to five:

  1. The matrix boston.soi already comes along with the boston data set (W0).
  2. A perturbation of the boston.soi matrix (W1).
  3. A second perturbation of the boston.soi matrix (W2).
  4. A 4 nearest neighbor matrix (KNN4).
  5. A 6 nearest neighbor matrix (KNN6).

The first matrix is included in the spdep package and can be regarded as a first-order contiguity matrix. This class of matrices assigns positive (identical) weights to observations that share a common border. See Anselin (1988) and Crespo Cuaresma and Feldkircher (2010) for an interpretation of different coding schemes. For the sake of illustration we will randomly perturb the matrix using the function jitterW_binary, which is provided in the Sweave file corresponding to this tutorial.

The perturbation randomly adds or drops N = 5 neighborhood relationships of the boston.soi weight matrix. The function takes a matrix object as its argument, so we first have to transform boston.soi into a matrix with the function nb2mat.

 W0 <- boston.soi
 bostMat <- nb2mat(boston.soi, style = "B")  # binary (0/1) coding
 W1 <- jitterW_binary(bostMat, N = 5)
 W1 <- mat2listw(W1$Wmat)
 W1 <- W1$neighbours
 W2 <- jitterW_binary(bostMat, N = 5)
 W2 <- mat2listw(W2$Wmat)
 W2 <- W2$neighbours

It is necessary to re-transform the perturbed matrices into the nb class since the SpatialFiltering function we will use later on only accepts objects of this class as inputs. This can be done with the function mat2listw, extracting the neighborhood object by typing object$neighbours (in the example W2$neighbours).
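Since jitterW_binary is only distributed with the accompanying Sweave file, here is a hypothetical base-R sketch of what such a perturbation function might look like. The name and the $Wmat return structure mimic the usage above, but this is an illustrative reimplementation, not the original code:

```r
# Illustrative re-implementation (NOT the original jitterW_binary): flip N
# randomly chosen symmetric off-diagonal entries of a binary weight matrix.
jitterW_binary_sketch <- function(Wmat, N = 5) {
  n <- nrow(Wmat)
  for (i in seq_len(N)) {
    pair <- sample(n, 2)                      # two distinct units
    flipped <- 1 - Wmat[pair[1], pair[2]]     # add or drop the link
    Wmat[pair[1], pair[2]] <- flipped
    Wmat[pair[2], pair[1]] <- flipped         # keep the matrix symmetric
  }
  list(Wmat = Wmat)                           # mimic the $Wmat accessor
}

set.seed(42)
toy <- matrix(0, 10, 10)
pert <- jitterW_binary_sketch(toy, N = 5)$Wmat
sum(pert) / 2    # number of links after perturbation (at most 5 here)
```

Because distinct units are sampled, the diagonal is never touched and the perturbed matrix stays binary and symmetric.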

Finally we set up two k-nearest neighbor matrices.

 coords <- coordinates(boston.utm)
 col.knn4 <- knearneigh(coords, k = 4)
 col.knn4 <- knn2nb(col.knn4)
 col.knn6 <- knearneigh(coords, k = 6)
 col.knn6 <- knn2nb(col.knn6)

As stated above we assume the model follows a SAR process. To free the residuals from spatial correlation we now filter the dependent variable. The SpatialFiltering function provided in the package spdep will extract the eigenvectors. A linear combination of these eigenvectors will allow us to separate spatial correlation from the dependent variable by using the identified eigenvectors as additional regressors in our econometric model.

Depending on the size of your W matrix, spatial filtering can take a while. The function takes the neighborhood objects we have defined above and the data to be filtered as its main arguments.
For more details see ?SpatialFiltering. Note also that we have set ExactEV to FALSE (quicker), which provides an approximation sufficient for our illustrative example.

 y <-[, 1, drop = FALSE])
 yFilt.colGal0 <- SpatialFiltering(dataM[, 1] ~ 1, ~-1, data = y, nb = W0, style = "W", ExactEV = FALSE)
 yFilt.colGal1 <- SpatialFiltering(dataM[, 1] ~ 1, ~-1, data = y, nb = W1, style = "W", ExactEV = FALSE)
 yFilt.colGal2 <- SpatialFiltering(dataM[, 1] ~ 1, ~-1, data = y, nb = W2, style = "W", ExactEV = FALSE)
 yFilt.knn4 <- SpatialFiltering(dataM[, 1] ~ 1, ~-1, data = y, nb = col.knn4, style = "W", ExactEV = FALSE)
 yFilt.knn6 <- SpatialFiltering(dataM[, 1] ~ 1, ~-1, data = y, nb = col.knn6, style = "W", ExactEV = FALSE)

Finally we collect the eigenvectors in a list:

 <- list(Col0 = fitted(yFilt.colGal0), Col1 = fitted(yFilt.colGal1), Col2 = fitted(yFilt.colGal2), knn4 = fitted(yFilt.knn4), knn6 = fitted(yFilt.knn6))

Note that you can also access the extracted eigenvectors by typing e.g. yFilt.colGal0$dataset instead of e.g. fitted(yFilt.colGal0).

Applying BMA

Now we can start with the BMA part. All functions and input arguments of the BMS library are applicable.
There are a few important exceptions, though: First, empirical Bayes estimation as well as the hyper-g priors are not yet implemented. Thus you can specify the g-prior either by using the benchmark priors (Fernández et al., 2001) or by setting any number g > 0. See Feldkircher and Zeugner (2009) for more details on the influence of g on posterior results. Second, the enumeration algorithm (mcmc = "enum") is not available in the current version of Finally, the option force.full.ols = FALSE, which BMS provides to speed up calculations, is not available in

The additional argument WList of the function must be a list whose length corresponds to the number of weight matrices you use. length(WList) must be greater than 1; that is, you have to submit at least two sets of eigenvectors in order to use spatial filtering in the context of BMA. Each element of the list contains a matrix with the extracted eigenvectors; the matrices do not have to have the same column dimension. In the example we have collected the eigenvectors in the object To have a quick look at the Boston data set we run a short BMA chain with 1 million posterior draws (iter = 1e06) after discarding the first 100,000 draws (burn = 1e05). For more information regarding the other function arguments type ?

 dataM <- as.matrix(apply(dataM, 2, as.numeric))
 model1 <- = dataM, WList =, burn = 1e+05, iter = 1e+06,
     nmodel = 100, mcmc = "bd", g = "bric", mprior = "random",
     mprior.size = (ncol(dataM) - 1) / 2)
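For reference, the WList structure expected by is just a named list with one eigenvector matrix per candidate W. A toy version (with random numbers standing in for actual eigenvectors; names are illustrative) makes the requirements concrete:

```r
# Toy WList: one eigenvector matrix per candidate W; random numbers stand
# in for real eigenvectors here.
set.seed(1)
WList_toy <- list(
  W0 = matrix(rnorm(506 * 3), nrow = 506, ncol = 3),  # 3 eigenvectors
  W1 = matrix(rnorm(506 * 5), nrow = 506, ncol = 5)   # 5 eigenvectors
)
length(WList_toy)         # must be greater than 1
sapply(WList_toy, ncol)   # column dimensions may differ: 3 and 5
```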

The object model1 is a standard BMS object. It is important to note that all the reported statistics (posterior inclusion probabilities, posterior means, posterior standard deviations, etc.) are calculated after having integrated out uncertainty with respect to W. To fix ideas, we will first look at the disaggregated results - for example, to assess whether a variable receives posterior support only under a particular weight matrix, or to look at the posterior inclusion probabilities of the spatial weight matrices:

 model1$wTopModels[, 1:5]

               08beb0    08b6b0     08feb0     09beb0     08bef0
CRIM        1.0000000 1.0000000 1.00000000 1.00000000 1.00000000
ZN          0.0000000 0.0000000 0.00000000 0.00000000 0.00000000
INDUS       0.0000000 0.0000000 0.00000000 0.00000000 0.00000000
CHAS        0.0000000 0.0000000 0.00000000 1.00000000 0.00000000
AGE         1.0000000 1.0000000 1.00000000 1.00000000 1.00000000
DIS         0.0000000 0.0000000 1.00000000 0.00000000 0.00000000
RAD         1.0000000 1.0000000 1.00000000 1.00000000 1.00000000
TAX         1.0000000 1.0000000 1.00000000 1.00000000 1.00000000
PTRATIO     1.0000000 0.0000000 1.00000000 1.00000000 1.00000000
B           1.0000000 1.0000000 1.00000000 1.00000000 1.00000000
LSTAT       1.0000000 1.0000000 1.00000000 1.00000000 1.00000000
NOX         0.0000000 0.0000000 0.00000000 0.00000000 0.00000000
RM          1.0000000 1.0000000 1.00000000 1.00000000 1.00000000
NOX#NOX     0.0000000 0.0000000 0.00000000 0.00000000 1.00000000
RM#RM       1.0000000 1.0000000 1.00000000 1.00000000 1.00000000
W-Index     1.0000000 1.0000000 1.00000000 1.00000000 1.00000000
PMP (Exact) 0.4700282 0.1026632 0.04539982 0.04329733 0.03871103
PMP (MCMC)  0.4676800 0.1037210 0.04752609 0.04262372 0.03858353

In the example we show posterior results for the first 5 models. The line W-Index tells you which W has been included in the particular model. In our example the W-Index indicates that the first weight matrix in (i.e., W0 = 'boston.soi') has been used in the first 5 regression models. In contrast:

topmodels.bma(model1)[, 1:3]

             45f5     45b5      47f5
CRIM        1.000000 1.0000000 1.00000000
ZN          0.000000 0.0000000 0.00000000
INDUS       0.000000 0.0000000 0.00000000
CHAS        0.000000 0.0000000 0.00000000
AGE         1.000000 1.0000000 1.00000000
DIS         0.000000 0.0000000 1.00000000
RAD         1.000000 1.0000000 1.00000000
TAX         1.000000 1.0000000 1.00000000
PTRATIO     1.000000 0.0000000 1.00000000
B           1.000000 1.0000000 1.00000000
LSTAT       1.000000 1.0000000 1.00000000
NOX         0.000000 0.0000000 0.00000000
RM          1.000000 1.0000000 1.00000000
NOX#NOX     0.000000 0.0000000 0.00000000
RM#RM       1.000000 1.0000000 1.00000000
PMP (Exact) 0.465327 0.1056594 0.04672478
PMP (MCMC)  0.463265 0.0999770 0.04399500

shows the aggregated results: a matrix containing the best models along with the corresponding posterior model probabilities (exact and MCMC frequencies) after integrating out uncertainty with respect to the weight matrices. That is, if the first two models were identical in terms of explanatory variables but differed regarding the employed weight matrix, the posterior mass of the two models would be aggregated.
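This aggregation can be illustrated with a toy calculation (made-up probabilities, regressor sets xA and xB are placeholders):

```r
# Two models share the regressor set {xA} but use different W matrices;
# their posterior model probabilities are summed in the aggregated view.
pmp  <- c(0.30, 0.17, 0.53)        # made-up posterior model probabilities
vars <- c("xA", "xA", "xB")        # regressor set of each model
aggregated <- tapply(pmp, vars, sum)
aggregated                          # xA: 0.47, xB: 0.53
```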


The aggregated coefficient statistics (e.g. via coef(model1)) look as follows:

             PIP     Post Mean      Post SD Cond.Pos.Sign Idx
B       1.000000  5.662607e-04 9.677364e-05    1.00000000  10
LSTAT   1.000000 -1.794610e-01 1.951711e-02    0.00000000  11
CRIM    0.999997 -4.484776e-03 8.450216e-04    0.00000000   1
RM#RM   0.999995  3.755742e-02 6.807775e-03    1.00000000  15
RM      0.997460 -3.609063e-01 8.758833e-02    0.00000501  13
AGE     0.990832 -1.486823e-03 4.082490e-04    0.00000000   5
RAD     0.973028  6.094649e-02 1.926394e-02    1.00000000   7
TAX     0.943864 -2.723645e-04 1.056064e-04    0.00000000   8
PTRATIO 0.832582 -1.114472e-02 6.486644e-03    0.00000000   9
DIS     0.107883 -4.118105e-03 1.944304e-02    0.00549670   6
CHAS    0.089263 -1.408996e-03 8.414781e-03    0.00000000   4
NOX#NOX 0.085315 -8.218984e-03 9.314296e-02    0.02434507  14
INDUS   0.081432 -4.726262e-05 6.037751e-04    0.06234650   3
NOX     0.081259  2.521705e-03 1.183834e-01    0.23021450  12
ZN      0.078215  4.135538e-06 1.114933e-04    0.82154318   2

In the same vein, the posterior inclusion probability (PIP) of, for example, the variable RM corresponds to the sum of the posterior model probabilities of all regression models including that variable, and the posterior mean is calculated as the average of posterior means over all weight matrices, weighted by posterior model probabilities. Also note that the prior over the W space is uniform.
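As a toy illustration of this weighting (made-up numbers, four candidate models in total):

```r
# PIP of a regressor = summed PMP of all models containing it; the
# unconditional posterior mean is the PMP-weighted average of the
# model-specific posterior means (zero when the variable is excluded).
pmp        <- c(0.47, 0.10, 0.05, 0.38)      # made-up PMPs, sum to 1
contains   <- c(TRUE, TRUE, FALSE, TRUE)     # does the model include RM?
beta_means <- c(0.040, 0.035, 0.000, 0.038)  # model-specific posterior means
pip   <- sum(pmp[contains])                  # 0.95
emean <- sum(pmp * beta_means)               # 0.03674
```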

Coming back to the disaggregated (W-specific) results: to calculate the posterior inclusion probabilities of the W matrices, you can look at the frequency with which each W matrix has been visited by the sampler and express this in percentages:

 model1$Wcount

   Col0   Col1   Col2   knn4   knn6 
 992344   3773   3883      0      0 

model1$Wcount/sum(model1$Wcount) * 100
Col0    Col1    Col2    knn4    knn6 
99.2344  0.3773  0.3883  0.0000  0.0000 

In the example, the original 'boston.soi' W matrix along with its perturbations receives overwhelming posterior support, whereas the k-nearest-neighbor matrices cannot explain the spatial patterns present in the data. If you prefer statistics based on the best models (in the example we have set nmodel = 100, meaning that results are based on a maximum of 100 models receiving the highest posterior support in terms of posterior model probabilities), you can use the function pmpW.bma:

 pmpW.bma(model1)

     PMP (Exact) PMP (MCMC)
Col0  99.5443044 99.5790237
Col1   0.2278478  0.2064531
Col2   0.2278478  0.2145232
knn4   0.0000000  0.0000000
knn6   0.0000000  0.0000000

Usually model1$Wcount gives you a reasonable approximation and is directly accessible from the model1 object.

We finally want to check whether there is remaining spatial autocorrelation in the residuals. For that purpose we can use Moran's I test, calling lm.morantest for the best models. Although - in order to save time - we could look at the 10 best models only (in the example the first 10 models already account for more than 80% of the posterior mass), we carry out the residual test for all models in order to get more accurate results. The function moranTest.bma is a wrapper for the function lm.morantest provided in the package spdep. It re-computes the nmodel best regressions - in terms of posterior model probabilities - in order to obtain objects of class lm, and then applies lm.morantest to each.

We do this once for the eigenvector augmented regressions (the spatial filtering approach) and once in a pure OLS fashion by setting the option variants="double":

 mTest = moranTest.bma(object = model1, variants = "double", W = nb2listw(boston.soi), nmodel = 100)

This allows a direct comparison of a non-spatial regression approach and the spatial filtering BMA approach pursued in this article. If the non-spatial linear regression models do not show any patterns of spatial residual autocorrelation, a standard BMA regression - as with the function bms - dealing solely with uncertainty with respect to the explanatory variables might be preferable.

We now extract the corresponding p-values (remember the null hypothesis corresponds to no spatial autocorrelation):

 pvalueMat <- cbind(sapply(mTest$moran, function(x) x$p.value),
                    sapply(mTest$moranEV, function(x) x$p.value))

Figure 1 shows the distribution of the p-values of the 'Moran's I' test, once for pure OLS regressions (without any spatial correction, left panel) and once augmented with the eigenvectors identified by the spatial filtering algorithm (right panel).

Figure 1: Distribution of p-values of a Moran's I test for spatial residual autocorrelation. The left panel shows the results for the 100 best models based on OLS regressions; the right panel includes, on top of the explanatory variables, the eigenvectors identified by the spatial filtering algorithm.

To sum up: by incorporating the eigenvectors we successfully removed spatial residual autocorrelation from the regressions. The BMA framework helped us to explicitly deal with uncertainty stemming from the construction of the weight matrices, and allowed us to obtain posterior inclusion probabilities of the weighting matrices as well as the usual BMA statistics while dealing with spatial correlation. Finally, please note once again that all functions of the BMA package BMS are fully applicable to its spatial filtering variant. The remainder of this illustration shows posterior plots to assess convergence of the MCMC sampler, the posterior distribution of coefficients of interest, an image plot, and a plot of the posterior model size. You can also type spatBMS.demo to get a short demonstration of's main functions.

The top panel shows the posterior distribution of the model size; the bottom panel shows a convergence plot. The correlation of 0.99 indicates excellent convergence of the MCMC algorithm.


References

  • Anselin, Luc, 1988. Spatial Econometrics: Methods and Models. Kluwer Academic Publishers.
  • Crespo Cuaresma, Jesús and Feldkircher, Martin. 2010. Spatial Filtering, Model Uncertainty and the Speed of Income Convergence in Europe. Working Papers 160, Oesterreichische Nationalbank.
  • Feldkircher, Martin and Zeugner, Stefan. 2009. Benchmark Priors Revisited: On Adaptive Shrinkage and the Supermodel Effect in Bayesian Model Averaging. IMF Working Paper WP/09/202.
  • Fernández, Carmen, Ley, Eduardo and Steel, Mark F.J. 2001. Benchmark Priors for Bayesian Model Averaging. Journal of Econometrics, 100, 381-427.
  • Getis, Arthur and Griffith, Daniel A. 2002. Comparative Spatial Filtering in Regression Analysis. Geographical Analysis, 34:2, 130-140.
  • Harrison, D. and Rubinfeld, D.L. 1978. Hedonic prices and the demand for clean air. Journal of Environmental Economics & Management, Vol.5, 81-102.
  • Tiefelsdorf, Michael and Griffith, Daniel A. 2007. Semiparametric Filtering of Spatial Autocorrelation: The Eigenvector Approach. Environment and Planning A, 37, 1193-1221.

