The INLA Approach to Bayesian models

The Integrated Nested Laplace Approximation, or INLA, approach is a recently developed, computationally simpler method for fitting Bayesian models [(Rue et al., 2009, compared to traditional Markov Chain Monte Carlo (MCMC) approaches. INLA fits models that are classified as latent Gaussian models, which are applicable in many settings (Martino & Rue, 2010. In general, INLA fits a general form of additive models such as:

\(\eta = \alpha + \sum_{j=1}^{nf} f^{(j)}(u_{ij}) + \sum_{k=1}^{n\beta}\beta_k z_{ki} + \epsilon_i\)

where \(\eta\) is the linear predictor for a generalized linear model formula, and is composed of a linear function of some variables u, \(\beta\) are the effects of covariates, z, and \(\epsilon\) is an unstructured residual (Rue et al., 2009). As this model is often parameterized as a Bayesian one, we are interested in the posterior marginal distributions of all the model parameters. Rue and Martino (2007) show that the posterior marginal for the random effects (x) in such models can be approximated as:

\(\tilde{p}(x_i|y) = \sum_k \tilde{p}(x_i|\theta_k, y) \tilde{p}(\theta_k|y) \Delta_k\)

via numerical integration (Rue & Martino, 2007; Schrodle & Held, 2011a, 2011b). The posterior distribution of the hyperparameters (\(\theta\)) of the model can also be approximated as:

\(\tilde{p}(\theta | y)) \propto \frac{p(x, \theta, y)}{\tilde{p}G(x| \theta,y)} \mid _{x} = x^*(\theta)\)

, where G is a Gaussian approximation of the posterior and \(x^*(\theta)\) is the mode of the conditional distribution of \(p(x|\theta,y)\). Thus, instead of using MCMC to find an iterative, sampling-based estimate of the posterior, it is arrived at numerically. This method of fitting the spatial models specified above has been presented by numerous authors (Blangiardo & Cameletti, 2015; Blangiardo et al., 2013; Lindgren & Rue, 2015; Martins et al., 2013; Schrodle & Held, 2011a, 2011b), with comparable results to MCMC.

Data

I have the data on my github site under the nhgis_vs page. These are data from the NHGIS project by IPUMS who started providing birth and death data from the US Vital statistics program.

The data we will use here are infant mortality rates in US counties between 2000 and 2007.

##   YEAR cofips rate
## 1 2000  01001   34
## 2 2000  01003   61
## 3 2000  01005  125
## 4 2000  01007   70
## 5 2000  01009   89
## 6 2000  01011  242

Census intercensus population estimates

From the Census population estimates program

##   sumlev region division state county  stname        ctyname estimatesbase2000
## 1     50      3        6     1      1 Alabama Autauga County             43751
## 2     50      3        6     1      3 Alabama Baldwin County            140416
## 3     50      3        6     1      5 Alabama Barbour County             29042
## 4     50      3        6     1      7 Alabama    Bibb County             19856
## 5     50      3        6     1      9 Alabama  Blount County             50982
## 6     50      3        6     1     11 Alabama Bullock County             11603
##   popestimate2000 popestimate2001 popestimate2002 popestimate2003
## 1           44021           44889           45909           46800
## 2          141342          144875          147957          151509
## 3           29015           28863           28653           28594
## 4           19913           21028           21199           21399
## 5           51107           51845           52551           53457
## 6           11581           11358           11256           11316
##   popestimate2004 popestimate2005 popestimate2006 popestimate2007
## 1           48366           49676           51328           52405
## 2          156266          162183          168121          172404
## 3           28287           28027           27861           27757
## 4           21721           22042           22099           22438
## 5           54124           54624           55485           56240
## 6           11056           11011           10776           11011
##   popestimate2008 popestimate2009 census2010pop popestimate2010 cofips
## 1           53277           54135         54571           54632  01001
## 2          175827          179406        182265          183195  01003
## 3           27808           27657         27457           27411  01005
## 4           22705           22941         22915           22867  01007
## 5           57055           57341         57322           57338  01009
## 6           10953           10987         10914           10890  01011

Data prep

##         sumlev        ctyname cofips time population year
## 01001.1     50 Autauga County  01001    1      44021 2000
## 01003.1     50 Baldwin County  01003    1     141342 2000
## 01005.1     50 Barbour County  01005    1      29015 2000
## 01007.1     50    Bibb County  01007    1      19913 2000
## 01009.1     50  Blount County  01009    1      51107 2000
## 01011.1     50 Bullock County  01011    1      11581 2000
##   cofips year sumlev        ctyname time population rate
## 1  01001 2000     50 Autauga County    1      44021   34
## 2  01001 2001     50 Autauga County    2      44889   78
## 3  01001 2002     50 Autauga County    3      45909   83
## 4  01001 2003     50 Autauga County    4      46800   79
## 5  01001 2004     50 Autauga County    5      48366   76
## 6  01001 2005     50 Autauga County    6      49676  124

Get census data using tidycensus

Here I get data from the 2000 decennial census summary file 3

## Getting data from the 2000 decennial Census
##   cofips year sumlev        ctyname time population rate GEOID           NAME
## 1  01001 2006     50 Autauga County    7      51328   93 01001 Autauga County
## 2  01001 2003     50 Autauga County    4      46800   79 01001 Autauga County
## 3  01001 2004     50 Autauga County    5      48366   76 01001 Autauga County
## 4  01001 2005     50 Autauga County    6      49676  124 01001 Autauga County
## 5  01001 2000     50 Autauga County    1      44021   34 01001 Autauga County
## 6  01001 2007     50 Autauga County    8      52405   83 01001 Autauga County
##   P007003 P007004 P007010 P053001 P089001 P089002 summary_value    pwhite
## 1   34760    7450     394   42013   43377    4738         43671 0.7959515
## 2   34760    7450     394   42013   43377    4738         43671 0.7959515
## 3   34760    7450     394   42013   43377    4738         43671 0.7959515
## 4   34760    7450     394   42013   43377    4738         43671 0.7959515
## 5   34760    7450     394   42013   43377    4738         43671 0.7959515
## 6   34760    7450     394   42013   43377    4738         43671 0.7959515
##      pblack       phisp  medhhinc      ppov
## 1 0.1705938 0.009022005 0.7593459 0.1092284
## 2 0.1705938 0.009022005 0.7593459 0.1092284
## 3 0.1705938 0.009022005 0.7593459 0.1092284
## 4 0.1705938 0.009022005 0.7593459 0.1092284
## 5 0.1705938 0.009022005 0.7593459 0.1092284
## 6 0.1705938 0.009022005 0.7593459 0.1092284

Create expected numbers of cases

In count data models, and spatial epidemiology, we have to express the raw counts of events relative to some expected value, or population offset, see this Rpub for a reminder.

##   cofips year sumlev        ctyname time population rate GEOID           NAME
## 5  01001 2000     50 Autauga County    1      44021   34 01001 Autauga County
## 8  01001 2001     50 Autauga County    2      44889   78 01001 Autauga County
## 7  01001 2002     50 Autauga County    3      45909   83 01001 Autauga County
## 2  01001 2003     50 Autauga County    4      46800   79 01001 Autauga County
## 3  01001 2004     50 Autauga County    5      48366   76 01001 Autauga County
## 4  01001 2005     50 Autauga County    6      49676  124 01001 Autauga County
##   P007003 P007004 P007010 P053001 P089001 P089002 summary_value    pwhite
## 5   34760    7450     394   42013   43377    4738         43671 0.7959515
## 8   34760    7450     394   42013   43377    4738         43671 0.7959515
## 7   34760    7450     394   42013   43377    4738         43671 0.7959515
## 2   34760    7450     394   42013   43377    4738         43671 0.7959515
## 3   34760    7450     394   42013   43377    4738         43671 0.7959515
## 4   34760    7450     394   42013   43377    4738         43671 0.7959515
##      pblack       phisp  medhhinc      ppov      E_d id
## 5 0.1705938 0.009022005 0.7593459 0.1092284 72.33683  1
## 8 0.1705938 0.009022005 0.7593459 0.1092284 72.33683  2
## 7 0.1705938 0.009022005 0.7593459 0.1092284 72.33683  3
## 2 0.1705938 0.009022005 0.7593459 0.1092284 72.33683  4
## 3 0.1705938 0.009022005 0.7593459 0.1092284 72.33683  5
## 4 0.1705938 0.009022005 0.7593459 0.1092284 72.33683  6

Next we make the spatial information, we get the polygons from census directly using counties from the tigris package. We drop counties not in the contiguous 48 US states.

Construction of spatial relationships:

Contiguity based neighbors

In a general sense, we can think of a square grid. Cells that share common elements of their geometry are said to be “neighbors”. There are several ways to describe these patterns, and for polygons, we generally use the rules of the chess board.

Rook adjacency Neighbors must share a line segment

Queen adjacency Neighbors must share a vertex or a line segment

If polygons share these boundaries (based on the specific definition: rook or queen), they are said to be “spatial neighbors” of one another. The figure below illustrates this principle.

For an observation of interest, the pink area, the Rood adjacent areas are those in green in the figure, because they share a line segment. For the second part of the figure on the right, the pink area has different sets of neighbors, compared to the Rook rule neighbors, because the area also shares vertices with other polygons, making them Queen neighbors.

Adjacency using Chessboard Rules

Adjacency using Chessboard Rules

Order of adjacency

The figure above also highlights the order of adjacency among observations. By order of adjacency, we simply men that observations are either immediate neighbors (the green areas), or they are neighbors of immediate neighbors. These are referred to as first and second order neighbors.

So, we can see, that the yellow polygons are the neighboring areas for this tract, which allows us to think about what the spatial structure of the area surrounding this part of campus.

For an example, let’s consider the case of San Antonio again. If our data are polygons, then there is a function in the spdep library in R, poly2nb that will take a polygon layer and find the neighbors of all areas using either a queen or rook rule. First we form the neighbors using the rook rule for all the tracts in Bexar County.

Distance based association

The queen and rook rules are useful for polygon features, but distance based contiguity is useful for all feature types (points, polygons, lines). The idea is similar to the polygon adjacency rule from above, but the distance rule is based on the calculated distance between areas. There are a variety of distance metrics that are used in statistics, but the most commonly assumed one is the Euclidean distance. The Euclidean distance between any two points is:

\[D^2 = \sqrt{\left (x_1 – x_2 \right)^2 + \left (y_1 – y_2 \right)^2 } \] Where x and y are the coordinates of each of the two areas. For polygons, these coordinates are typically the centroid of the polygon (you may have noticed this above when we were plotting the neighbor lists), while for point features, these are the two dimensional geometry of the feature. The collection of these distances between all features forms what is known as the distance matrix between observations. This summarizes all distances between all features in the data.

K nearest neighbors

  • A useful way to use distances is to construct a k-nearest neighbors set.

  • This will find the “k” closest observations for each observation, where k is some integer.

  • For instance if we find the k=3 nearest neighbors, then each observation will have 3 neighbors, which are the closest observations to it, regardless of the distance between them which is important.

  • Using the k nearest neighbor rule, two observations could potentially be very far apart and still be considered neighbors.

Model setup

  • We have a count outcome (deaths and births), in counties over time, and a set of time-constant covariates.
  • We have several options in the GLM framework with which to model these data, for example:

  • Binomial – \[y_{ij} \sim Bin(\pi_{ij}) \text{: } logit(\pi_{ij} ) = \beta_{0}+ x'\beta_k \]

  • Poisson – \[y_{ij} \sim Pois(\lambda_{ij} E_{ij}) \text{: } log(\lambda_{ij} ) = log(E_{ij}) + \beta_{0}+ x'\beta_k \]

  • Negative Binomial – \[y_{ij} \sim \text{Neg Bin} (\mu_{ij}, \alpha, E_{ij}) \text{: } log(\mu_{ij} ) = log(E_{ij}) + \beta_{0}+ x'\beta_k \]

  • In addition to various zero-inflated versions of these data.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1849 rows containing non-finite values (stat_bin).

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1849 rows containing non-finite values (stat_bin).

## `summarise()` ungrouping output (override with `.groups` argument)

We can fit these model using the Bayesian framework with INLA.

First, we consider the basic GLM for the mortality outcome, with out any hierarchical structure. We can write this model as a Negative Binomial model, for instance as:

\[\text{Deaths}_{ij} \sim NB(\mu_{ij}, \gamma)\] \[\mu_{ij} = \text{log(E_d)}_{ij} + X' \beta\]

INLA will use vague Normal priors for the \(\beta\)’s, and we have other parameters in the model to specify priors for. INLA does not require you to specify all priors, as all parameters have a default prior specification. In this example, I will use a \(Gamma(1, .5)\) prior for all hierarchical variance terms.

## 
## Call:
##    c("inla(formula = f1, family = \"nbinomial\", data = final.dat, E = 
##    E_d, ", " verbose = F, control.compute = list(waic = T), 
##    control.predictor = list(link = 1), ", " num.threads = 2)") 
## Time used:
##     Pre = 0.928, Running = 21.8, Post = 0.722, Total = 23.5 
## Fixed effects:
##                 mean     sd 0.025quant 0.5quant 0.975quant   mode kld
## (Intercept)   -5.047 10.723    -26.102   -5.048     15.989 -5.047   0
## scale(pblack)  0.159  0.015      0.130    0.159      0.188  0.159   0
## scale(phisp)  -0.025  0.013     -0.050   -0.025      0.001 -0.025   0
## scale(ppov)    0.041  0.015      0.012    0.041      0.070  0.041   0
## year           0.003  0.005     -0.008    0.003      0.013  0.003   0
## 
## Model hyperparameters:
##                                                         mean    sd 0.025quant
## size for the nbinomial observations (1/overdispersion) 0.624 0.009      0.608
##                                                        0.5quant 0.975quant
## size for the nbinomial observations (1/overdispersion)    0.624      0.641
##                                                         mode
## size for the nbinomial observations (1/overdispersion) 0.624
## 
## Expected number of effective parameters(stdev): 5.04(0.001)
## Number of equivalent replicates : 2124.92 
## 
## Watanabe-Akaike information criterion (WAIC) ...: 114586.38
## Effective number of parameters .................: 10.27
## 
## Marginal log-Likelihood:  -57331.80 
## Posterior marginals for the linear predictor and
##  the fitted values are computed

Plot our observed vs fitted values

Basic county level random intercept model

Now we add basic nesting of rates within counties, with a random intercept term for each county. This would allow there to be heterogeneity in the mortality rate for each county, over and above each county’s observed characteristics.

This model would be:

\[\text{Deaths}_{ij} \sim NB(\mu_{ij}, \gamma)\] \[\mu_{ij} = \text{log(E_d)}_{ij} + X' \beta + u_j\] \[u_j \sim \text{Normal} (0 , \tau_u)\]

where \(\tau_u\) here is the precision, not the variance and precision = 1/variance. INLA puts a log-gamma prior on the the precision by default.

## 
## Call:
##    c("inla(formula = f2, family = \"nbinomial\", data = final.dat, E = 
##    E_d, ", " verbose = F, control.compute = list(waic = T), 
##    control.predictor = list(link = 1), ", " num.threads = 2)") 
## Time used:
##     Pre = 0.571, Running = 160, Post = 1.36, Total = 162 
## Fixed effects:
##                 mean     sd 0.025quant 0.5quant 0.975quant   mode kld
## (Intercept)   -2.824 10.758    -23.945   -2.824     18.279 -2.824   0
## scale(pblack)  0.158  0.015      0.128    0.158      0.189  0.158   0
## scale(phisp)  -0.041  0.014     -0.069   -0.041     -0.013 -0.041   0
## scale(ppov)    0.044  0.015      0.014    0.044      0.074  0.044   0
## year           0.001  0.005     -0.009    0.001      0.012  0.001   0
## 
## Random effects:
##   Name     Model
##     struct IID model
## 
## Model hyperparameters:
##                                                          mean    sd 0.025quant
## size for the nbinomial observations (1/overdispersion)  0.627 0.009      0.609
## Precision for struct                                   50.626 7.005     38.292
##                                                        0.5quant 0.975quant
## size for the nbinomial observations (1/overdispersion)    0.627      0.644
## Precision for struct                                     50.138     65.780
##                                                          mode
## size for the nbinomial observations (1/overdispersion)  0.626
## Precision for struct                                   49.174
## 
## Expected number of effective parameters(stdev): 125.34(15.33)
## Number of equivalent replicates : 85.47 
## 
## Watanabe-Akaike information criterion (WAIC) ...: 114610.09
## Effective number of parameters .................: 66.26
## 
## Marginal log-Likelihood:  -57375.58 
## Posterior marginals for the linear predictor and
##  the fitted values are computed

Marginal Distributions of hyperparameters

We can plot the posterior marginal of the hyperparameter in this model, in this case \(\sigma_u = 1/\tau_u\)

##                   low       high
## level:0.95 0.01491462 0.02565338

## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine

## TableGrob (2 x 1) "arrange": 2 grobs
##   z     cells    name           grob
## 1 1 (1-1,1-1) arrange gtable[layout]
## 2 2 (2-2,1-1) arrange gtable[layout]

BYM Model

Model with spatial correlation – Besag, York, and Mollie (1991) model and temporal heterogeneity \[\text{Deaths}_{ij} \sim NB(\mu_{ij}, \gamma)\] \[\mu_{ij} = \text{log(E_d)}_{ij} + X' \beta + u_j + v_j + \gamma_t\]

Which has two random effects, one an IID random effect and the second a spatially correlated random effect, specified as a conditionally auto-regressive prior for the \(v_j\)’s. This is the Besag model:

\[v_j|v_{\neq j},\sim\text{Normal}(\frac{1}{n_i}\sum_{i\sim j}v_j,\frac{1}{n_i\tau})\] and \(u_j\) is an IID normal random effect, \(\gamma_t\) is also given an IID Normal random effect specification, and there are now three hyperparameters, \(\tau_u\) and \(\tau_v\) and \(\tau_{\gamma}\) and each are given log-gamma priors.

For the BYM model we must specify the spatial connectivity matrix in the random effect.

## 
## Call:
##    c("inla(formula = f3, family = \"nbinomial\", data = final.dat, E = 
##    E_d, ", " verbose = F, control.compute = list(waic = T), 
##    control.predictor = list(link = 1), ", " num.threads = 2)") 
## Time used:
##     Pre = 0.737, Running = 138, Post = 1.26, Total = 140 
## Fixed effects:
##                 mean    sd 0.025quant 0.5quant 0.975quant   mode kld
## (Intercept)    0.115 0.129     -0.145    0.115      0.374  0.115   0
## scale(pblack)  0.157 0.016      0.126    0.158      0.189  0.158   0
## scale(phisp)  -0.039 0.016     -0.069   -0.039     -0.007 -0.040   0
## scale(ppov)    0.043 0.016      0.012    0.043      0.075  0.043   0
## 
## Random effects:
##   Name     Model
##     struct BYM model
##    year IID model
## 
## Model hyperparameters:
##                                                            mean       sd
## size for the nbinomial observations (1/overdispersion)    0.627    0.009
## Precision for struct (iid component)                     51.094    7.099
## Precision for struct (spatial component)               1974.289 1903.577
## Precision for year                                        8.760    4.130
##                                                        0.025quant 0.5quant
## size for the nbinomial observations (1/overdispersion)      0.609    0.627
## Precision for struct (iid component)                       38.602   50.591
## Precision for struct (spatial component)                  174.447 1425.658
## Precision for year                                          2.885    8.075
##                                                        0.975quant    mode
## size for the nbinomial observations (1/overdispersion)      0.644   0.628
## Precision for struct (iid component)                       66.447  49.595
## Precision for struct (spatial component)                 7055.730 496.592
## Precision for year                                         18.742   6.583
## 
## Expected number of effective parameters(stdev): 133.75(15.30)
## Number of equivalent replicates : 80.09 
## 
## Watanabe-Akaike information criterion (WAIC) ...: 114605.76
## Effective number of parameters .................: 69.81
## 
## Marginal log-Likelihood:  -56934.15 
## Posterior marginals for the linear predictor and
##  the fitted values are computed

##                   low       high
## level:0.95 0.01475866 0.02544088
##                      low        high
## level:0.95 0.00005416961 0.003970123
##                   low      high
## level:0.95 0.03927999 0.2945931

This indicates very low spatially correlated variance in these data.

Exceedence probabilities

In Bayesian spatial models that are centered on an epidemiological type of outcome, it is common to examine the data for spatial clustering. One way to do this is to examine the clustering in the relative risk from one of these GLMM models. For instance if \(\theta\) is the relative risk \[\theta = exp(\beta_0 + \beta_1*x_1 + u_j)\] from one of our Negative binomial models above. We can use the posterior marginals of the relative risk to ask \(\theta \gt \theta^*\) where \(\theta^*\) is a specific level of excess risk, say 50% extra or \(\theta > 1.25\). If the density, or \(\text{Pr}(\theta \gt \theta^*)\) is high, then there is evidence that the excess risk is not only high, but significantly high.

To get the exceedence probabilities from one of our models, we can use the inla.pmarginal() function to ask if \(\text{Pr}(\theta \gt \theta^*)\)

So, we see lots of occasions where the exceedence probability is greater than .9. We can visualize these in a map.

Which shows several areas of the south where risk the infant mortality rate is signficantly higher than the national rate, with high posterior probability.

References

Besag, J., York, J., & Mollie, a. (1991). Bayesian image-restoration, with 2 applications in spatial statistics. Annals of the Institute of Statistical Mathematics, 43(1), 1-20. https://doi.org/10.1007/BF00116466