Fitting robust non-Gaussian models in Stan and R-INLA

YoungStatS

1 day ago

[This article was first published on YoungStatS, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Overview

Traditionally the excitation noise of spatial and temporal models is Gaussian. Take, for instance, an AR1 (autoregressive of order 1) process, where the increments \(x_{i+1}-\rho x_i, \ \ |\rho|<1\) are assumed to follow a Gaussian distribution. However, it is easy to find datasets that contain inherently non-Gaussian features, such as sudden jumps or spikes, that adversely affect the inferences and predictions made from Gaussian models. In this post, we introduce a specific class of non-Gaussian models, their advantages over Gaussian models, and their Bayesian implementation in Stan and R-INLA, two well-established platforms for statistical modeling.

Why go beyond Gaussian models?

Often, these Gaussian and non-Gaussian models are used as latent components in hierarchical models leading to latent Gaussian models (LGMs) and latent non-Gaussian models (LnGMs), respectively. We highlight next the benefits of using a (latent) non-Gaussian model:

More accurate predictions. Asar et al. (2020) considered measurements related to the kidney function of several patients recorded over time. The LGM did not adapt well to sudden drops in measurements, as shown in the following figure, which was problematic since these drops are an example of “acute kidney injury”, which should prompt an immediate medical intervention. The red curve shows a prediction based on an LnGM, which clearly is more accurate.


Observations of a quantity related to a patient’s kidney function and predictions using an LGM (green) and an LnGM (red).

Robustness. Non-Gaussian models provide a means for both accommodating possible outliers in the data and reduce their impact on the inferences (West (1984))
Model checking. We often elicit a Gaussian model for simplicity and convenience in a somewhat casual fashion. However, we can fit a non-Gaussian model, and if the relevant inferences do not change significantly, this can confirm the reasonableness of the Gaussian assumption.

What type of non-Gaussian models are you considering?

If \(\mathbf{x}^G\) follows a multivariate Gaussian with mean \(\mathbf{0}\) and precision matrix \(\mathbf{Q}= \mathbf{\mathbf{D}}^T\mathbf{D}\), it can be expressed through

\[ \mathbf{D}\mathbf{x}^G\overset{d}{=} \sigma\mathbf{Z}, \] where \(\mathbf{Z}\) is a vector of i.i.d. standard Gaussian variables. The non-Gaussian extension for \(\mathbf{x}^G\) consists in replacing the driving noise distribution: \[ \mathbf{D}\mathbf{x}\overset{d}{=} \sigma \mathbf{\Lambda}(\eta,\zeta), \] where \(\boldsymbol{\Lambda}\) is a vector of independent and standardized normal-inverse Gaussian (NIG) random variables that depend on the parameter \(\eta\) and \(\zeta\), which controls the leptokurtosis and skewness of the driving noise. Cabral, Bolin, and Rue (2022a) presented these models as flexible extensions of Gaussian models because they contain the Gaussian model as a special case (when \(\eta=0\)), and deviations from the Gaussian model are quantified by the parameters \(\eta\) and \(\zeta\). These models admit a useful variance-mean mixture representation. Considering the mixing variables \(\mathbf{V}= [V_1, \dotsc, V_n]^T\) we have:

\[\begin{equation} \label{eq:model} \tag{1} \mathbf{x}|\mathbf{V} \sim \text{N}\left(\frac{\sigma}{\sqrt{1+\zeta^2\eta}} \zeta \mathbf{D}^{-1}(\mathbf{V}-\mathbf{1}),\frac{\sigma^2}{1+\zeta^2\eta} \mathbf{D}^{-1}\text{diag}(\mathbf{V})\mathbf{D}^{-T} \right), \ \ \ V_{i}|\eta \overset{ind.}{\sim} \mathrm{IG}(1,\eta^{-1} h_i^2), \end{equation}\]

where \(\mathrm{IG}\) stands for the inverse-Gaussian distribution. We condition the Gaussian latent field \(\mathbf{x}\) on a vector of mixing variables \(\mathbf{V}\), which enter the covariance matrix, adding more flexibility to the model. We note that the previous parameterization preserves the mean and covariance structure of \(\mathbf{x}\) while allowing for non-zero skewness and excess kurtosis on the marginal distributions.

This non-Gaussian extension can be used in: random walk (RW) and autoregressive processes for time series; simultaneous and conditional autoregressive processes for graphical models and areal data; Matérn processes, which can be used in a variety of applications, such as in geostatistics and spatial point processes. The corresponding dependency matrices \(\mathbf{D}\) that specify each model can be found in Cabral, Bolin, and Rue (2022b).

What sample paths do these models produce?

The first row of the following figure shows Gaussian (left) and NIG (right) white noise processes. The rows below show several processes built from these driving noises, including RW1, RW2, Ornstein–Uhlenbeck (OU), and Matérn (\(\alpha=2\)) processes. Whenever the NIG noise takes an extreme value (for instance, near location 0.8), the CRW1 and OU processes will exhibit a distinct jump, and the RW2 and Matérn processes will exhibit a kink (discontinuity in the first derivative).


Noise and sample paths of several models for \(\eta=10^{-6}\) (left) and \(\eta=1\) (right), for \(\sigma=1\) and \(\zeta=0\).

Stan implementation

The Gaussian model (\(\mathbf{D}\mathbf{x}^G = \mathbf{Z}\)) can be declared in Stan as:

x ~ multi_normal_prec(rep_vector(0,N), D'*D)

The non-Gaussian model (\(\mathbf{D}\mathbf{x} = \mathbf{\Lambda}\)) declaration is:

x ~ nig_model(D, eta, zeta, h, 1)

The argument h should be a vector of ones for discrete-space models. For models defined in continuous space, for instance, a continuous random walk of order 1, h contains the distance between locations. The last argument is an integer with value 1 if the log-determinant of \(\mathbf{D}\) should be computed (if \(\mathbf{D}\) depends on parameters), or 0 otherwise. The nig_model and other Stan functions can be found in github.com/rafaelcabral96/nigstan, along with a more comprehensive theoretical background and several code examples for time series, geostatistical and areal data application. The function nig_model uses a collapsed representation of (1) and leverages within-chain parallelization to improve speed and scalability.

Fitting a non-Gaussian model in Stan (Columbus dataset)

The Columbus dataset consists of crime rates in thousands (\(y_i\)) in 49 counties of Columbus, Ohio, and can be found in the spdep R package. The next Leaflet widget shows the crime rates.

To leave a comment for the author, please follow the link and comment on their blog: YoungStatS.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.