Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Some time ago I started working with Bayesian methods, using the great rstanarm-package. Beside the fantastic package-vignettes, and books like Statistical Rethinking or Doing Bayesion Data Analysis, I also found the ressources from Tristan Mahr helpful to both better understand Bayesian analysis and rstanarm. This motivated me to implement tools for Bayesian analysis into my packages, as well.

Due to the latest tidyr-update, I had to update some of my packages, in order to make them work again, so – beside some other features – some Bayes-stuff is now avaible in my packages on CRAN.

## Finding shape or location parameters from distributions

The following functions are included in the sjstats-package. Given some known quantiles or percentiles, or a certain value or ratio and its standard error, the functions find_beta(), find_normal() or find_cauchy() help finding the parameters for a distribution. Taking the example from here, the plot indicates that the mean value for the normal distribution is somewhat above 50. We can find the exact parameters with find_normal(), using the information given in the text:

library(sjstats)
find_normal(x1 = 30, p1 = .1, x2 = 90, p2 = .8)
#> $mean #>  53.78387 #> #>$sd
#>  30.48026

## High Density Intervals for MCMC samples

The hdi()-function computes the high density interval for posterior samples. This is nothing special, since there are other packages with such functions as well – however, you can use this function not only on vectors, but also on stanreg-objects (i.e. the results from models fitted with rstanarm). And, if required, you can also transform the HDI-values, e.g. if you need these intervals on an expontiated scale.

library(rstanarm)
fit <- stan_glm(mpg ~ wt + am, data = mtcars, chains = 1)
hdi(fit)
#>          term   hdi.low  hdi.high
#> 1 (Intercept) 32.158505 42.341421
#> 2          wt -6.611984 -4.022419
#> 3          am -2.567573  2.343818
#> 4       sigma  2.564218  3.903652

# fit logistic regression model
fit <- stan_glm(
vs ~ wt + am,
data = mtcars,
family = binomial("logit"),
chains = 1
)
hdi(fit, prob = .89, trans = exp)
#>          term      hdi.low     hdi.high
#> 1 (Intercept) 4.464230e+02 3.725603e+07
#> 2          wt 6.667981e-03 1.752195e-01
#> 3          am 8.923942e-03 3.747664e-01

## Marginal effects for rstanarm-models

The ggeffects-package creates tidy data frames of model predictions, which are ready to use with ggplot (though there’s a plot()-method as well). ggeffects supports a wide range of models, and makes it easy to plot marginal effects for specific predictors, includinmg interaction terms. In the past updates, support for more model types was added, for instance polr (pkg MASS), hurdle and zeroinfl (pkg pscl), betareg (pkg betareg), truncreg (pkg truncreg), coxph (pkg survival) and stanreg (pkg rstanarm).

ggpredict() is the main function that computes marginal effects. Predictions for stanreg-models are based on the posterior distribution of the linear predictor (posterior_linpred()), mostly for convenience reasons. It is recommended to use the posterior predictive distribution (posterior_predict()) for inference and model checking, and you can do so using the ppd-argument when calling ggpredict(), however, especially for binomial or poisson models, it is harder (and much slower) to compute the „confidence intervals“. That’s why relying on posterior_linpred() is the default for stanreg-models with ggpredict().

Here is an example with two plots, one without raw data and one including data points:

library(sjmisc)
library(rstanarm)
library(ggeffects)
data(efc)
# make categorical
efc$c161sex <- to_label(efc$c161sex)

# fit model
m <- stan_glm(neg_c_7 ~ c160age + c12hour + c161sex, data = efc)
dat <- ggpredict(m, terms = c("c12hour", "c161sex"))
dat
#> # A tibble: 128 x 5
#>        x predicted conf.low conf.high  group
#>    <dbl>     <dbl>    <dbl>     <dbl> <fctr>
#>  1     4  10.80864 10.32654  11.35832   Male
#>  2     4  11.26104 10.89721  11.59076 Female
#>  3     5  10.82645 10.34756  11.37489   Male
#>  4     5  11.27963 10.91368  11.59938 Female
#>  5     6  10.84480 10.36762  11.39147   Male
#>  6     6  11.29786 10.93785  11.61687 Female
#>  7     7  10.86374 10.38768  11.40973   Male
#>  8     7  11.31656 10.96097  11.63308 Female
#>  9     8  10.88204 10.38739  11.40548   Male
#> 10     8  11.33522 10.98032  11.64661 Female
#> # ... with 118 more rows

plot(dat)
plot(dat, rawdata = TRUE)  As you can see, if you work with labelled data, the model-fitting functions from the rstanarm-package preserves all value and variable labels, making it easy to create annotated plots. The „confidence bands“ are actually hidh density intervals, computed with the above mentioned hdi()-function.

## Next…

Next I will integrate ggeffects into my sjPlot-package, making sjPlot more generic and supporting more models types. Furthermore, sjPlot shall get a generic plot_model()-function which will replace former single functions like sjp.lm(), sjp.glm(), sjp.lmer() or sjp.glmer(). plot_model() should then produce a plot, either marginal effects, forest plots or interaction terms and so on, and accept (m)any model class. This should help making sjPlot more convenient to work with, more stable and easier to maintain…

Tagged: Bayes, data visualization, ggplot, R, rstanarm, rstats, sjPlot, Stan  