# Going Bayes #rstats

**R – Strenge Jacke!**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Some time ago I started working with Bayesian methods, using the great rstanarm-package. Beside the fantastic package-vignettes, and books like *Statistical Rethinking* or *Doing Bayesion Data Analysis*, I also found the ressources from Tristan Mahr helpful to both better understand Bayesian analysis and rstanarm. This motivated me to implement tools for Bayesian analysis into my packages, as well.

Due to the latest tidyr-update, I had to update some of my packages, in order to make them work again, so – beside some other features – some Bayes-stuff is now avaible in my packages on CRAN.

## Finding shape or location parameters from distributions

The following functions are included in the sjstats-package. Given some known quantiles or percentiles, or a certain value or ratio and its standard error, the functions `find_beta()`

, `find_normal()`

or `find_cauchy()`

help finding the parameters for a distribution. Taking the example from here, the plot indicates that the mean value for the normal distribution is somewhat above 50. We can find the exact parameters with `find_normal()`

, using the information given in the text:

library(sjstats) find_normal(x1 = 30, p1 = .1, x2 = 90, p2 = .8) <span style="color:#999999;">#> $mean #> [1] 53.78387 #> #> $sd #> [1] 30.48026</span>

## High Density Intervals for MCMC samples

The `hdi()`

-function computes the high density interval for posterior samples. This is nothing special, since there are other packages with such functions as well – however, you can use this function not only on vectors, but also on `stanreg`

-objects (i.e. the results from models fitted with *rstanarm*). And, if required, you can also transform the HDI-values, e.g. if you need these intervals on an expontiated scale.

library(rstanarm) fit <- stan_glm(mpg ~ wt + am, data = mtcars, chains = 1) hdi(fit) <span style="color:#999999;">#> term hdi.low hdi.high #> 1 (Intercept) 32.158505 42.341421 #> 2 wt -6.611984 -4.022419 #> 3 am -2.567573 2.343818 #> 4 sigma 2.564218 3.903652</span> <span style="color:#999999;"># fit logistic regression model</span> fit <- stan_glm( vs ~ wt + am, data = mtcars, family = binomial("logit"), chains = 1 ) hdi(fit, prob = .89, trans = exp) <span style="color:#999999;">#> term hdi.low hdi.high #> 1 (Intercept) 4.464230e+02 3.725603e+07 #> 2 wt 6.667981e-03 1.752195e-01 #> 3 am 8.923942e-03 3.747664e-01</span>

## Marginal effects for rstanarm-models

The ggeffects-package creates tidy data frames of model predictions, which are ready to use with ggplot (though there’s a `plot()`

-method as well). *ggeffects* supports a wide range of models, and makes it easy to plot marginal effects for specific predictors, includinmg interaction terms. In the past updates, support for more model types was added, for instance `polr`

(pkg MASS), `hurdle`

and `zeroinfl`

(pkg pscl), `betareg`

(pkg betareg), `truncreg`

(pkg truncreg), `coxph`

(pkg survival) and `stanreg`

(pkg rstanarm).

`ggpredict()`

is the main function that computes marginal effects. Predictions for `stanreg`

-models are based on the posterior distribution of the linear predictor (`posterior_linpred()`

), mostly for convenience reasons. It is recommended to use the posterior predictive distribution (`posterior_predict()`

) for inference and model checking, and you can do so using the `ppd`

-argument when calling `ggpredict()`

, however, especially for binomial or poisson models, it is harder (and much slower) to compute the „confidence intervals“. That’s why relying on `posterior_linpred()`

is the default for `stanreg`

-models with `ggpredict()`

.

Here is an example with two plots, one without raw data and one including data points:

library(sjmisc) library(rstanarm) library(ggeffects) data(efc) # make categorical efc$c161sex <- to_label(efc$c161sex) # fit model m <- stan_glm(neg_c_7 ~ c160age + c12hour + c161sex, data = efc) dat <- ggpredict(m, terms = c("c12hour", "c161sex")) dat <span style="color:#999999;">#> # A tibble: 128 x 5 #> x predicted conf.low conf.high group #> <dbl> <dbl> <dbl> <dbl> <fctr> #> 1 4 10.80864 10.32654 11.35832 Male #> 2 4 11.26104 10.89721 11.59076 Female #> 3 5 10.82645 10.34756 11.37489 Male #> 4 5 11.27963 10.91368 11.59938 Female #> 5 6 10.84480 10.36762 11.39147 Male #> 6 6 11.29786 10.93785 11.61687 Female #> 7 7 10.86374 10.38768 11.40973 Male #> 8 7 11.31656 10.96097 11.63308 Female #> 9 8 10.88204 10.38739 11.40548 Male #> 10 8 11.33522 10.98032 11.64661 Female #> # ... with 118 more rows</span> plot(dat) plot(dat, rawdata = TRUE)

As you can see, if you work with labelled data, the model-fitting functions from the *rstanarm*-package preserves all value and variable labels, making it easy to create annotated plots. The „confidence bands“ are actually hidh density intervals, computed with the above mentioned `hdi()`

-function.

## Next…

Next I will integrate *ggeffects* into my sjPlot-package, making sjPlot more generic and supporting more models types. Furthermore, *sjPlot* shall get a generic `plot_model()`

-function which will replace former single functions like `sjp.lm()`

, `sjp.glm()`

, `sjp.lmer()`

or `sjp.glmer()`

. `plot_model()`

should then produce a plot, either marginal effects, forest plots or interaction terms and so on, and accept (m)any model class. This should help making *sjPlot* more convenient to work with, more stable and easier to maintain…

Tagged: Bayes, data visualization, ggplot, R, rstanarm, rstats, sjPlot, Stan

**leave a comment**for the author, please follow the link and comment on their blog:

**R – Strenge Jacke!**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.