Bayesian Neural Networks in {tidymodels} with {kindling}

[This article was first published on R on Stats and R, and kindly contributed to R-bloggers.]

This post was written in collaboration with Joshua Marie.

What Are Bayesian Neural Networks?

Standard neural networks learn fixed weights during training and produce a single point estimate for each input — with no sense of how confident the model is. Bayesian Neural Networks (BNNs) replace those fixed weights with probability distributions (Neal 2012). Instead of learning a single value per weight, the network learns a mean and a variance, and samples from that distribution at every forward pass.

Formally, a BNN places a prior \(p(\mathbf{w})\) over the weights and updates it with observed data \(\mathcal{D}\) via Bayes’ theorem:

\[p(\mathbf{w} \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \mathbf{w})\, p(\mathbf{w})}{p(\mathcal{D})}\]

For large and deep networks, the posterior \(p(\mathbf{w} \mid \mathcal{D})\) is intractable. BNNs typically use variational inference to approximate it with a tractable distribution \(q_\phi(\mathbf{w})\) by minimising the KL divergence between the two (Blundell et al. 2015).
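Minimising this KL divergence is equivalent to maximising the evidence lower bound (ELBO), the objective that reappears later in this post:

\[\mathcal{L}(\phi) = \mathbb{E}_{q_\phi(\mathbf{w})}\left[\log p(\mathcal{D} \mid \mathbf{w})\right] - \mathrm{KL}\left(q_\phi(\mathbf{w}) \,\|\, p(\mathbf{w})\right)\]

In practice we minimise the negative ELBO: a data-fit term (such as cross-entropy) plus a KL penalty that keeps the posterior close to the prior.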

From point estimates to distributions

A standard linear layer computes \(\mathbf{y} = \mathbf{W}\mathbf{x} + \mathbf{b}\), where \(\mathbf{W}\) and \(\mathbf{b}\) are fixed. In a Bayesian linear layer, they are random variables. We place a Gaussian prior over the weights:

\[\mathbf{W} \sim \mathcal{N}(\mu_{\text{prior}},\ \sigma_{\text{prior}}^2)\]

and during training we learn a variational posterior:

\[q(\mathbf{W}) = \mathcal{N}(\mu_W,\ \sigma_W^2)\]

Each weight therefore has two learnable scalars: a mean \(\mu_W\) and a log-standard-deviation \(\log \sigma_W\).

The reparameterisation trick

Sampling directly from \(q(\mathbf{W})\) is not differentiable with respect to \(\mu_W\) and \(\sigma_W\). We resolve this with the reparameterisation trick (Kingma and Welling 2013):

\[\mathbf{W} = \mu_W + \sigma_W \odot \varepsilon, \quad \varepsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\]

We store \(\log \sigma_W\) (not \(\sigma_W\) directly) to ensure positivity during unconstrained optimisation:

\[\sigma_W = \exp(\texttt{weight}_{\log\sigma})\]

This yields the sampled weight used in the forward pass:

\[\mathbf{W}_{\text{sample}} = \mu_W + \exp(\texttt{weight}_{\log\sigma}) \odot \varepsilon_W\]

and similarly for the bias:

\[\mathbf{b}_{\text{sample}} = \mu_b + \exp(\texttt{bias}_{\log\sigma}) \odot \varepsilon_b\]
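To make this concrete, here is a minimal sketch of a Bayesian linear layer in {torch} for R that implements exactly this sampling step. This is an illustrative module, not the BayesLinear implementation used later in the post, and the initialisation values are arbitrary:

```r
box::use(
  torch[
    nn_module, nn_parameter, nnf_linear,
    torch_randn, torch_full, torch_randn_like, torch_exp
  ]
)

# Illustrative Bayesian linear layer: each weight has a learnable mean
# and log-standard-deviation, and is sampled anew at every forward pass.
bayes_linear_sketch <- nn_module(
  "bayes_linear_sketch",
  initialize = function(in_features, out_features) {
    self$weight_mu        <- nn_parameter(torch_randn(out_features, in_features) * 0.1)
    self$weight_log_sigma <- nn_parameter(torch_full(c(out_features, in_features), -5))
    self$bias_mu          <- nn_parameter(torch_randn(out_features) * 0.1)
    self$bias_log_sigma   <- nn_parameter(torch_full(c(out_features), -5))
  },
  forward = function(x) {
    # Reparameterisation trick: W = mu + exp(log_sigma) * eps
    w <- self$weight_mu + torch_exp(self$weight_log_sigma) * torch_randn_like(self$weight_mu)
    b <- self$bias_mu   + torch_exp(self$bias_log_sigma)   * torch_randn_like(self$bias_mu)
    nnf_linear(x, w, b)
  }
)
```

Two forward passes through this layer on the same input will generally differ, which is precisely what makes repeated prediction a measure of uncertainty.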

Is it possible?

At the moment, {tidymodels} offers limited APIs for neural network architectures, mostly around standard MLPs. A few packages make {torch} easier to use at a higher level:

  1. {brulee} bridges {torch} with {tidymodels}. It is ergonomic, but currently focused on MLPs for tabular data.
  2. {cito} does not integrate with {tidymodels} and leans more toward statistical applications.
  3. The {torch} team maintains {luz}.
  4. {tabnet} integrates with {tidymodels}, but is dedicated to the TabNet architecture.

{kindling} does not replace these packages. However, it helps close some of these gaps by supporting custom architectures with flexible depth. As a result, custom neural network architectures, including BNNs, can be used within {tidymodels} workflows. For a broader overview of what {kindling} enables in R, see this earlier post.

Setup

On March 3, 2026, {kindling} v0.3.0 was released on CRAN. Installing from CRAN with install.packages() always gives you the latest released version:

install.packages("kindling")

If you want the development version, you can retrieve the package by installing it from GitHub:

pak::pak("joshuamarie/kindling")

I recommend using {pak}: it is fast and very good at resolving package dependencies across environments, including system libraries and {renv}-managed projects.

Before using {kindling}, install LibTorch — the C++ backend shared by PyTorch and the {torch} R package (Falbel and Luraschi 2023):

torch::install_torch()

Loading namespace

When teaching R, I recommend using box::use() and qualifying the names you import. To use BNNs in R, clone this repository (the module is adapted from torchbnn (Kim 2020)) into a bnn folder in your project root: https://github.com/joshuamarie/RTorchBNN.

You can do this with the following command:

git clone --branch main https://github.com/joshuamarie/RTorchBNN bnn

Then, load the following:

box::use(
  bnn = ./bnn,
  kindling[train_nnsnip, nn_arch, act_funs],
  parsnip[fit, augment],
  yardstick[metrics]
)

Then register the models in {parsnip} with:

loadNamespace("kindling")

The train_nnsnip() Interface

In {kindling} v0.3.0, train_nnsnip() was introduced as a {parsnip}-compatible model specification that bridges the generalized neural network trainer with the {tidymodels} ecosystem. Unlike mlp_kindling(), which is specific to feedforward networks, train_nnsnip() is architecture-agnostic: you describe the layer topology via nn_arch(), allowing BNN-style or other custom layer types to slot in without changing the training logic.

Classification: Iris

We train a Bayesian Neural Network model to predict species in the iris dataset via train_nnsnip(), by configuring arch with nn_arch(). Inside nn_arch(), set nn_layer to bnn$BayesLinear instead of the default linear layer, then define the arguments for each layer with layer_arg_fn.

nn_model <-
  train_nnsnip(
    mode = "classification",
    arch = nn_arch(
      nn_layer = bnn$BayesLinear,
      out_nn_layer = torch::nn_linear,
      layer_arg_fn = ~ if (.is_output) {
        list(.in, .out)
      } else {
        list(
          in_features = .in,
          out_features = .out,
          prior_mu = 0,
          prior_sigma = 0.1
        )
      }
    ),
    hidden_neurons = c(64, 32),
    activations = act_funs(relu, elu),
    loss = "cross_entropy",
    epochs = 50,
    verbose = TRUE,
    learn_rate = 1e-3,
    optimizer_args = list(weight_decay = 0.01)
  ) |>
  fit(Species ~ ., data = iris)

nn_model |>
  augment(new_data = iris) |>
  metrics(truth = Species, estimate = .pred_class)
## # A tibble: 2 × 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy multiclass     0.667
## 2 kap      multiclass     0.5

Averaging predictions over multiple forward passes gives both a point estimate and a measure of predictive spread — all within a standard {tidymodels} pipeline.
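As a sketch, this can be done with repeated calls to predict(), assuming the fitted model resamples its weights at prediction time; the spread of the class probabilities across passes then approximates predictive uncertainty. The number of passes (here 30) is an arbitrary choice:

```r
# Run 30 stochastic forward passes and collect the class probabilities.
passes <- lapply(1:30, function(i) {
  predict(nn_model, new_data = iris, type = "prob")
})

# Stack into a 3D array: observations x classes x passes.
prob_array <- simplify2array(lapply(passes, as.matrix))

point_estimate <- apply(prob_array, c(1, 2), mean)  # averaged probabilities
spread         <- apply(prob_array, c(1, 2), sd)    # per-observation uncertainty
```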

The drawbacks

For now, there is no built-in option to update the loss dynamically during training. To use a custom objective such as the Evidence Lower Bound (ELBO), you currently need to define it manually and retrain with that loss.

Updating the model

That said, you can still apply a custom loss by setting the loss parameter to a torch loss function, for example torch::nnf_mse_loss. If you want to train the model with an ELBO-style loss, do the following:

make_elbo_loss <- function(bnn_model, n_obs, kl_weight = 1.0) {
  box::use(
    torch[nnf_cross_entropy, torch_tensor]
  )

  function(input, target) {
    ce <- nnf_cross_entropy(input, target)
    device <- input$device
    kl_val <- torch_tensor(0.0, device = device, requires_grad = FALSE)
    for (m in bnn_model$modules) {
      if (inherits(m, "BayesLinear") && !is.null(m$kl)) {
        kl_val <- kl_val + m$kl
      }
    }

    ce + kl_weight * kl_val / n_obs
  }
}

elbo_loss <- make_elbo_loss(nn_model$fit$model, n_obs = nrow(iris), kl_weight = 1.0)

nn_model2 <-
  train_nnsnip(
    mode = "classification",
    arch = nn_arch(
      nn_layer = bnn$BayesLinear,
      out_nn_layer = torch::nn_linear,
      layer_arg_fn = ~ if (.is_output) {
        list(.in, .out)
      } else {
        list(
          in_features = .in,
          out_features = .out,
          prior_mu = 0,
          prior_sigma = 0.1
        )
      }
    ),
    hidden_neurons = c(64, 32),
    activations = act_funs(relu, elu),
    loss = elbo_loss,
    epochs = 50,
    verbose = TRUE,
    learn_rate = 1e-3,
    optimizer_args = list(weight_decay = 0.01)
  ) |>
  fit(Species ~ ., data = iris)

nn_model2 |>
  augment(new_data = iris) |>
  metrics(truth = Species, estimate = .pred_class)
## # A tibble: 2 × 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy multiclass     0.847
## 2 kap      multiclass     0.77

As always, if you have any questions related to the topic covered in this post, please add them as a comment so other readers can benefit from the discussion.

References

Blundell, Charles, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. 2015. “Weight Uncertainty in Neural Network.” International Conference on Machine Learning, 1613–22.
Falbel, Daniel, and Javier Luraschi. 2023. Torch: Tensors and Neural Networks with GPU Acceleration. https://github.com/mlverse/torch.
Kim, Harry. 2020. Bayesian Neural Network for PyTorch [Torchbnn]. https://github.com/Harry24k/bayesian-neural-network-pytorch.
Kingma, Diederik P, and Max Welling. 2013. “Auto-Encoding Variational Bayes.” arXiv Preprint arXiv:1312.6114.
Neal, Radford M. 2012. Bayesian Learning for Neural Networks. Vol. 118. Springer Science & Business Media.