On top of recommending the excellent autobiography of Stanislaw Ulam, this post is about using the software Stan, but not directly to perform inference, instead to obtain R functions to evaluate a target’s probability density function and its gradient. With which, one can implement custom methods, while still benefiting from the great work of the Stan team on the “modeling language” side. As a proof of concept I have implemented a plain Hamiltonian Monte Carlo sampler for a random effect logistic regression model (taken from a course on Multilevel Models by Germán Rodríguez), a coupling of that HMC algorithm (as in “Unbiased Hamiltonian Monte Carlo with couplings“, see also this very recent article on the topic of coupling HMC), and then upper bounds on the total variation distance between the chain and its limiting distribution, as in “Estimating Convergence of Markov chains with L-Lag Couplings“.
The R script is here: https://github.com/pierrejacob/statisfaction-code/blob/master/2019-09-stan-logistic.R and is meant to be as simple as possible, and self-contained; warning, this is all really proof of concept and not thoroughly tested.
Basically the R script starts like a standard script that would use rstan for inference; it runs the default algorithm of Stan for a little while, then extracts some info from the “stanfit” object. With these, a pure R implementation of TV upper bounds for a naive HMC algorithm follows, that relies on functions called “stan_logtarget” and “stan_gradlogtarget” to evaluate the target log-pdf and its gradient.
The script takes a few minutes to run in total. Some time is first needed to compile the Stan code, and to run Stan for a few steps. Then some time spent towards the end of the script to generate 250 independent meeting times with a lag of 500 between the chains; the exact run time will of course depend a lot on your number of available processors (on my machine it takes around one minute). The script produces this plot:
This plot suggests that vanilla HMC as implemented in the script converges in less than 1000 iterations to its stationary distribution. This is probably quite conservative, but it’s still usable.
In passing, upon profiling the code of the function that generates each meeting time, it appears that half of the time is spent in Stan‘s “grad_log_prob” function (which computes the gradient of the log pdf of the target). This implies that not that much efficiency is lost in the fact that the algorithms are coded in pure R, at least for this model.