Stress testing

[This article was first published on Gianluca Baio's blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Lately, we’ve been spending a lot of time “stress-testing” our method for the computation of the Expected Value of Partial Perfect Information (EVPPI $-$ I know: the terminology is a bit strange and possibly not-very helpful, as “perfect” information doesn’t really exist, in statistical terms…).

I have mentioned this already here, here and here and the idea is to combine results from spatial statistics (and INLA) and Gaussian Process regression into the health economic problem. 

In a nutshell (I’ll avoid all the technical details here), the “data” for our model consist on a vector of values $mbox{NB}(theta)$ (the “monetary net benefit“, which determines the utility of a given health intervention) and a matrix of simulations from the (joint posterior) distributions of a set of relevant model parameters. The idea is that the multivariate parameter set can be split in a subset of “parameters of interest” ($phi$), while the rest ($psi$) are sort of “nuisance” or “unimportant” parameters. 

Following up on the work by Mark Strong et al, the relevant model can be written as
$$ mbox{NB}(theta) = g(phi) + varepsilon $$and the objective is to estimate the function $g(phi)$, which is then used to compute the EVPPI. This saves up a huge computation time, in comparison to other methods.

In our model, we extended this framework and modelled
$$ mbox{NB}(theta) = Hbeta + omega + varepsilon$$where $Hbeta$ is a linear component depending on the simulations for all the “important” parameters, while $omega$ is a spatially structured component, which accounts for the correlation among the important parameters. The big advantage is that using this formulation we are able to make inference based on INLA/SPDE, which is super-fast and can save up a lot of time even in comparison with the already fast “standard” Gaussian Process regression model.

Our first tests were giving very good results (as reported here). Then we’ve used a more complex model and found that while still being faster, our method was losing in accuracy for some specific parameters. This was a bummer, but it also meant that we had to go back and try and understand a bit better what was going on.

I’ll make this sound very easy (when in fact Anna has spent a lot of time on this $-$ and wishing she never met me and started her PhD on this, I’m sure!), but eventually we figured out what the problem was. Firstly, in very complex situations (which are not that uncommon in real health economic evaluation problems), there may be quite a large correlation and non-linearity in the relationships among the parameters in $phi$. This means that the combination of the standard linear predictor and the spatially structured effect cannot model properly the observed data, resulting in lower accuracy in the estimations. But, interestingly, extending $Hbeta$ to include interactions among the relevant parameters can fix this problem, with only a small increase in computational cost.

Secondly (this is a bit more technical, but also quite interesting), the spatially structured component is based on constructing a mesh which describes the relationship between the parameters in a Euclidean space. If the “boundaries” of the resulting mesh are too close to the range of the observed points, then the estimation procedure will return several predictions at 0 $-$ in a rather vague sense, something like: the boundaries are areas where the estimated smooth curve is 0. But this means that in the computation of the EVPPI there is an artificially large number of 0 values, which produces an under-estimation of the “true” value.

We have fixed this by modifying the evppi function in the development version of BCEA and now the user can specify a non-linear part as well as fiddle with the INLA-specific parameters defining the mesh for the spatially structured component. The results seem to be much more accurate, still with some substantial computational savings. I’ll try to put some R script to help people test the function (although I’ve also modified the help for evppi) to guide through the example involving the (simpler) Vaccine model

To leave a comment for the author, please follow the link and comment on their blog: Gianluca Baio's blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)