While the Covid-19 pandemic and teaching a new course in the fall put paid to most of my development time last year, some time off work this January allowed me time to work on gratia 📦 again. I released 0.5.0 to CRAN in part to fix an issue with tests not running on the new M1 chips from Apple because I wasn’t using vdiffr 📦 conditionally. Version 0.5.1 followed shortly thereafter as I’d messed up an argument name in
smooth_estimates(), a new function that I hope will allow development to proceed more quickly and mke it easier to maintain code and extend functionality to cover a greater range of model types. Read on to fin out more about
smooth_estimates() and what else was in these two releases.
Evaluating smooths with
For a while now I’ve realised that the way I’d implemented
evaluate_smooth() wasn’t great. Some design decisions I took earlier on added a lot of unnecessary complexity to the function through handling of factor
by smooths, and which didn’t really work properly in the context of a GAM where the same variable could be in multiple smooth terms.
My original plan was to use a facetted plot for factor
by variable smooths, and so when you selected a model term (more on that later), if that term was a factor
by smooth, instead of just pulling in a single smooth, I would pull in all of the smooths associated with the factor
by. Handling this got complicated and resulted in some kludgy, messy code that was prone to failure when used with a more specialised smooth or a more complex model.
Additionally, how I initially implemented selection of model terms was a bit silly; a user could pass a string for a variable that would be matched against the labels that mgcv 📦 uses for smooth. Any instance of the term in any smooth would then get selected, which is not usually what is wanted when working with complex models with multiple smooths, some of which might contin the same variable.
Because of this, in the summer I decided to completely rewrite
evaluate_smooth(). Then I realised this would not be a good idea as I was going to break a lot of existing code, including code we’d written in support of papers that had been published and which used
evaluate_smooth(). Instead, I decided to start from a clean slate with a new function that didn’t do any of the silly things I’d messed up
evaluate_smooth() with, and which would be much simpler to maintain and develop for a wider range of complex distributional models.
smooth_estimates() I also came up with a standard way to represent all evaluations of a smooth, regardless of type. The nice thing about this is that it’s easy to return a tibble containing all the values of the evaluated smooth for many smooths at once, something you couldn’t do with
The idea behind
smooth_estimates() is to return a tibble of values of the smooth evaluated at a grid of
n points over each of the covariates involved in that smooth.
This seems a little wasteful — all those
NA columns 😱 — but the output is a consistent wa to represent smooths, regardless of the number of covariates etc.
I’m toying with returning the tibble in a nested fashion with
nest(), something like
which I think is much neater, but does require extra steps from the user to just use the output
Internally, the individual smooths are nested by default as that makes it easy to join the tibbles for multiple smooth together. As such, the unnested-ness of the current behaviour requires an explicit extra step within
If you have thoughts about this, let me know in the comments below.
smooth_estimates() is going to supersede
evaluate_smooth(), and currently it can handle pretty much everything that
evaluate_smooth() can do. That doesn’t mean
evaluate_smooth() is going anywhere; as I mentioned above, I don’t want to break old code, so as log as it doesn’t take too much time to maintain
evaluate_smooth() isn’t hurting anyone if I put it out to pasture.
Version 0.5.0 introduced
smooth_estimates() which could only handle very simple univariate smooths, but version 0.5.1 expanded those capabilities. There are a few special smooths that I haven’t yet added capabilities for, including Markov random field smooths and soap film smooths. Support for those will be added by the time version 0.6.0 hits CRAN later this year.
Version 0.4.0 introduced the ability to add partial residuals to plots of smooths. Version 0.5.0 exposes this functionality for computing partial residuals via new function
The names are currently non-standard — hence all the backticks — and I might change that if I can think of a short hand way to refer to smooths that still allows referencing them uniquely when there are things like factor
by smooths involved.
I also added an
add_partial_residuals(), to add the partial residuals to an existing data frame
but since implementing this I am now questioning whether this is a good thing or rather whether the implementation is a good thing; there’s nothing in the code currently to ensure that the data you provided matches the order of the data used to fit the model — caveat emptor!
I’ve been adding functions to gratia that will be helpful when teaching GAMs; I added
basis() a while back and in the 0.5.1 release I added
penalty(), for extracting and tidying penalty matrices of smooths from fitted GAM models.
There is a
draw() method also, to plot the penalty matrix
It was pointed out that the way this is plotted is not very intuitive if you’re trying to map the way the penalty matrix is written to what’s shown in the plot — you have to flip the y-axis. This is due to how
geom_raster() draws things. I have fixed this, but it’s only fixed in the GitHub version of the package, not a current release version.
draw.gam() and some related
draw() methods now allow you to configure the colour scales used to plot GAMs. Available options include
continuous_fill, that take a suitable scale allowing you to change the colour scheme used etc:
draw.gam() can now plot smooths after addition of a constant and transformation via a function. This can be used to put smooths (sort of) on the response scale. For example, in the code below, I add the model intercept to each smooth when plotting
I plan to add an argument
response, which would take a logical to indicate if you wan to plot on the response scale. If
response = TRUE, it would override anything passed to
fun, such that
draw.gam() would just do the right thing, and figure out from the model what constant and inverse link function to use. Watch out for that in 0.6.0.
Excluding or selecting terms to include in model predictions
predict.gam() allows the user to either exclude or specifically include only selected terms in model predictions. Version 0.5.0 added the same functionality in
predicted_samples(), by allowing you to pass along an
terms argument to
predict.gam() that is used in both of these functions.
All in all, these are not major changes to the functionality of gratia, but the ground work laid in
smooth_estimates() should allow me to address lots of the outstanding bugs related to handling complex model and some complex smooth types, and I’m pretty excited about that.