**R – Statistical Odds & Ends**, and kindly contributed to R-bloggers].

I’m proud to announce that my latest research project, *reluctant generalized additive modeling (RGAM)*, is complete (for now)! In this post, I give a brief overview of the method: what it is trying to do and how you can fit such a model in R. (This project is joint work with my advisor, Rob Tibshirani.)

- For an in-depth description of the method, please see our arXiv preprint.
- You can download the CRAN version of the package, `relgam`, here. The latest version of the package is on GitHub.
- For more details on how to use the package, please see the package’s vignette.

**Introduction and motivation**

**tl;dr: Reluctant generalized additive modeling (RGAM) produces highly interpretable sparse models which allow non-linear relationships between the response and each individual feature. However, non-linear relationships are only included if deemed important in improving prediction performance. RGAMs work with quantitative, binary, count and survival responses and are computationally efficient.**

Consider the supervised learning setting, where we have $n$ observations of $p$ features $x_{ij}$ for $i = 1, \dots, n$ and $j = 1, \dots, p$, along with $n$ responses $y_1, \dots, y_n$. Let $X_j \in \mathbb{R}^n$ denote the values of the $j$th feature. *Generalized linear models (GLMs)* assume that the relationship between the response and the features is

$$\eta(y) = \sum_{j=1}^p \beta_j X_j + \epsilon,$$

where $\eta$ is a link function and $\epsilon$ is a mean-zero error term. *Generalized additive models (GAMs)* are a more flexible class of models, assuming the true relationship to be

$$\eta(y) = \sum_{j=1}^p f_j(X_j) + \epsilon,$$

where the $f_j$'s are unknown functions to be determined by the model.
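To make the difference concrete, here is a minimal base-R sketch with a single feature, in which case an additive model reduces to a one-dimensional smoother. The quadratic signal, the sample size and the choice of 4 degrees of freedom are illustrative assumptions, not taken from the paper.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- x^2 + rnorm(n, sd = 0.5)   # hypothetical non-linear truth

# Linear model (GLM with identity link): a straight line in x
lin_fit <- lm(y ~ x)

# One-feature additive model: f(x) estimated by a smoothing spline
gam_fit <- smooth.spline(x, y, df = 4)

# Compare residual sums of squares on the training data
rss_lin <- sum(residuals(lin_fit)^2)
rss_gam <- sum((y - predict(gam_fit, x)$y)^2)
c(linear = rss_lin, additive = rss_gam)
```

The linear fit cannot capture the curvature, so its residual sum of squares is much larger. The flip side is that the extra flexibility makes overfitting easier once there are many features.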

These two classes of models include all features in the model, which is often undesirable, especially when we have a large number of features. (We usually expect only a small fraction of features to have any influence on the response variable.) This is especially problematic with GAMs as overfitting can occur much more easily. A host of methods have arisen to create *sparse* GAMs, i.e. GAMs that involve only a handful of features. Earlier examples of such methods include *COSSO* (Lin & Zhang 2006) and *SpAM* (Ravikumar et al. 2007).

While providing sparsity, these methods dictated that the features included had to have a non-linear relationship with the response, even if a linear relationship would have been sufficient. More sophisticated methods were developed to give both sparsity and the possibility of linear or non-linear relationships between the features and response. Examples of such methods are *GAMSEL* (Chouldechova & Hastie 2015), *SPLAM* (Lou et al. 2016) and *SPLAT* (Petersen & Witten 2019). GAMSEL is available in R in the `gamsel` package (see my unofficial vignette here), but I was not able to find R packages for the other two methods.

*Reluctant generalized additive models (RGAM)* fall in the same class as this last group of methods. RGAM is available in R in the `relgam` package. RGAMs are computationally fast and work with quantitative, binary, count and survival response variables. (To my knowledge, existing software only works for quantitative and binary responses.)

**RGAM: What is it?**

Reluctant generalized additive modeling was inspired by *reluctant interaction modeling* (Yu et al. 2019). The idea is that

One should prefer a linear term over a non-linear term if all else is equal.

That is, we prefer a model to contain only effects that are linear in the original set of features: non-linearities are only included thereafter if they add to predictive performance.

We operationalize this principle with a three-step process that closely mimics that of reluctant interaction modeling. At a high level:

1. Fit the response as well as we can using only the main effects (i.e. original features).
2. For each original feature $X_j$, construct a non-linear feature associated with it.
3. Refit the response on all the main effects and the additional features from Step 2.

Now for a little bit more detail:

1. Fit the lasso of $y$ on $X$ to get coefficients $\hat{\beta}$. Compute the residuals $r = y - X\hat{\beta}$, using the $\lambda$ hyperparameter chosen by cross-validation.
2. For each $j \in \{1, \dots, p\}$, fit a smoothing spline with $d$ degrees of freedom of $r$ on $X_j$, and denote the fit by $\hat{f}_j$. Rescale $\hat{f}_j$ so that its scale, relative to that of the linear features, is $\gamma$. Let $F$ denote the matrix whose columns are the $\hat{f}_j(X_j)$'s.
3. Fit the lasso of $y$ on $X$ and $F$ for a path of tuning parameters $\lambda_1 > \lambda_2 > \dots > \lambda_m$.

There are three hyperparameters here: $\lambda$ (just like the lasso), $d$ for the smoothing spline degrees of freedom in Step 2, and $\gamma$ for scaling the non-linear features. The role of $\gamma$ might be a bit hard to understand from the technical description above. Informally, $\gamma = 1$ means that the linear and non-linear features are on the same scale. $\gamma < 1$ means that the non-linear features are on a smaller scale: as a result, the coefficients associated with them are less likely to survive variable selection by the lasso in Step 3.
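The three steps above can be sketched by hand with `glmnet` and base R's `smooth.spline`. This is only a rough illustration for a quantitative response, not the `relgam` implementation: the simulated signal, the values chosen for the degrees of freedom and the scaling factor, and the rescaling convention (features standardized to unit standard deviation, non-linear features rescaled to a standard deviation equal to the scaling factor) are all assumptions made for the example.

```r
library(glmnet)

set.seed(1)
n <- 200; p <- 6
x <- scale(matrix(rnorm(n * p), n, p))   # standardize so each column has sd 1
y <- 2 * x[, 1] + x[, 2]^2 + rnorm(n)    # hypothetical signal for illustration

d <- 4        # smoothing spline degrees of freedom (Step 2), assumed value
gamma <- 0.8  # scale of non-linear features relative to linear ones, assumed value

# Step 1: lasso on the main effects; residuals at the CV-chosen lambda
cv1 <- cv.glmnet(x, y)
r <- y - as.numeric(predict(cv1, newx = x, s = "lambda.min"))

# Step 2: smooth the residuals on each feature, then rescale to sd = gamma
f <- sapply(seq_len(p), function(j) {
  fj <- predict(smooth.spline(x[, j], r, df = d), x[, j])$y
  gamma * fj / sd(fj)
})

# Step 3: lasso on [x, f]; standardize = FALSE so the gamma scaling is kept
fit3 <- glmnet(cbind(x, f), y, standardize = FALSE)
dim(coef(fit3))   # 1 + 2p rows (intercept, linear, non-linear), one column per lambda
```

Note the `standardize = FALSE` in Step 3: if the lasso re-standardized the columns, the deliberate down-weighting of the non-linear features would be undone.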

**A simple example**

The CRAN vignette is the best place to start learning how to fit RGAMs in practice. Below I give an example of the types of models that can come out of RGAM. (Code for this example can be found here.)

We simulate data with $n$ observations and $p = 12$ features. Each entry in the $n \times p$ feature matrix $X$ is an independent draw from the standard normal distribution, and the true response is

We fit an RGAM to this data for a sequence of $\lambda$ values. The larger the index, the smaller the $\lambda$ value and the less penalty imposed in the lasso in Step 3, resulting in more flexible models.
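The same indexing convention can be seen in a plain lasso path. The sketch below uses `glmnet` directly rather than RGAM, with a made-up signal, purely to show that the penalty values decrease along the path while the number of non-zero coefficients tends to grow.

```r
library(glmnet)

set.seed(2)
n <- 100; p <- 12
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + rnorm(n)   # hypothetical signal

fit <- glmnet(x, y)
m <- length(fit$lambda)

# Compare the two ends of the path: penalty value and number of
# non-zero coefficients at the first and last index
data.frame(index   = c(1, m),
           lambda  = fit$lambda[c(1, m)],
           nonzero = fit$df[c(1, m)])
```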

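For each $\lambda$ value, RGAM’s predictions have the form

$$\hat{y} = \sum_{j=1}^p \hat{\beta}_j X_j + \sum_{j=1}^p \hat{\alpha}_j \hat{f}_j(X_j),$$

where $\hat{f}_j(X_j)$ is the non-linear feature constructed from feature $j$ in Step 2, and $\hat{\beta}_j$ and $\hat{\alpha}_j$ are the lasso coefficients from Step 3.

Let $\hat{g}_j(X_j) = \hat{\beta}_j X_j + \hat{\alpha}_j \hat{f}_j(X_j)$. We plot the model fits for the first 30 $\lambda$ values in the animation below. In each of the 12 panes, we see the estimated $\hat{g}_j$ for each variable (in blue, green or red), and the true relationship in black.

For small indices (i.e. large $\lambda$ values), we have very restricted models, with most $\hat{g}_j$'s being zero or linear. As the index increases, we see that the RGAM model fits get closer and closer to the true relationships. Past some index, we start to see some overfitting. The optimal value of $\lambda$ can be chosen via methods like cross-validation.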

**Give it a try!**

I think RGAM is a neat extension to GAM and other sparse additive models. It may not always perform best but I think it is a nice tool to add to your arsenal of interpretable models to try for supervised learning problems!
