Un-debunking the GAMLSS myth
Generalized additive models for location, scale, and shape (GAMLSS) are popular for estimating reference equations, e.g., for lung function measurements. A recent paper claimed to ‘debunk’ the GAMLSS myth in this context, arguing that much simpler piecewise linear models work just as well. A new letter to the editor refutes these claims.
Citation
Robert A. Rigby, Mikis D. Stasinopoulos, Achim Zeileis, Sanja Stanojevic, Gillian Heller, Fernanda de Bastiani, Thomas Kneib, Andreas Mayr, Reto Stauffer, Nikolaus Umlauf (2025). “Letter to the Editor Refuting ‘Debunking the GAMLSS Myth: Simplicity Reigns in Pulmonary Function Diagnostics’.” Forthcoming in Respiratory Medicine. Preprint available from the arXiv.org E-Print Archive, arXiv:2512.09179 [stat.AP]. doi:10.48550/arXiv.2512.09179
Abstract
We read with interest the above article by Zavorsky (2025, Respiratory Medicine) concerning reference equations for pulmonary function testing. The author compares a Generalized Additive Model for Location, Scale, and Shape (GAMLSS), which is the standard adopted by the Global Lung Function Initiative (GLI), with a segmented linear regression (SLR) model, for pulmonary function variables. The author presents an interesting comparison; however, there are some fundamental issues with the approach. We welcome this opportunity for discussion of the issues that it raises. The author’s contention is that (1) SLR provides “prediction accuracies on par with GAMLSS”; and (2) the GAMLSS model equations are “complicated and require supplementary spline tables”, whereas the SLR is “more straightforward, parsimonious, and accessible to a broader audience”. We respectfully disagree with both of these points.
Replication
Zavorsky provides the data for his analysis in NHANES_2007_2012_Only_Acceptable_Spirometry_Values.csv.
Based on Zavorsky’s code we were able to replicate most of his analyses and to re-assess the models. The replication code for the letter to the editor is available in undebunking.R.
For illustration, one pair of models for one of the response variables (in the male subgroup) is fitted and assessed below.
Illustration
To illustrate the workflow we used for assessing the segmented linear regression (SLR) and the generalized additive models for location, scale, and shape (GAMLSS) for the spirometry data, we consider the response variable FEV1 for the male subgroup. FEV1 is the forced expiratory volume in 1 second in liters, i.e., the volume of air that can forcibly be blown out in the first second, after full inspiration.
After downloading the data (see link above) we can read the CSV, compute squared age (which Zavorsky, somewhat confusingly, uses instead of age in the SLR), and select the male subset.
d <- "NHANES_2007_2012_Only_Acceptable_Spirometry_Values.csv" |>
  read.csv() |>
  transform(Age2 = Age^2) |>
  subset(Sex == "Male")
Then we load the packages we need, all available from CRAN except the topmodels package, which is still on R-Forge.
library("segmented")
library("gamlss")
library("distributions3")
library("topmodels")
To enable the workflow for probabilistic forecasts and model assessments from the distributions3 and topmodels packages for the SLR models fitted by segmented, we need a prodist() method to extract/predict the corresponding probability distribution. This does not work out of the box because segmented employs just a single constant variance, while Zavorsky employs a piecewise constant variance. The function below implements this for the special case of a segmented lm() with exactly one breakpoint.
prodist.segmented <- function(object, newdata = NULL, ...) {
  ## in-sample and out-of-sample data
  fitdata <- model.frame(object)
  if (is.null(newdata)) newdata <- fitdata

  ## predicted mean (m) and standard deviation (s),
  ## estimating one constant variance within each segment
  m <- predict(object, newdata = newdata)
  psi <- object$psi[, "Est."]
  seg <- as.numeric(fitdata[[object$nameUV$Z]] > psi) + 1
  s <- sqrt(tapply(residuals(object)^2, seg, mean))
  ## map each (new) observation to the sd of its segment
  s <- s[as.numeric(newdata[[object$nameUV$Z]] > psi) + 1]

  ## Normal distribution object
  distributions3::Normal(mu = m, sigma = s)
}
Then we fit the SLR and GAMLSS models in the (somewhat complicated) specification that Zavorsky considered for his paper.
lm_fev1_m <- lm(Baseline_FEV1_L ~ Age2 + Height + I(Height^2) + Weight, data = d)
seg_fev1_m <- segmented(lm_fev1_m, seg.Z = ~ Age2, psi = 20^2)
gam_fev1_m <- gamlss(Baseline_FEV1_L ~ log(Height) + log(Age) + cs(Age, df = 3),
  sigma.formula = ~ log(Age) + cs(Age, df = 3),
  family = BCCGo(mu.link = "log"), data = d,
  method = mixed(50, 500), control = gamlss.control(trace = TRUE, maxit = 500))
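As a quick sanity check (our addition, not part of Zavorsky’s analysis), the prodist() method defined above can be combined with the quantile() method from distributions3 to compute the person-specific 5% LLN directly; the object names p_seg and lln_seg below are just for illustration.

## predicted Normal distributions for all persons from the SLR model
p_seg <- prodist(seg_fev1_m)
## person-specific 5% lower limit of normal (LLN)
lln_seg <- quantile(p_seg, 0.05)
## overall in-sample proportion below the LLN (close to 0.05 when calibrated)
mean(d$Baseline_FEV1_L < lln_seg)

This mirrors what the procast() call further below does in one step. Note that such overall calibration can still hide severe miscalibration within age groups, which is exactly what the following assessment reveals.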
To show that the SLR with the piecewise constant variance does not provide a satisfactory fit for several age groups we provide the visualization below. It depicts the empirical proportions of observations below the 5% lower limit of normal (LLN) across different age groups (7.5 year intervals). The red line with triangles corresponds to the SLR model and the blue line with circles corresponds to the GAMLSS. The corresponding point-wise 95% intervals are added in gray in each panel.
[Figure: Proportion of observations below the 5% LLN by age group (7.5 year intervals) for the SLR (red triangles) and the GAMLSS (blue circles), with point-wise 95% intervals in gray.]
While the proportion of observations below the LLN roughly corresponds to the nominal 5% level for the GAMLSS in all age intervals, the SLR does rather poorly. For young persons (5 to 12.5 years of age) the LLN is much too small, so that very few observations actually fall below it. Conversely, for persons between 12.5 and 20 years of age the LLN is too large and hence too many observations fall below it. In other words, depending on the age group, the SLR would either fail to flag critically low FEV1 values or flag far too many of them.
Using the procast() function from topmodels we predict for each person the corresponding 5% LLN and then aggregate the proportions of observations falling below it within the 7.5 year age intervals.
## proportion of observations below LLN
d_lln <- rbind(
  data.frame(
    proportion = d$Baseline_FEV1_L < procast(gam_fev1_m, type = "quantile", at = 0.05, drop = TRUE),
    fit = "gamlss",
    age = d$Age),
  data.frame(
    proportion = d$Baseline_FEV1_L < procast(seg_fev1_m, type = "quantile", at = 0.05, drop = TRUE),
    fit = "segmented",
    age = d$Age)
)
## set up age intervals
breaks <- seq(5, 80, by = 7.5)
mid <- (head(breaks, -1) + tail(breaks, -1))/2
d_lln <- transform(d_lln,
  age = cut(age, breaks, include.lowest = TRUE)
)
## aggregate in age groups
a_lln <- d_lln |>
  aggregate(x = proportion ~ fit + age, FUN = mean) |>
  transform(nobs = aggregate(proportion ~ fit + age, data = d_lln, FUN = length)$proportion) |>
  transform(alpha = 0.05) |>
  transform(se = sqrt(alpha * (1 - alpha)/nobs)) |>
  transform(lower = alpha + qnorm(0.025) * se) |>
  transform(upper = alpha + qnorm(0.975) * se) |>
  transform(age = mid[age]) ## map factor levels to interval mid points
To display these aggregated values in an appealing way we leverage the recent tinyplot package.
library("tinyplot")
pal <- palette.colors()[6:7]
tinytheme("clean2", grid.lwd = 1, grid.lty = 2)
tinyplot(
  proportion ~ age | fit,
  data = a_lln,
  type = "o",
  pch = c(19, 17),
  lwd = 1.5,
  cex = 1.5,
  col = pal,
  ylim = c(0.01, 0.09),
  yaxl = "%",
  xlab = "Age (mid points of 7.5 year intervals)",
  ylab = "Proportion of observations below LLN"
)
tinyplot_add(
  alpha ~ age | fit,
  ymin = a_lln$lower,
  ymax = a_lln$upper,
  type = "ribbon",
  col = "gray"
)
## draw the lines/points again on top of the gray ribbon
tinyplot_add()
To bring out clearly that not only the 5% LLN is misspecified but that, more generally, the SLR provides a poor probabilistic fit for FEV1, we employ a detrended QQ plot (also known as a worm plot) based on z-scores (also known as quantile residuals). Using the qqrplot() function from topmodels this can be set up as follows.
qqrplot(seg_fev1_m, detrend = TRUE, confint = "polygon", col = pal[2], pch = 17, main = "")
qqrplot(gam_fev1_m, detrend = TRUE, plot = FALSE) |> points(col = pal[1], pch = 19)
[Figure: Detrended QQ plot (worm plot) of the z-scores for the SLR (red triangles) and the GAMLSS (blue circles).]
In the letter the QQ plots are used without detrending and are considered separately for different age groups.
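One possible sketch of that kind of display (our illustration, not taken verbatim from the letter, and assuming the elementwise cdf() evaluation provided by distributions3) computes the z-scores manually as z = qnorm(F(y)), where F is the predicted cumulative distribution function for each observation, and then draws a conventional QQ plot within each of the age groups defined by the breaks from above.

## z-scores (quantile residuals): z = qnorm(F(y)) with F the predicted CDF
z_seg <- qnorm(cdf(prodist(seg_fev1_m), d$Baseline_FEV1_L, elementwise = TRUE))
## conventional QQ plot of the z-scores within each 7.5 year age group
grp <- cut(d$Age, breaks, include.lowest = TRUE)
par(mfrow = c(2, 5))
for (g in levels(grp)) {
  qqnorm(z_seg[grp == g], main = g)
  qqline(z_seg[grp == g])
}
par(mfrow = c(1, 1))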
See the full replication code (linked above) for further details.