# Survival Analysis With Generalized Additive Models : Part III (the baseline hazard)

May 2, 2015
By

(This article was first published on Statistical Reflections of a Medical Doctor » R, and kindly contributed to R-bloggers)

In the third part of the series on survival analysis with GAMs we will review the use of the baseline hazard estimates provided by this regression model. In contrast to the Cox mode, the log-baseline hazard is estimated along with other quantities (e.g. the log hazard ratios) by the Poisson GAM (PGAM) as:

$log(h(t_{i,j})) = \beta_{0}+\lambda(t_{t,j})+\boldsymbol{x\beta} = \lambda_{I}(t_{i,j})+\boldsymbol{x\beta}$

In the aforementioned expression, the baseline hazard is equivalently modeled as a time-varying deviation ($\lambda(t)$) from a constant (the intercept $\beta_{0}$) , or as a time-varying function ($\lambda_{I}(t)$). In the latter case, the constant is absorbed into the smooth term. The choice between these equivalent forms is dictated by the application at hand; in particular, the intercept may be switched on or off by centering the smooth terms appearing in the call to the gam function. Hence, in the PGAM formulation the log-baseline hazard is yet another covariate that one estimates by a smooth function; other covariates may modify this hazard in a proportional fashion by additively shifting the log-baseline hazard ($\boldsymbol{x\beta}$).

In the “standard” way of fitting a PGAM by mgcv, the log-baseline hazard is estimated in the constant+deviation form. Exponentiation may be used to derive the baseline hazard and its standard errors. Continuing the analysis of the Primary Biliary Cirrhosis example from the second part of the series, we may write:

par(mfrow=c(2,2))
plot(fGAM,main="Gauss Lobatto (5)",ylab="log-basehaz")
plot(fGAM2,main="Gauss Lobatto (10)",ylab="log-basehaz")
plot(fGAM,main="Gauss Lobatto (5)",ylab="basehaz",trans=exp)
plot(fGAM2,main="Gauss Lobatto (10)",ylab="basehaz",trans=exp)

Log Baseline (top row) and Baseline (second row) hazard function in the PBC dataset for two different discretizations of the data. In all these cases, the baseline hazard (or its log) are given as time varying deviations from a constant (the value of the log-hazard where the confidence interval vanishes)

There is no substantial difference in the estimated obtained by the coarse (Gauss Lobatto (5)) and finer (Gauss Lobatto (10)) discretization. Note that as a result of fitting the log-hazard as constant+ time-varying deviation, the standard error of the curve vanishes at ~1050: the value of the log-hazard at that instant in events per unit time is provided by the intercept term.

Estimation of the log-baseline hazard allows the PGAM to function as a parametric, smooth alternative to the Kaplan Meier estimator. This will be examined in the fourth part of this series.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...