[This article was first published on Statistical Reflections of a Medical Doctor » R, and kindly contributed to R-bloggers].

In the fifth part of this series we will examine how Poisson GAMs (PGAMs) can stratify the baseline hazard in survival analysis. In a stratified Cox model, the baseline hazard is not the same for all individuals in the study. Rather, it is allowed to differ between groups (strata), while remaining the same for all members of a given group.

Stratification is one way to address a violation of the proportional hazards assumption by a categorical covariate in the Cox model. The stratified Cox model expresses the hazard of an individual in stratum $g$ as:

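$h_{g}(t,\boldsymbol{x}) = h_{0g}(t)\exp(\boldsymbol{x}\boldsymbol{\beta}),\quad g=1,2,\dotsc,K$

where $h_{0g}(t)$ is the baseline hazard of stratum $g$ and the coefficients $\boldsymbol{\beta}$ are shared by all $K$ strata.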

On the logarithmic scale, the multiplicative model for the stratified baseline hazard becomes an additive one. In particular, specifying a different baseline hazard for each level of a factor amounts to specifying an interaction between that factor and the smooth (log) baseline hazard in the PGAM.
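Written out, the log-scale form of the stratified model and one way to read the PGAM linear predictor are sketched below. The notation for the split data is an assumption of this sketch: $d_{ij}$ is the event indicator and $\Delta_{ij}$ the exposure time for row $j$ of individual $i$, $t_{ij}$ the corresponding time, and $z_{ig}$ equals 1 when individual $i$ belongs to stratum $g$ (in the code that follows these appear to correspond to gam.ev, gam.dur, stop and the edema indicators).

$$\log h_g(t,\boldsymbol{x}) = \log h_{0g}(t) + \boldsymbol{x}\boldsymbol{\beta}$$

$$\log \mathrm{E}[d_{ij}] = \log \Delta_{ij} + \sum_{g=1}^{K} z_{ig}\, f_g(t_{ij}) + \boldsymbol{x}_i\boldsymbol{\beta}$$

Because exactly one $z_{ig}$ is nonzero for each individual, the sum selects that individual's stratum smooth, so $f_g(t)$ plays the role of $\log h_{0g}(t)$ while $\boldsymbol{\beta}$ stays common to all strata.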

We turn to the PBC dataset for an example of a stratified analysis with both the Cox model and the PGAM. In that dataset the covariate edema is categorical, taking the values 0 (no edema), 0.5 (edema untreated or successfully treated) and 1 (edema despite treatment). The Schoenfeld residual test indicates that this covariate violates the proportionality assumption, mainly through the 0.5 level (p = 0.02), even though the global test is not significant (p = 0.14):

```
> library(survival)
> f <- coxph(Surv(time, status) ~ trt + age + sex + factor(edema), data = pbc)
> Schoen <- cox.zph(f)
> Schoen
                       rho    chisq      p
trt              -0.089207 1.12e+00 0.2892
age              -0.000198 4.72e-06 0.9983
sexf             -0.075377 7.24e-01 0.3950
factor(edema)0.5 -0.202522 5.39e+00 0.0203
factor(edema)1   -0.132244 1.93e+00 0.1651
GLOBAL                  NA 8.31e+00 0.1400
```
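The output above appears to come from an older version of the survival package (one rho column per coefficient). A minimal sketch for reproducing the test with a current version follows. It assumes survival 3.0 or later, where cox.zph tests whole terms by default and terms = FALSE restores one row per coefficient. It also assumes that death (status == 2) is the event and transplant counts as censoring, since the pbc data shipped with survival code status as 0/1/2; the names pbc2 and death are introduced here for illustration only.

```r
library(survival)

# Randomised patients only; assumption: death (status == 2) is the event
pbc2 <- subset(survival::pbc, !is.na(trt))
pbc2$death <- as.numeric(pbc2$status == 2)

f <- coxph(Surv(time, death) ~ trt + age + sex + factor(edema), data = pbc2)

# survival >= 3.0 tests whole terms by default; terms = FALSE gives one
# row per coefficient, like the older output shown above
Schoen <- cox.zph(f, terms = FALSE)
print(Schoen)

# Scaled Schoenfeld residuals over time for the edema = 0.5 contrast;
# under proportional hazards the smoothed curve stays flat near the Cox estimate
plot(Schoen[4])
abline(h = coef(f)[4], lty = 2)
```

A trend in the plotted curve, rather than a flat band around the dashed line, is what the small p-value for the edema = 0.5 contrast is picking up.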

To fit a stratified PGAM, we first add variables to the dataset, one indicator for each level of the edema covariate. To make the PGAM directly comparable to the stratified Cox model, we fit it without an intercept term. This requires that any other categorical covariate we want to adjust for also enters as an explicit dummy variable: without an intercept, R codes a factor term such as sex with one column per level, which would duplicate the stratum-specific constants. In this particular case, the only other categorical covariate is sex, coded as an indicator for female:

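```r
# One 0/1 indicator per edema stratum, plus an indicator for female sex
pbcGAM <- transform(pbcGAM, edema0 = as.numeric(edema == 0),
                    edema05 = as.numeric(edema == 0.5),
                    edema1 = as.numeric(edema == 1),
                    sexf = as.numeric(sex == "f"))
```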
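Why the intercept has to go can be checked directly: the three edema indicators partition the sample, so a column of ones equals their sum and adds no information. The sketch below applies the same indicator coding to the randomised patients of survival::pbc (the names pbc2, D and X are hypothetical) rather than to the split pbcGAM data, which is not constructed in this part.

```r
library(survival)

# Randomised patients only (trt is NA for the non-randomised ones)
pbc2 <- subset(survival::pbc, !is.na(trt))

# Same indicator coding as the transform() call above
pbc2 <- transform(pbc2, edema0 = as.numeric(edema == 0),
                  edema05 = as.numeric(edema == 0.5),
                  edema1 = as.numeric(edema == 1),
                  sexf = as.numeric(sex == "f"))

D <- as.matrix(pbc2[, c("edema0", "edema05", "edema1")])

# Every patient falls in exactly one stratum ...
print(table(rowSums(D)))

# ... so an intercept column equals rowSums(D) and adds no rank:
# the design [1, edema0, edema05, edema1] has rank 3, not 4
X <- cbind(intercept = 1, D)
print(qr(X)$rank)
```

As we read the mgcv documentation on `by` variables, a smooth with a numeric `by` variable is usually not subject to a centering constraint, so each `s(stop, by = edema...)` term can carry its own stratum-level constant; these constants take over the role of the dropped intercept.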

The stratified PGAM and Cox models are then fit as follows:

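```r
library(mgcv)      # gam()
library(survival)  # coxph(), Surv(), strata()

# Stratified PGAM: one smooth of time per edema stratum, no intercept,
# log exposure time as offset
fGAM <- gam(gam.ev ~ s(stop, bs = "cr", by = edema0) + s(stop, bs = "cr", by = edema05) +
              s(stop, bs = "cr", by = edema1) + trt + age + sexf + offset(log(gam.dur)) - 1,
            data = pbcGAM, family = "poisson", scale = 1, method = "REML")

# Stratified Cox model: a separate baseline hazard for each edema level
fs <- coxph(Surv(time, status) ~ trt + age + sex + strata(edema), data = pbc)
```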
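For readers who do not have the pbcGAM data from the earlier parts of the series, the following self-contained sketch runs the comparison end to end. It swaps in a different technique: the Poisson data are built with survival::survSplit as a piecewise-exponential split at fixed 180-day cut points, rather than with the splitting procedure of the earlier parts, so the numbers will differ somewhat from the output below. The cut width, the coding of death (status == 2) as the event, and the helper names pbc2, tstart and cmp are assumptions.

```r
library(survival)
library(mgcv)

# Randomised patients only (trt is NA for the 106 non-randomised cases)
pbc2 <- subset(survival::pbc, !is.na(trt))
# Assumption: death (status == 2) is the event, transplant counts as censoring
pbc2$gam.ev <- as.numeric(pbc2$status == 2)
pbc2$stop <- pbc2$time

# Piecewise-exponential split at 180-day cut points (hypothetical choice);
# each row is one interval (tstart, stop] with its own event indicator
pbcGAM <- survSplit(Surv(stop, gam.ev) ~ trt + age + sex + edema,
                    data = pbc2, cut = seq(180, 4680, by = 180),
                    start = "tstart", end = "stop", event = "gam.ev")
pbcGAM$gam.dur <- pbcGAM$stop - pbcGAM$tstart  # time at risk in the interval

# Stratum indicators and female indicator, coded as in the post
pbcGAM <- transform(pbcGAM, edema0 = as.numeric(edema == 0),
                    edema05 = as.numeric(edema == 0.5),
                    edema1 = as.numeric(edema == 1),
                    sexf = as.numeric(sex == "f"))

# Stratified PGAM (no intercept) and stratified Cox model
fGAM <- gam(gam.ev ~ s(stop, bs = "cr", by = edema0) +
              s(stop, bs = "cr", by = edema05) +
              s(stop, bs = "cr", by = edema1) +
              trt + age + sexf + offset(log(gam.dur)) - 1,
            data = pbcGAM, family = poisson, method = "REML")
fs <- coxph(Surv(time, gam.ev) ~ trt + age + sex + strata(edema), data = pbc2)

# Log hazard ratios side by side (coxph names the sex coefficient "sexf")
cmp <- cbind(Cox  = coef(fs)[c("trt", "age", "sexf")],
             PGAM = coef(fGAM)[c("trt", "age", "sexf")])
print(round(cmp, 4))
```

Finer cut points bring the piecewise-exponential fit closer to the Cox partial likelihood at the cost of more rows; 180 days is only a convenient middle ground for a sketch.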



In general, the coefficient estimates of the stratified Cox model and the PGAM are similar, with the exception of trt. However, the standard error of this coefficient is so large under either model (about 0.19) that neither estimate differs significantly from zero (p = 0.86 for the Cox model, p = 0.99 for the PGAM), despite their difference in magnitude.
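The claim about trt can be checked with a little arithmetic on the printed numbers. The sketch below computes Wald 95% confidence intervals on the log hazard ratio scale from the estimates and standard errors copied by hand from the two summaries; the names est, se, ci and gap are illustrative.

```r
# trt estimates and standard errors, copied from the two summaries
est <- c(Cox = 0.0336, PGAM = 0.002396)
se  <- c(Cox = 0.18724, PGAM = 0.187104)

# Wald 95% confidence intervals on the log hazard ratio scale
z  <- qnorm(0.975)
ci <- cbind(estimate = est, lower = est - z * se, upper = est + z * se)
print(round(ci, 3))

# Distance between the two point estimates, in standard-error units
gap <- unname(est["Cox"] - est["PGAM"]) / mean(se)
print(round(gap, 2))
```

Both intervals comfortably include zero (roughly -0.33 to 0.40 for the Cox model and -0.36 to 0.37 for the PGAM), and the two point estimates are less than a fifth of a standard error apart.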

```
> fs
Call:
coxph(formula = Surv(time, status) ~ trt + age + sex + strata(edema),
    data = pbc)

        coef exp(coef) se(coef)     z       p
trt   0.0336     1.034  0.18724  0.18 0.86000
age   0.0322     1.033  0.00923  3.49 0.00048
sexf -0.3067     0.736  0.24314 -1.26 0.21000

Likelihood ratio test=15.8  on 3 df, p=0.00126  n= 312, number of events= 125
   (106 observations deleted due to missingness)
> summary(fGAM)

Family: poisson

Formula:
gam.ev ~ s(stop, bs = "cr", by = edema0) + s(stop, bs = "cr",
    by = edema05) + s(stop, bs = "cr", by = edema1) + trt + age +
    sexf + offset(log(gam.dur)) - 1

Parametric coefficients:
      Estimate Std. Error z value Pr(>|z|)
trt   0.002396   0.187104   0.013 0.989782
age   0.033280   0.009170   3.629 0.000284 ***
sexf -0.297481   0.240578  -1.237 0.216262
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                  edf Ref.df Chi.sq p-value
s(stop):edema0  2.001  2.003  242.0  <2e-16 ***
s(stop):edema05 2.001  2.001  166.3  <2e-16 ***
s(stop):edema1  2.000  2.001  124.4  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  -0.146   Deviance explained = -78.4%
REML score = 843.96  Scale est. = 1         n = 3120
```
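One possible reading of the smooth-term table: each stratum smooth uses about 2 effective degrees of freedom. Assuming the numeric-by smooths are not centered, the penalty of a cubic regression spline (the integrated squared second derivative) leaves constant and linear functions unpenalized, so an edf close to 2 suggests that REML has shrunk each stratum's log baseline hazard almost to a straight line in time, that is, a Gompertz-shaped baseline hazard within each edema stratum. The symbols $\hat a_g$ and $\hat b_g$ below are notation introduced for this sketch, not quantities reported by summary().

$$\hat f_g(t) \approx \hat a_g + \hat b_g\, t \quad\Longrightarrow\quad \hat h_{0g}(t) = \exp\big(\hat f_g(t)\big) \approx \exp(\hat a_g)\,\exp(\hat b_g\, t),\qquad g = 1,\dotsc,K$$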