# Survival Analysis With Generalized Additive Models: Part V (stratified baseline hazards)

[This article was first published on **Statistical Reflections of a Medical Doctor » R**, and kindly contributed to R-bloggers.]


In the fifth part of this series we will examine the capabilities of Poisson GAMs to stratify the baseline hazard for survival analysis. In a stratified Cox model, the baseline hazard is not the same for all individuals in the study. Rather, it is assumed that the baseline hazard may differ *between members of groups*, even though it will be the same for members of the same group.

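Stratification is one way to address a violation of the proportionality assumption for a *categorical* covariate in the Cox model. The stratified Cox model writes the hazard of an individual in stratum $g$ with covariate vector $x$ as:

$$h_g(t \mid x) = h_{0g}(t)\,\exp(x^\top\beta)$$

where $h_{0g}(t)$ is the baseline hazard of stratum $g$ and the coefficients $\beta$ are shared across strata.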

On the logarithmic scale, the multiplicative model for the stratified baseline hazard becomes an additive one. In particular, specifying a different baseline hazard for each level of a factor amounts to specifying an interaction between the factor and the smooth baseline hazard in the PGAM.
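To make this step explicit, here is a short sketch in notation that is assumed here rather than taken from the original post: $h_{0g}(t)$ is the baseline hazard of stratum $g$, $x$ the covariate vector, $I_k$ the indicator of membership in stratum $k$, and $f_k(t) = \log h_{0k}(t)$ a smooth function of time. Taking logarithms of the stratified proportional hazards model gives

$$\log h_g(t \mid x) = \log h_{0g}(t) + x^\top\beta = \sum_{k} I_k\, f_k(t) + x^\top\beta$$

because only the indicator of the individual's own stratum is non-zero. In the Poisson formulation, the expected event count $\mu$ for a row with exposure $d$ (the `gam.dur` offset in the code of this post) would then be modelled as

$$\log \mu = \log d + \sum_{k} I_k\, f_k(t) + x^\top\beta$$

Each product $I_k f_k(t)$ is a smooth multiplied by a numeric indicator, which is what a `by` variable in a smooth term expresses.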

We turn to the PBC dataset to provide an example of a stratified analysis with either the Cox model or the PGAM. In that dataset the covariate edema is a categorical variable taking the values 0 (no edema), 0.5 (untreated or successfully treated) and 1 (edema despite treatment). The test based on the Schoenfeld residuals suggests that this covariate violates the proportionality assumption, with the evidence concentrated in the 0.5 level:

```r
> f<-coxph(Surv(time,status)~trt+age+sex+factor(edema),data=pbc)
> Schoen<-cox.zph(f)
> Schoen
                       rho    chisq      p
trt              -0.089207 1.12e+00 0.2892
age              -0.000198 4.72e-06 0.9983
sexf             -0.075377 7.24e-01 0.3950
factor(edema)0.5 -0.202522 5.39e+00 0.0203
factor(edema)1   -0.132244 1.93e+00 0.1651
GLOBAL                  NA 8.31e+00 0.1400
```

To fit a stratified GAM, we should transform the dataset to include additional indicator variables, one for each level of the edema covariate. To make the PGAM directly comparable to the stratified Cox model, we have to fit the former without an intercept term. This requires that we also include dummy variables for any categorical covariates that we would like to adjust our model for. In this particular case, the only such covariate is sex, which we code as an indicator for female:

```r
pbcGAM <- transform(pbcGAM,
                    edema0  = as.numeric(edema == 0),
                    edema05 = as.numeric(edema == 0.5),
                    edema1  = as.numeric(edema == 1),
                    sexf    = as.numeric(sex == "f"))
```
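As a quick sanity check on this encoding, here is a minimal sketch on a toy data frame. The `toy` object and its values are made up for illustration and are not part of the PBC data. Each row should switch on exactly one of the three edema indicators, so that each stratum-specific smooth contributes only to its own stratum.

```r
# Toy data: hypothetical values, only to illustrate the dummy coding
toy <- data.frame(edema = c(0, 0.5, 1, 0, 1),
                  sex   = factor(c("f", "m", "f", "f", "m")))

# Same transformation as the one applied to pbcGAM
toy <- transform(toy,
                 edema0  = as.numeric(edema == 0),
                 edema05 = as.numeric(edema == 0.5),
                 edema1  = as.numeric(edema == 1),
                 sexf    = as.numeric(sex == "f"))

# Every row must belong to exactly one stratum
stopifnot(all(rowSums(toy[, c("edema0", "edema05", "edema1")]) == 1))

print(toy)
```

If the check fails on real data, the usual suspects are missing values in the stratifying covariate or a level that was not given its own indicator.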

Then the stratified Cox and PGAM models are fit as:

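```r
library(survival)  # coxph, Surv, strata
library(mgcv)      # gam

# PGAM with one smooth log-baseline hazard per edema level and no intercept
fGAM <- gam(gam.ev ~ s(stop, bs = "cr", by = edema0) +
              s(stop, bs = "cr", by = edema05) +
              s(stop, bs = "cr", by = edema1) +
              trt + age + sexf + offset(log(gam.dur)) - 1,
            data = pbcGAM, family = "poisson", scale = 1, method = "REML")

# Stratified Cox model
fs <- coxph(Surv(time, status) ~ trt + age + sex + strata(edema), data = pbc)
```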

In general the coefficient estimates of the stratified Cox and the PGAM models are similar, with the exception of the *trt* variable. However, the standard error of this coefficient in either model is so large that both estimates are statistically indistinguishable from zero, despite their difference in magnitude:

```r
> fs
Call:
coxph(formula = Surv(time, status) ~ trt + age + sex + strata(edema),
    data = pbc)

        coef exp(coef) se(coef)     z       p
trt   0.0336     1.034  0.18724  0.18 0.86000
age   0.0322     1.033  0.00923  3.49 0.00048
sexf -0.3067     0.736  0.24314 -1.26 0.21000

Likelihood ratio test=15.8  on 3 df, p=0.00126  n= 312, number of events= 125
   (106 observations deleted due to missingness)

> summary(fGAM)

Family: poisson
Link function: log

Formula:
gam.ev ~ s(stop, bs = "cr", by = edema0) + s(stop, bs = "cr",
    by = edema05) + s(stop, bs = "cr", by = edema1) + trt + age +
    sexf + offset(log(gam.dur)) - 1

Parametric coefficients:
      Estimate Std. Error z value Pr(>|z|)
trt   0.002396   0.187104   0.013 0.989782
age   0.033280   0.009170   3.629 0.000284 ***
sexf -0.297481   0.240578  -1.237 0.216262
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                  edf Ref.df Chi.sq p-value
s(stop):edema0  2.001  2.003  242.0  <2e-16 ***
s(stop):edema05 2.001  2.001  166.3  <2e-16 ***
s(stop):edema1  2.000  2.001  124.4  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  -0.146   Deviance explained = -78.4%
REML score = 843.96  Scale est. = 1         n = 3120
```
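The size of the disagreement on *trt* can be put in perspective with back-of-the-envelope arithmetic on the printed numbers. The sketch below only re-uses the reported estimates and standard errors. Expressing the gap in units of the Cox standard error is a simplification assumed here, not a formal test, since both models are fit to the same patients and their estimates are not independent.

```r
# Estimates and standard errors for trt, copied from the printed output
est_cox <- 0.0336;   se_cox <- 0.18724
est_gam <- 0.002396; se_gam <- 0.187104

# Wald z statistics, matching the z columns of the two summaries
z_cox <- est_cox / se_cox
z_gam <- est_gam / se_gam

# Gap between the two estimates, in units of the Cox standard error
gap_in_se <- (est_cox - est_gam) / se_cox

round(c(z_cox = z_cox, z_gam = z_gam, gap_in_se = gap_in_se), 3)
```

The gap between the two estimates comes to roughly one sixth of a standard error, which is small relative to the uncertainty in either fit.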
