Parametric survival modeling

[This article was first published on Devin Incerti’s blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Introduction

Survival analysis is used to analyze the time until the occurrence of an event (or multiple events). Cox models—which are often referred to as semiparametric because they do not assume any particular baseline survival distribution—are perhaps the most widely used technique; however, Cox models are not without limitations and parametric approaches can be advantageous in many contexts. For instance, parametric survival models are essential for extrapolating survival outcomes beyond the available follow-up data. For this reason they are nearly always used in health-economic evaluations where it is necessary to consider the lifetime health effects (and costs) of medical interventions.

R contains a large number of packages related to biostatistics and its support for parametric survival modeling is no different. Below we will examine a range of parametric survival distributions, their specifications in R, and the hazard shapes they support. We will then show how the flexsurv package can make parametric regression modeling of survival data straightforward.

Survival distributions

The primary quantity of interest in survival analysis is the survivor function, defined as the probability of survival beyond time $t$,

where $T$ is a random variable denoting the time that the event occurs. The survival function is the complement of the cumulative density function (CDF), $F(t) = \int_0^t f(u)du$, where $f(t)$ is the probability density function (PDF).

The hazard function, or the instantaneous rate at which an event occurs at time $t$ given survival until time $t$ is given by,

The survivor function can also be expressed in terms of the cumulative hazard function, $\Lambda(t) = \int_0^t \lambda (u)du$,

R functions for parametric distributions used for survival analysis are shown in the table below. The default stats package contains functions for the PDF, the CDF, and random number generation for many of the distributions. Additional distributions as well as support for hazard functions are provided by flexsurv.

PDF CDF Hazard Random sampling
Exponential stats::dexp stats::pexp flexsurv::hexp flexsurv::rexp
Weibull (AFT) stats::dweibull stats::pweibull flexsurv::hweibull stats::rweibull
Weibull (PH) flexsurv::dweibullPH flexsurv::pweibullPH flexsurv::hweibullPH flexsurv::rweibullPH
Gompertz flexsurv::dgompertz flexsurv::pgompertz flexsurv::hgompertz flexsurv::rgompertz
Gamma stats::dgamma stats::pgamma flexsurv::hgamma stats::rgamma
Lognormal stats::dlnorm stats::plnorm flexsurv::hlnorm stats::rlnorm
Log-logistic flexsurv::dllogis flexsurv::pllogis flexsurv::hllogis flexsurv::rllogis
Generalized gamma flexsurv::dgengamma flexsurv::pgengamma flexsurv::hgengamma flexsurv::rgengamma

The parameterizations of these distributions in R are shown in the next table. The parameter of primary interest (in flexsurv) is colored in red—it is known as the location parameter and typically governs the mean or location for each distribution. The other parameters are ancillary parameters that determine the shape, variance, or higher moments of the distribution. These parameters impact the hazard function, which can take a variety of shapes depending on the distribution:

  • the exponential distribution only supports a constant hazard;
  • the Weibull, Gompertz, and gamma distributions support monotonically increasing and decreasing hazards;
  • the log-logistic and lognormal distributions support arc-shaped and monotonically decreasing hazards; and
  • the generalized gamma distribution supports an arc-shaped, bathtub-shaped, monotonically increasing, and monotonically decreasing hazards.
PDF CDF Hazard Parameters Hazard shape
Exponential $\lambda e^{-\lambda t}$ $1 – e^{-\lambda t}$ $\lambda$ $\color{red}{\text{rate}} = \lambda \gt 0$ Constant
Weibull (AFT) $\frac{a}{b}\left(\frac{t}{b}\right)^{a-1}e^{-(t/b)^a}$ $1 – e^{-(t/b)^a}$ $\frac{a}{b}\left(\frac{t}{b}\right)^{a-1}$ $\text{shape} = a \gt 0 \\ \color{red}{\text{scale}} = b \gt 0$ Constant, monotonically increasing/decreasing
Weibull (PH)a $a m t^{a-1} e^{-m t^a}$ $1 – e^{-mt^a}$ $amt^{a-1}$ $\text{shape} = a \gt 0 \\ \color{red}{\text{scale}} = m \gt 0$ Constant, monotonically increasing/decreasing
Gompertz $b e^{at} \exp\left[-\frac{b}{a}(e^{at}-1)\right]$ $1 – \exp\left[-\frac{b}{a}(e^{at}-1)\right]$ $b e^{at}$ $\text{shape} = a \in (-\infty, \infty) \\ \color{red}{\text{rate}} = b \gt 0$ Constant, monotonically increasing/decreasing
Gammab $\frac{b^a}{\Gamma(a)}t^{a -1}e^{-bt}$ $\frac{\gamma(a, bt)}{\Gamma(a)}$ $f(t)/S(t)$ $\text{shape} = a \gt 0 \\ \color{red}{\text{rate}} = b \gt 0$ Constant, monotonically increasing/decreasing
Lognormal $\frac{1}{t\sigma\sqrt{2\pi}}e^{-\frac{(\ln t – \mu)^2}{2\sigma^2}}$ $\Phi\left(\frac{\ln t – \mu}{\sigma}\right)$ $f(t)/S(t)$ $\color{red}{\text{meanlog}} = \mu \in (-\infty, \infty) \\ \text{sdlog} = \sigma \gt 0$ Arc-shaped, monotonically decreasing
Log-logistic $\frac{(a/b)(t/b)^{a-1}}{\left(1 + (t/b)^a\right)^2}$ $\frac{1}{(1+(t/b)^a)}$ $1-\frac{(a/b)(t/b)^{a-1}}{\left(1 + (t/b)^a\right)}$ $\text{shape} = a \gt 0 \\ \color{red}{\text{scale}} = b \gt 0$ Arc-shaped, monotonically decreasing
Generalized gammac $\frac{|Q|(Q^{-2})^{Q^{-2}}}{\sigma t \Gamma(Q^{-2})} \exp\left[Q^{-2}\left(Qw-e^{Qw}\right)\right]$ $\begin{cases}
\frac{\gamma(Q^{-2}, u)}{\Gamma(Q^{-2})} \text{ if } Q \neq 0 \\
\Phi(w) \text{ if } Q = 0
\end{cases}$
$f(t)/S(t)$ $\color{red}{\text{mu}} = \mu \in (-\infty, \infty) \\ \text{sigma} = \sigma \gt 0 \\ \text{Q} = Q \in (-\infty, \infty)$ Arc-shaped, bathtub-shaped, monotonically increasing/decreasing
a The proportional hazard (PH) model is a reparameterization of the accelerated failure time (AFT) model with $m = b^{-a}$.
b $\Gamma(z) = \int_{0}^{\infty} x^{z-1}e^{-x} dx$ is the gamma function and $\gamma(s,x) = \int_{0}^{x} t^{s-1}e^{-t}dt$ is the lower incomplete gamma function.
c $w = (log(t) – \mu)/\sigma; u = Q^{-2}e^{Qw}$.

Shapes of hazard functions

We will now examine the shapes of the hazards in a bit more detail and show how both the location and shape vary with the parameters of each distribution. Readers interested in a more interactive experience can also view my Shiny app here.

To do so we will load some needed packages: we will use flexsurv to compute the hazards, data.table as a fast alternative to data.frame, and ggplot2 for plotting.

<span class="n">library</span><span class="p">(</span><span class="s2">"flexsurv"</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="s2">"data.table"</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="s2">"ggplot2"</span><span class="p">)</span><span class="w">
</span><span class="n">ggplot2</span><span class="o">::</span><span class="n">theme_set</span><span class="p">(</span><span class="n">theme_minimal</span><span class="p">())</span><span class="w">
</span>

We can create a general function for computing hazards for any general hazard function given combinations of parameter values at different time points. The key to the function is mapply, a multivariate version of sapply. To illustrate, let’s compute the hazard from a Weibull distribution given 3 values each of the shape and scale parameters at time points 1 and 2. The output is a matrix where each row corresponds to a time point and each column is combination of the shape and scale parameters. For example, the second row and third column is the hazard at time point 2 given a shape parameter of 1.5 and a scale parameter of 1.75.

<span class="n">mapply</span><span class="p">(</span><span class="n">flexsurv</span><span class="o">::</span><span class="n">hweibull</span><span class="p">,</span><span class="w"> 
       </span><span class="n">shape</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">.5</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">1.5</span><span class="p">),</span><span class="w">
       </span><span class="n">scale</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">.25</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">1.75</span><span class="p">),</span><span class="w">
       </span><span class="n">MoreArgs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="m">2</span><span class="p">))</span><span class="w">
</span>
##           [,1] [,2]      [,3]
## [1,] 1.0000000    1 0.6479391
## [2,] 0.7071068    1 0.9163243

The more general function uses mapply to return a data.table of hazards at all possible combinations of the parameter values and time points. Factor variables and intuitive names are also returned to facilitate plotting with ggplot2.

<span class="n">hazfun</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">FUN</span><span class="p">,</span><span class="w"> </span><span class="n">param_vals</span><span class="p">,</span><span class="w"> </span><span class="n">times</span><span class="p">){</span><span class="w">
  </span><span class="c1"># Compute hazard for all possible combinations of parameters and times</span><span class="w">
  </span><span class="n">values</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">do.call</span><span class="p">(</span><span class="s2">"mapply"</span><span class="p">,</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="nf">list</span><span class="p">(</span><span class="n">FUN</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">FUN</span><span class="p">),</span><span class="w">
                                </span><span class="n">as.list</span><span class="p">(</span><span class="n">expand.grid</span><span class="p">(</span><span class="n">param_vals</span><span class="p">)),</span><span class="w">
                                </span><span class="nf">list</span><span class="p">(</span><span class="n">MoreArgs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">times</span><span class="p">))))</span><span class="w">
  </span><span class="n">x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.table</span><span class="p">(</span><span class="n">expand.grid</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="n">param_vals</span><span class="p">,</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">time</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">times</span><span class="p">))),</span><span class="w">
                  </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">t</span><span class="p">(</span><span class="n">values</span><span class="p">)))</span><span class="w">
  
  </span><span class="c1"># Create factor variables and intuitive names for plotting</span><span class="w">
  </span><span class="n">param_names</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">names</span><span class="p">(</span><span class="n">param_vals</span><span class="p">)</span><span class="w">
  </span><span class="n">x</span><span class="p">[[</span><span class="n">param_names</span><span class="p">[</span><span class="m">1</span><span class="p">]]]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">x</span><span class="p">[[</span><span class="n">param_names</span><span class="p">[</span><span class="m">1</span><span class="p">]]],</span><span class="w">
                                </span><span class="n">levels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">unique</span><span class="p">(</span><span class="n">x</span><span class="p">[[</span><span class="n">param_names</span><span class="p">[</span><span class="m">1</span><span class="p">]]]))</span><span class="w">
  </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nf">length</span><span class="p">(</span><span class="n">param_vals</span><span class="p">)</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="m">1</span><span class="p">){</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">j</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">2</span><span class="o">:</span><span class="nf">length</span><span class="p">(</span><span class="n">param_vals</span><span class="p">)){</span><span class="w">
      </span><span class="n">ordered_vals</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">unique</span><span class="p">(</span><span class="n">x</span><span class="p">[[</span><span class="n">param_names</span><span class="p">[</span><span class="n">j</span><span class="p">]]])</span><span class="w">
      </span><span class="n">x</span><span class="p">[[</span><span class="n">param_names</span><span class="p">[</span><span class="n">j</span><span class="p">]]]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">paste0</span><span class="p">(</span><span class="n">param_names</span><span class="p">[</span><span class="n">j</span><span class="p">],</span><span class="w"> </span><span class="s2">" = "</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">[[</span><span class="n">param_names</span><span class="p">[</span><span class="n">j</span><span class="p">]]])</span><span class="w">
      </span><span class="n">factor_levels</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">paste0</span><span class="p">(</span><span class="n">param_names</span><span class="p">[</span><span class="n">j</span><span class="p">],</span><span class="w"> </span><span class="s2">" = "</span><span class="p">,</span><span class="w"> </span><span class="n">ordered_vals</span><span class="p">)</span><span class="w">
      </span><span class="n">x</span><span class="p">[[</span><span class="n">param_names</span><span class="p">[</span><span class="n">j</span><span class="p">]]]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">x</span><span class="p">[[</span><span class="n">param_names</span><span class="p">[</span><span class="n">j</span><span class="p">]]],</span><span class="w">
                                    </span><span class="n">levels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">factor_levels</span><span class="p">)</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">}</span><span class="w">
  
  </span><span class="c1"># Return</span><span class="w">
  </span><span class="nf">return</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span>

Exponential distribution

The exponential distribution is parameterized by a single rate parameter and only supports a hazard that is constant over time. The hazard is simply equal to the rate parameter.

<span class="n">times</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="n">rate</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="n">haz_exp</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">hazfun</span><span class="p">(</span><span class="n">flexsurv</span><span class="o">::</span><span class="n">hexp</span><span class="p">,</span><span class="w"> 
                  </span><span class="nf">list</span><span class="p">(</span><span class="n">rate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rate</span><span class="p">),</span><span class="w">
                  </span><span class="n">times</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">times</span><span class="p">)</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">haz_exp</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">time</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rate</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_line</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">xlab</span><span class="p">(</span><span class="s2">"Time"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">ylab</span><span class="p">(</span><span class="s2">"Hazard"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">scale_y_continuous</span><span class="p">(</span><span class="n">breaks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rate</span><span class="p">)</span><span class="w"> 
</span>

plot of chunk haz_exp

Weibull distribution (AFT)

The Weibull distribution can be parameterized as both an accelerated failure time (AFT) model or as a proportional hazards (PH) model. The parameterization in the base stats package is an AFT model.

The hazard is decreasing for shape parameter $a < 1$ and increasing for $a > 1$. For $a = 1$, the Weibull distribution is equivalent to an exponential distribution with rate parameter $1/b$ where $b$ is the scale parameter.

<span class="n">wei_shape</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">.25</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="m">.25</span><span class="p">)</span><span class="w">
</span><span class="n">haz_wei</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">hazfun</span><span class="p">(</span><span class="n">flexsurv</span><span class="o">::</span><span class="n">hweibull</span><span class="p">,</span><span class="w"> 
                  </span><span class="nf">list</span><span class="p">(</span><span class="n">scale</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="m">.5</span><span class="p">),</span><span class="w">
                       </span><span class="n">shape</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">wei_shape</span><span class="p">),</span><span class="w">
                  </span><span class="n">times</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">times</span><span class="p">)</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">haz_wei</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">time</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">scale</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_line</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">facet_wrap</span><span class="p">(</span><span class="o">~</span><span class="n">shape</span><span class="p">,</span><span class="w"> </span><span class="n">scales</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"free_y"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">xlab</span><span class="p">(</span><span class="s2">"Time"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">ylab</span><span class="p">(</span><span class="s2">"Hazard"</span><span class="p">)</span><span class="w"> 
</span>

plot of chunk haz_wei

Weibull distribution (PH)

flexsurv provides an alternative PH parameterization of the Weibull model with the same shape parameter $a$ and a scale parameter $m = b^{-a}$ where $b$ is the scale parameter in the AFT model.

The hazard is again decreasing for $a < 1$, constant for $a = 1$, and increasing for $a > 1$. Note that for $a = 1$, the PH Weibull distribution is equivalent to an exponential distribution with rate parameter $m$.

<span class="n">haz_weiPH</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">hazfun</span><span class="p">(</span><span class="n">flexsurv</span><span class="o">::</span><span class="n">hweibullPH</span><span class="p">,</span><span class="w"> 
                  </span><span class="nf">list</span><span class="p">(</span><span class="n">scale</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="m">.5</span><span class="p">),</span><span class="w">
                       </span><span class="n">shape</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">wei_shape</span><span class="p">),</span><span class="w">
                  </span><span class="n">times</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">times</span><span class="p">)</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">haz_weiPH</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">time</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">scale</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_line</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">facet_wrap</span><span class="p">(</span><span class="o">~</span><span class="n">shape</span><span class="p">,</span><span class="w"> </span><span class="n">scales</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"free_y"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">xlab</span><span class="p">(</span><span class="s2">"Time"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">ylab</span><span class="p">(</span><span class="s2">"Hazard"</span><span class="p">)</span><span class="w"> 
</span>

plot of chunk haz_weiPH

Gompertz distribution

The Gompertz distribution is parameterized by a shape parameter $a$ and rate parameter $b$. The hazard is increasing for $a > 0$, constant for $a = 0$, and decreasing for $a < 0$. When $a = 0$, the Gompertz distribution is equivalent to an exponential distribution with rate parameter $b$.

<span class="n">haz_gomp</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">hazfun</span><span class="p">(</span><span class="n">flexsurv</span><span class="o">::</span><span class="n">hgompertz</span><span class="p">,</span><span class="w"> 
                  </span><span class="nf">list</span><span class="p">(</span><span class="n">rate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">.5</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="m">.5</span><span class="p">),</span><span class="w">
                       </span><span class="n">shape</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">-2</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">.5</span><span class="p">)),</span><span class="w">
                  </span><span class="n">times</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">times</span><span class="p">)</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">haz_gomp</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">time</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rate</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_line</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">facet_wrap</span><span class="p">(</span><span class="o">~</span><span class="n">shape</span><span class="p">,</span><span class="w"> </span><span class="n">scales</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"free_y"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">xlab</span><span class="p">(</span><span class="s2">"Time"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">ylab</span><span class="p">(</span><span class="s2">"Hazard"</span><span class="p">)</span><span class="w"> 
</span>

plot of chunk haz_gomp

Gamma distribution

The gamma distribution is parameterized by a shape parameter $a$ and a rate parameter $b$. Like the Weibull distribution, the hazard is decreasing for $a < 1$, constant for $a = 1$, and increasing for $a >1$. In the case where $a = 1$, the gamma distribution is an exponential distribution with rate parameter $b$.

<span class="n">haz_gam</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">hazfun</span><span class="p">(</span><span class="n">flexsurv</span><span class="o">::</span><span class="n">hgamma</span><span class="p">,</span><span class="w"> 
                  </span><span class="nf">list</span><span class="p">(</span><span class="n">rate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">.5</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="m">.5</span><span class="p">),</span><span class="w">
                       </span><span class="n">shape</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1e-5</span><span class="p">,</span><span class="w"> </span><span class="m">.5</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">6</span><span class="p">))),</span><span class="w">
                  </span><span class="n">times</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">times</span><span class="p">)</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">haz_gam</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">time</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rate</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_line</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">facet_wrap</span><span class="p">(</span><span class="o">~</span><span class="n">shape</span><span class="p">,</span><span class="w"> </span><span class="n">scales</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"free_y"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">xlab</span><span class="p">(</span><span class="s2">"Time"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">ylab</span><span class="p">(</span><span class="s2">"Hazard"</span><span class="p">)</span><span class="w"> 
</span>

plot of chunk haz_gam

Lognormal distribution

The lognormal distribution is parameterized by the mean $\mu$ and standard deviation $\sigma$ of survival time on the log scale. The lognormal hazard is either monotonically decreasing or arc-shaped. Note that the shape of the hazard depends on the values of both $\mu$ and $\sigma$.

<span class="n">haz_lnorm</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">hazfun</span><span class="p">(</span><span class="n">flexsurv</span><span class="o">::</span><span class="n">hlnorm</span><span class="p">,</span><span class="w"> 
                    </span><span class="nf">list</span><span class="p">(</span><span class="n">meanlog</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">,</span><span class="w"> </span><span class="m">.5</span><span class="p">),</span><span class="w">
                         </span><span class="n">sdlog</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">.5</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="m">.5</span><span class="p">)),</span><span class="w">
                  </span><span class="n">times</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="m">.1</span><span class="p">))</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">haz_lnorm</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">time</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">meanlog</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_line</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">facet_wrap</span><span class="p">(</span><span class="o">~</span><span class="n">sdlog</span><span class="p">,</span><span class="w"> </span><span class="n">scales</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"free_y"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">xlab</span><span class="p">(</span><span class="s2">"Time"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">ylab</span><span class="p">(</span><span class="s2">"Hazard"</span><span class="p">)</span><span class="w"> 
</span>

plot of chunk haz_lnorm

Log-logistic distribution

The log-logistic distribution is parameterized by a shape parameter $a$ and a scale parameter $b$. When $a > 1$, the hazard function is arc-shaped whereas when $a \leq 1$, the hazard function is decreasing monotonically.

<span class="n">haz_llogis</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">hazfun</span><span class="p">(</span><span class="n">flexsurv</span><span class="o">::</span><span class="n">hllogis</span><span class="p">,</span><span class="w"> 
                    </span><span class="nf">list</span><span class="p">(</span><span class="n">scale</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">.5</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">,</span><span class="w"> </span><span class="m">.5</span><span class="p">),</span><span class="w">
                         </span><span class="n">shape</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">.5</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="m">.5</span><span class="p">)),</span><span class="w">
                  </span><span class="n">times</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">))</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">haz_llogis</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">time</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">scale</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_line</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">facet_wrap</span><span class="p">(</span><span class="o">~</span><span class="n">shape</span><span class="p">,</span><span class="w"> </span><span class="n">scales</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"free_y"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">xlab</span><span class="p">(</span><span class="s2">"Time"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">ylab</span><span class="p">(</span><span class="s2">"Hazard"</span><span class="p">)</span><span class="w"> 
</span>

plot of chunk haz_llogis

Generalized gamma distribution

The generalized gamma distribution is parameterized by a location parameter $\mu$, a scale parameter $\sigma$, and a shape parameter $Q$. It is the most flexible distribution reviewed in this post and includes the exponential ($Q = \sigma = 1$), Weibull ($Q = 1$), gamma ($Q = \sigma$), and lognormal ($Q = 0$) distributions as special cases.

Each row in the figure corresponds to a unique value of $\sigma$ and each column corresponds to a unique value of $Q$.The generalized gamma distribution is quite flexible as it supports hazard functions that are monotonically increasing, monotonically decreasing, arc-shaped, and bathtub shaped.

<span class="n">haz_gengamma</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">hazfun</span><span class="p">(</span><span class="n">flexsurv</span><span class="o">::</span><span class="n">hgengamma</span><span class="p">,</span><span class="w"> 
                       </span><span class="nf">list</span><span class="p">(</span><span class="n">mu</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">,</span><span class="w"> </span><span class="m">.5</span><span class="p">),</span><span class="w">
                            </span><span class="n">sigma</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">.5</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">.5</span><span class="p">),</span><span class="w">
                            </span><span class="n">Q</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">-3</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">)),</span><span class="w">
                  </span><span class="n">times</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">30</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">))</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">haz_gengamma</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">time</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mu</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_line</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">facet_wrap</span><span class="p">(</span><span class="n">sigma</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">Q</span><span class="p">,</span><span class="w">
             </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">length</span><span class="p">(</span><span class="n">levels</span><span class="p">(</span><span class="n">haz_gengamma</span><span class="o">$</span><span class="n">Q</span><span class="p">)),</span><span class="w">
             </span><span class="n">scales</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"free"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">xlab</span><span class="p">(</span><span class="s2">"Time"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">ylab</span><span class="p">(</span><span class="s2">"Hazard"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">theme</span><span class="p">(</span><span class="n">legend.position</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"bottom"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">guides</span><span class="p">(</span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">guide_legend</span><span class="p">(</span><span class="n">nrow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">byrow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">))</span><span class="w">
</span>

plot of chunk haz_gengamma

Regression

In flexsurv, survival models are fit to the data using maximum likelihood. Each parameter can be modeled as a function of covariates $z$,

where $\alpha_l$ is the $l$th parameter and $g^{-1}()$ is a link function (typically $log()$ if the parameter is strictly positive and the identity function if the parameter is defined on the real line).

We will illustrate by modeling survival in a dataset of patients with advanced lung cancer from the survival package. The dataset uses a status indicator where 2 denotes death and 1 denotes alive at the time of last follow-up; we will convert this to the more traditional coding where 0 is dead and 1 is alive.

<span class="n">dat</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.table</span><span class="p">(</span><span class="n">survival</span><span class="o">::</span><span class="n">lung</span><span class="p">)</span><span class="w">
</span><span class="n">dat</span><span class="p">[,</span><span class="w"> </span><span class="n">status</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="n">ifelse</span><span class="p">(</span><span class="n">status</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">)]</span><span class="w">
</span><span class="n">head</span><span class="p">(</span><span class="n">dat</span><span class="p">)</span><span class="w">
</span>
##    inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
## 1:    3  306      1  74   1       1       90       100     1175      NA
## 2:    3  455      1  68   1       0       90        90     1225      15
## 3:    3 1010      0  56   1       0       90        90       NA      15
## 4:    5  210      1  57   1       1       90        60     1150      11
## 5:    1  883      1  60   1       0      100        90       NA       0
## 6:   12 1022      0  74   1       1       50        80      513       0

Intercept only model

We will begin by estimating intercept only parametric regression models (i.e., without covariates). But first, it’s helpful to estimate the hazard function (among all patients) using nonparametric techniques. We can do this using the kernel density estimator from the muhaz package.

<span class="n">library</span><span class="p">(</span><span class="s2">"muhaz"</span><span class="p">)</span><span class="w">
</span><span class="n">kernel_haz_est</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">muhaz</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">time</span><span class="p">,</span><span class="w"> </span><span class="n">dat</span><span class="o">$</span><span class="n">status</span><span class="p">)</span><span class="w">
</span><span class="n">kernel_haz</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.table</span><span class="p">(</span><span class="n">time</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">kernel_haz_est</span><span class="o">$</span><span class="n">est.grid</span><span class="p">,</span><span class="w">
                         </span><span class="n">est</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">kernel_haz_est</span><span class="o">$</span><span class="n">haz.est</span><span class="p">,</span><span class="w">
                         </span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Kernel density"</span><span class="p">)</span><span class="w">
</span>

Then we can use flexsurv to estimate intercept only models for a range of probability distributions. The hazard function for each fitted model is returned using summary.flexsurvreg().

<span class="n">dists</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"exp"</span><span class="p">,</span><span class="w"> </span><span class="s2">"weibull"</span><span class="p">,</span><span class="w"> </span><span class="s2">"gompertz"</span><span class="p">,</span><span class="w"> </span><span class="s2">"gamma"</span><span class="p">,</span><span class="w"> 
           </span><span class="s2">"lognormal"</span><span class="p">,</span><span class="w"> </span><span class="s2">"llogis"</span><span class="p">,</span><span class="w"> </span><span class="s2">"gengamma"</span><span class="p">)</span><span class="w">
</span><span class="n">dists_long</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"Exponential"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Weibull (AFT)"</span><span class="p">,</span><span class="w">
                </span><span class="s2">"Gompertz"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Gamma"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Lognormal"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Log-logistic"</span><span class="p">,</span><span class="w">
                </span><span class="s2">"Generalized gamma"</span><span class="p">)</span><span class="w">
</span><span class="n">parametric_haz</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">vector</span><span class="p">(</span><span class="n">mode</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"list"</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">length</span><span class="p">(</span><span class="n">dists</span><span class="p">))</span><span class="w">
</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="nf">length</span><span class="p">(</span><span class="n">dists</span><span class="p">)){</span><span class="w">
  </span><span class="n">fit</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">flexsurvreg</span><span class="p">(</span><span class="n">Surv</span><span class="p">(</span><span class="n">time</span><span class="p">,</span><span class="w"> </span><span class="n">status</span><span class="p">)</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dat</span><span class="p">,</span><span class="w"> </span><span class="n">dist</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dists</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w"> 
  </span><span class="n">parametric_haz</span><span class="p">[[</span><span class="n">i</span><span class="p">]]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">summary</span><span class="p">(</span><span class="n">fit</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"hazard"</span><span class="p">,</span><span class="w"> 
                                     </span><span class="n">ci</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="n">tidy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
  </span><span class="n">parametric_haz</span><span class="p">[[</span><span class="n">i</span><span class="p">]]</span><span class="o">$</span><span class="n">method</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">dists_long</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">parametric_haz</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rbindlist</span><span class="p">(</span><span class="n">parametric_haz</span><span class="p">)</span><span class="w">
</span>

We can plot the hazard functions from the parametric models and compare them to the kernel density estimate.

<span class="n">haz</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rbind</span><span class="p">(</span><span class="n">kernel_haz</span><span class="p">,</span><span class="w"> </span><span class="n">parametric_haz</span><span class="p">)</span><span class="w">
</span><span class="n">haz</span><span class="p">[,</span><span class="w"> </span><span class="n">method</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">method</span><span class="p">,</span><span class="w">
                       </span><span class="n">levels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"Kernel density"</span><span class="p">,</span><span class="w">
                                  </span><span class="n">dists_long</span><span class="p">))]</span><span class="w">
</span><span class="n">n_dists</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">length</span><span class="p">(</span><span class="n">dists</span><span class="p">)</span><span class="w"> 
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">haz</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">time</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">est</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">method</span><span class="p">,</span><span class="w"> </span><span class="n">linetype</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">method</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_line</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">xlab</span><span class="p">(</span><span class="s2">"Days"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">ylab</span><span class="p">(</span><span class="s2">"Hazard"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">scale_colour_manual</span><span class="p">(</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> 
                      </span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"black"</span><span class="p">,</span><span class="w"> </span><span class="n">rainbow</span><span class="p">(</span><span class="n">n_dists</span><span class="p">)))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">scale_linetype_manual</span><span class="p">(</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w">
                        </span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">rep_len</span><span class="p">(</span><span class="m">2</span><span class="o">:</span><span class="m">6</span><span class="p">,</span><span class="w"> </span><span class="n">n_dists</span><span class="p">)))</span><span class="w">
</span>

plot of chunk reg_plot

The kernel density estimate is monotonically increasing and the slope increases considerably after around 500 days. The arc-shaped lognormal and log-logistic hazards and the constant exponential hazard do not fit the data well. The best performing models are those that support monotonically increasing hazards (Gompertz, Weibull, gamma, and generalized gamma). The flexible generalized gamma and the Gompertz models perform the best with the Gompertz modeling the increase in the slope of the hazard the most closely.

Adding covariates

As mentioned above each parameter can be modeled as a function of covariates. By default, flexsurv only uses covariates to model the location parameter. To demonstrate, we will let the rate parameter of the Gompertz distribution depend on the ECOG performance score (0 = good, 5 = dead), which describes a patient’s level of functioning and has been shown to be a prognostic factor for survival. The model is fit using flexsurvreg().

<span class="n">dat</span><span class="p">[,</span><span class="w"> </span><span class="n">ph.ecog</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">ph.ecog</span><span class="p">)]</span><span class="w">

</span><span class="c1"># Covariates on rate parameter only</span><span class="w">
</span><span class="n">fit_covs1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">flexsurvreg</span><span class="p">(</span><span class="n">Surv</span><span class="p">(</span><span class="n">time</span><span class="p">,</span><span class="w...

To leave a comment for the author, please follow the link and comment on their blog: Devin Incerti’s blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)