**Christopher Gandrud (간드루드 크리스토파)**, and kindly contributed to R-bloggers)

Effectively showing estimates and uncertainty from Cox Proportional Hazard (PH) models, especially for interactive and non-linear effects, can be challenging with currently available software. So, researchers often just simply display a results table. These are pretty useless for Cox PH models. It is difficult to decipher a simple linear variable’s estimated effect and basically impossible to understand time interactions, interactions between variables, and nonlinear effects without the reader further calculating quantities of interest for a variety of fitted values.

So, I’ve been putting together the simPH R package to hopefully make it easier to show results from Cox PH models. The package uses plots of post-estimation simulations (the same idea behind the plotting facilities in the Zelig package) to show Cox PH estimates and the uncertainty surrounding them.

Here I want to give an overview of how to use `simPH`

. First the general process, then a few examples. Have a look at this paper for details about the math behind the graphs.

### General Steps

There are three steps to use `simPH`

:

Estimate a Cox PH model in the usual way with the

`coxph`

command in the survival package.Simulate quantities of interest--hazard ratios, first differences, marginal effect, relative hazards, or hazard rates--with the appropriate

`simPH`

simulation command.Plot the simulations with the

`simGG`

method.

### A Few Examples

Here are some basic examples that illustrate the process and key syntax. Before getting started you can install `simPH`

in the normal R way from CRAN.

#### Linear effects

Let’s look at a simple example with a linear non-interactive effect. The data set I’m using is from Carpenter (2002). It is included with `simPH`

. See the package documentation for details.

First, let’s estimate a Cox PH model where the event of interest is a drug receiving FDA approval.

`# Load packages`

library(survival)

library(simPH)

# Load Carpenter (2002) data

data("CarpenterFdaData")

# Estimate survival model

M1 <- coxph(Surv(acttime, censor) ~ prevgenx + lethal +

deathrt1 + acutediz + hosp01 + hhosleng +

mandiz01 + femdiz01 + peddiz01 + orphdum +

vandavg3 + wpnoavg3 + condavg3 + orderent +

stafcder, data = CarpenterFdaData)

Now say we want to examine the effect that the number of FDA staff reviewing a drug application has on it being accepted. This variable is called `stafcder`

in the model we just estimated. To do this let’s use `simPH`

to simulate hazard ratios. We will simulate hazard ratios for units \(j\) and \(l\), i.e. \(\frac{h_{j}(t)}{h_{l}(t)}\) using `simPH`

’s `coxsimLinear`

command, because we estimate the effect of the number of staff as linear. In the following code we use the `Xj`

argument to set the \(j\) values. We could use `Xl`

also, but as we don’t `coxsimLinear`

assumes all \(x_{l}\) are 0.

Notice that the argument `spin = TRUE`

. This finds the shortest 95% probability interval of the simulations using the SPIn method. SPIn can be especially useful for showing simulated quantities of interest generated from Cox PH models, because then can often be crowded close to a lower boundary (0 in the case of hazard rates). We should be more interested in the area with the highest probability–most simulated values–rather than an arbitrary central interval. That being said, if `spin = FALSE`

, then we will simply find the central 95% interval of the simulations.

`# Simulate and plot Hazard Ratios for stafcder variable`

Sim1 <- coxsimLinear(M1, b = "stafcder",

qi = "Hazard Ratio", ci = 0.95,

Xj = seq(1237, 1600, by = 2), spin = TRUE)

# Plot

simGG(Sim1)

Notice in the plot that each simulation is plotted as an individual point. These are all of the simulations in the shortest 95% probability interval. Each point has a bit of transparency (they are 90% transparent by default). So the plot is visually weighted; the darker areas of the graph have a higher concentration of the simulations. This gives us a very clear picture of the simulation distribution, i.e. the estimated effect and our uncertainty about it.

If you don’t want to plot every point, you can simply use ribbons showing the constricted 95% and the middle 50% of this interval. To do this simply use the `ribbons = TRUE`

argument.

`simGG(Sim1, ribbons = TRUE, alpha = 0.5)`

Notice the `alpha = 0.5`

argument. This increased the transparency of the widest ribbon to 50%.

The syntax we’ve used here is very similar to what we use when we are working with nonlinear effects estimated with polynomials and splines. Post-estimation simulations can be run with the `coxsimPoly`

and `coxsimSpline`

commands. See the `simPH`

documentation for more examples.

#### Interactions

The syntax for two-way interactions simulated with the `coxsimInteract`

command is a little different. Using the same data, let’s look at how to show results for interactions. In the following model we are interacting two variables `lethal`

and `prevgenx`

. We can think of these variables as `X1`

and `X2`

, respectively. For interactions it can be useful to examine the marginal effect. To find the marginal effect of a one unit increase in the `lethal`

variable given various values of `prevgenx`

let’s use the following code:

`# Estimate the model`

M2 <- coxph(Surv(acttime, censor) ~ lethal*prevgenx, data = CarpenterFdaData)

# Simulate Marginal Effect of lethal for multiple values of prevgenx

Sim2 <- coxsimInteract(M2, b1 = "lethal", b2 = "prevgenx",

qi = "Marginal Effect",

X2 = seq(2, 115, by = 2), nsim = 1000)

# Plot the results

simGG(Sim2, ribbons = TRUE, alpha = 0.5, xlab = "\nprevgenx",

ylab = "Marginal effect of lethal\n")

The order of the `X1`

and `X2`

variables in the interactions matters. The marginal effect is always calculated for the `X1`

variable over a range of `X2`

values.

Notice also that we set the `x`

and `y`

axis labels with the `xlab`

and `ylab`

arguments.

#### Time-varying Effects

Finally, let’s look at how to use with the `coxsimtvc`

command to show results from effects that we estimate to vary over time. Here we are going to use another data set that is also included with `simPH`

. The event of interest in the following model is when deliberation is stopped on a European Union directive (the model is from Licht (2011)). We will create hazard ratios for the effect that the number of backlogged items (`backlog`

) has on deliberation time. We will estimate the effect as a log-time interaction.

`# Load Golub & Steunenberg (2007) data. The data is included with simPH.`

data("GolubEUPData")

# Create natural log-time interactions

Golubtvc <- function(x){

tvc(data = GolubEUPData, b = x, tvar = "end", tfun = "log")

}

GolubEUPData$Lcoop <- Golubtvc("coop")

GolubEUPData$Lqmv <- Golubtvc("qmv")

GolubEUPData$Lbacklog <- Golubtvc("backlog")

GolubEUPData$Lcodec <- Golubtvc("codec")

GolubEUPData$Lqmvpostsea <- Golubtvc("qmvpostsea")

GolubEUPData$Lthatcher <- Golubtvc("thatcher")

# Estimate model

M1 <- coxph(Surv(begin, end, event) ~ qmv + qmvpostsea + qmvpostteu +

coop + codec + eu9 + eu10 + eu12 + eu15 + thatcher +

agenda + backlog + Lqmv + Lqmvpostsea + Lcoop + Lcodec +

Lthatcher + Lbacklog,

data = GolubEUPData, ties = "efron")

Much of the first half of the code is dedicated to creating the log-time interactions with the `tvc`

command.

Now we simply create the simulations for a range of values of `backlog`

and plot them. Note in the following code that we tell `coxsimtvc`

the name of both the base `backlog`

variable and its log-time interaction term `Lbacklog`

using the `btvc`

argument. We also need to tell `coxsimtvc`

that it is a log-time interaction with `tfun = "log"`

.

`# Create simtvc object for relative hazard`

Sim2 <- coxsimtvc(obj = M1, b = "backlog", btvc = "Lbacklog",

qi = "Relative Hazard", Xj = seq(40, 200, 40),

tfun = "log", from = 1200, to = 2000, by = 10,

nsim = 500)

# Create relative hazard plot

simGG(Sim2, xlab = "\nTime in Days", ribbons = TRUE,

leg.name = "Backlogged \n Items", alpha = 0.2)

#### simGG Plots are ggplot2

Finally, because almost every plot created by `simGG`

is a ggplot2 plot, you can use almost the full range of customisation options that that package allows. See the ggplot2 documentation for many examples.

**leave a comment**for the author, please follow the link and comment on his blog:

**Christopher Gandrud (간드루드 크리스토파)**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...