We just published a new survival analysis tutorial. Here you can find the code, an explanation of the methods, and six interactive graphs made with ggplot2 and Python.
How We Built It
Survival analysis is a set of statistical methods for analyzing the time until an event occurs: time to death in biological systems, time to failure in mechanical systems, and so on. We used the tongue dataset from the KMsurv R package, the survival package in R, pandas and the lifelines library in Python, the IPython Notebook to execute and publish the code, and rpy2 to run R code in the same notebook as the Python code.
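The core tool in this tutorial is the Kaplan-Meier estimator, which R's survival package exposes through `survfit()` and lifelines through `KaplanMeierFitter`. It estimates the survival function as a running product over the distinct event times, S(t) = ∏ (1 − d_i / n_i), where d_i is the number of events at time t_i and n_i is the number of subjects still at risk just before t_i. Right-censored subjects (for example, patients still alive when follow-up ended) never count as events, but they do leave the risk set. The sketch below is a plain-Python illustration of that idea, not the notebook's code; the function name and the toy data are hypothetical.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of the survival function S(t).

    durations: follow-up time for each subject (e.g. weeks)
    observed:  1 if the event (e.g. death) was observed, 0 if right-censored
    Returns a list of (time, S(t)) pairs, one per distinct event time.
    """
    deaths = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # subjects leaving the risk set at each time
    at_risk = len(durations)
    survival = 1.0
    steps = []
    for t in sorted(exits):
        if deaths[t]:
            # Multiply by the conditional probability of surviving past t.
            # Subjects censored at t still count as at risk at t.
            survival *= 1 - deaths[t] / at_risk
            steps.append((t, survival))
        # Both deaths and censored subjects leave the risk set after t.
        at_risk -= exits[t]
    return steps

# Hypothetical toy data (not the tongue dataset): 6 subjects, 2 censored.
print(kaplan_meier([1, 3, 3, 4, 6, 7], [1, 1, 0, 1, 0, 1]))
```

The curve only steps down at event times. Censored subjects shrink the denominator for later steps, which is what distinguishes this estimator from simply counting survivors.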
Plotly is a platform for making and sharing interactive D3.js graphs, with APIs for R, Python, MATLAB, and Excel. You can make graphs and analyze data on Plotly's free public cloud and within Shiny apps. For collaboration and sensitive data, you can run Plotly Enterprise on your own servers.
The Plots We Made
In our first plot, made with R, the y axis shows the estimated probability that a patient is still alive at time t (in weeks). The curve drops steeply within the first 100 weeks and then flattens out. The dotted lines show the 95% confidence interval. See the code, details, and plot in the IPython Notebook.
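The confidence bands around a Kaplan-Meier curve are usually derived from Greenwood's variance formula, Var[S(t)] ≈ S(t)² · Σ d_i / (n_i (n_i − d_i)), summed over event times up to t. The sketch below computes the simplest "plain" (linear) 95% interval, S(t) ± 1.96 · SE, clipped to [0, 1]. This is an assumption for illustration: R's `survfit()` defaults to a log-transformed interval (`conf.type = "log"`), and lifelines uses a log(−log) ("exponential Greenwood") interval, so their bands will differ somewhat from this one. The function name and data are hypothetical.

```python
import math
from collections import Counter

def km_greenwood(durations, observed, z=1.96):
    """Kaplan-Meier estimate with a plain (linear) Greenwood confidence interval.

    Returns (time, S(t), lower, upper) at each event time; z=1.96 gives an
    approximate 95% interval.
    """
    deaths = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)
    at_risk, survival, greenwood_sum = len(durations), 1.0, 0.0
    rows = []
    for t in sorted(exits):
        d = deaths[t]
        if d:
            survival *= 1 - d / at_risk
            if d < at_risk:
                # Greenwood: Var[S(t)] approx. S(t)^2 * sum of d_i / (n_i * (n_i - d_i))
                greenwood_sum += d / (at_risk * (at_risk - d))
                half_width = z * survival * math.sqrt(greenwood_sum)
                lower = max(0.0, survival - half_width)
                upper = min(1.0, survival + half_width)
            else:
                # Everyone still at risk had the event: S(t) = 0 and the
                # Greenwood term divides by zero, so no interval is reported.
                lower = upper = None
            rows.append((t, survival, lower, upper))
        at_risk -= exits[t]
    return rows

# Hypothetical toy data (not the tongue dataset).
for row in km_greenwood([1, 3, 3, 4, 6, 7], [1, 1, 0, 1, 0, 1]):
    print(row)
```

The intervals widen as the risk set shrinks, which is why bands on real survival curves tend to fan out at later times where few patients remain under observation.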
And now with Python:
Datasets often contain several distinct groups. These may represent categories such as treatment groups, different species, or different manufacturing techniques. The type variable in the tongue dataset describes the DNA profile of a patient's tumor (1 = aneuploid, 2 = diploid). Next, we fit a separate Kaplan-Meier estimate for each of these groups in R and in Python. Here is the plot made with R:
The curves suggest that DNA Type 2 may be more deadly, or more difficult to treat, than Type 1, although a visual comparison alone does not establish this. See the IPython Notebook for more details. And now with Python:
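Fitting one curve per group is just "split the rows by group, then run the estimator on each subset". A standard way to check whether two such curves differ by more than chance is the log-rank test: at each event time it compares the observed number of deaths in one group with the number expected if both groups shared the same survival function, then sums those differences into a chi-square statistic with one degree of freedom. The sketch below illustrates both steps in plain Python. It is not the notebook's code; the function names and the toy records are hypothetical, and for real analyses lifelines provides `lifelines.statistics.logrank_test` and R provides `survdiff()` in the survival package.

```python
import math
from collections import Counter, defaultdict

def kaplan_meier(durations, observed):
    """Kaplan-Meier steps [(time, S(t))]; observed is 1 for an event, 0 if censored."""
    deaths = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)
    at_risk, survival, steps = len(durations), 1.0, []
    for t in sorted(exits):
        if deaths[t]:
            survival *= 1 - deaths[t] / at_risk
            steps.append((t, survival))
        at_risk -= exits[t]
    return steps

def split_by_group(records):
    """Turn (group, duration, observed) records into {group: (durations, observed)}."""
    groups = defaultdict(lambda: ([], []))
    for group, duration, event in records:
        groups[group][0].append(duration)
        groups[group][1].append(event)
    return dict(groups)

def logrank_test(records, group_a):
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    event_times = sorted({dur for _, dur, e in records if e})
    o_minus_e = variance = 0.0
    for t in event_times:
        at_risk = [(g, dur, e) for g, dur, e in records if dur >= t]
        n = len(at_risk)
        n_a = sum(1 for g, _, _ in at_risk if g == group_a)
        d = sum(1 for _, dur, e in at_risk if dur == t and e)
        d_a = sum(1 for g, dur, e in at_risk if g == group_a and dur == t and e)
        # Observed minus expected deaths in group_a under the null hypothesis.
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a at this event time.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; the test is undefined")
    statistic = o_minus_e ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return statistic, math.erfc(math.sqrt(statistic / 2))

# Hypothetical toy records: (group, weeks, 1 = died / 0 = censored).
records = [(1, 2, 1), (1, 5, 0), (1, 8, 1), (2, 1, 1), (2, 3, 1), (2, 4, 0)]
curves = {g: kaplan_meier(d, e) for g, (d, e) in split_by_group(records).items()}
print(curves)
print(logrank_test(records, group_a=1))
```

With pandas, the same split is typically written by iterating over `df.groupby(...)`. For the tongue data the records would come from the group, time, and death-indicator columns; we believe these are `type`, `time`, and `delta` in KMsurv, but check the dataset documentation. A toy example this small says nothing about the tongue data: only a test on the real records can indicate whether the gap between the DNA Type 1 and Type 2 curves is larger than chance would explain.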