**4D Pie Charts » R**, and kindly contributed to R-bloggers)

Here at the Health and Safety Laboratory* we’re big fans of physiologically-based pharmacokinetic (PBPK) models (say that 10 times fast) for predicting concentrations of chemicals around your body based upon an exposure. These models take the form of a big system of ODEs.

Because they contain many equations and consequently many parameters (masses of organs and blood flow rates and partition coefficients), it is important to do a sensitivity analysis on the model to determine which parameters are important and which are not. After all, it is tricky to accurately measure many of these parameters, so you need to know whether or not an error in a parameter will throw off the whole model.

How to do sensitivity analysis of an ODE-based model is a worthy topic of discussion, but not one I want to cover today. For now, it is sufficient to know the basic idea: you input different parameter combinations and see which parameters cause the model output to change the most.

For each parameter, the sensitivity analysis returns a *main effect*, which is the fraction of variance in model output accounted for by that parameter. Each parameter also has an *interaction effect*, which is the fraction of variance accounted for by that parameter’s interaction with all the other parameters. The main effect + interactions give the *total effect* for that parameter. Got it?

When I was working on this problem recently, I searched the literature for ways to display the results of the sensitivity analysis. “Draw a pie chart” it said. Not with less than 4 dimensions, thank you! “How about a bar chart?”, the literature suggested. How boring. It was clear that something better needed to be done.

**Note that the plotting technique I’m about to describe is suitable for any sensitivity analysis, not just those of PBPK models. **

The first step from the basic bar chart was to switch to a series of 3 Pareto plots for main effects, interactions and total effects. This was a big improvement, but having three graphs to look at was a pain.

I also tried a stacked bar chart for the total effects, but this still didn’t answer the most important question:

How many parameters should I include in my model?

This is a rather fuzzy question, but it is possible to restate it in a way that becomes solvable

What is the least number of parameters that I need to include to account for a given percentage of variance in my model?

It turns out that you don’t know an exact value for the amount of variance accounted for – you just know a range. The lower bound is the sum of the main effects of the parameters you include in the model. The upper bound is the sum of that total effects that you included. Actually we can do better than that. The effect of interactions are double counted (since each interaction is the interaction with all other parameters). So our second upper bound is that the included vairance can never exceed 100% minus the sum of all the main effects that we didn’t include.

I realise that there have been far too many long sentences in this post already, so I’ll skip to my solution. After some pondering, I realised that I could combine the Pareto plot with the stacked bar chart, which I call the *Lowry Plot*, named for the Lancashire artist – the resultant graph looks a lot like a smoke stack, of which he painted many.

If you include the first 4 parameters then, reading up from the fourth bar, you account for between 85% and 90% of the variance. Conversely, to include at least 90% of the variance, you find the bar where the whole ribbon is above 90%. In this case you need the first six parameters.

To recreate this, here’s some sample data taken from a model of m-xylene. The important thing is that you need 3 columns: parameter name, main effect and interaction.

library(ggplot2) m_xylene_data <- data.frame( Parameter = c( "BW", "CRE", "DS", "KM", "MPY", "Pba", "Pfaa", "Plia", "Prpda", "Pspda", "QCC", "QfaC", "QliC", "QPC", "QspdC", "Rurine", "Vfac", "VliC", "Vmax"), "Main Effect" = c( 1.03E-01, 9.91E-02, 9.18E-07, 3.42E-02, 9.27E-3, 2.82E-2, 2.58E-05, 1.37E-05, 5.73E-4, 2.76E-3, 6.77E-3, 8.67E-05, 1.30E-02, 1.19E-01, 4.75E-04, 5.25E-01, 2.07E-04, 1.73E-03, 1.08E-03), Interaction = c( 1.49E-02, 1.43E-02, 1.25E-04, 6.84E-03, 3.25E-03, 7.67E-03, 8.34E-05, 1.17E-04, 2.04E-04, 7.64E-04, 2.84E-03, 8.72E-05, 2.37E-03, 2.61E-02, 6.68E-04, 4.57E-02, 1.32E-04, 6.96E-04, 6.55E-04 ) )

Sometimes it is easier to use the data in wide format, other times in long format. This data fortification process returns both, plus some extra columns, and a resampled version.

fortify_lowry_data <- function(data, param_var = "Parameter", main_var = "Main.Effect", inter_var = "Interaction", n = 1000) { #Convert wide to long format #browser() mdata <- melt(data, id.vars = param_var) #Order columns by main effect and reorder parameter levels o <- order(data[, main_var], decreasing = TRUE) data <- data[o, ] data[, param_var] <- factor(data[, param_var], levels = data[, param_var]) #Force main effect, interaction to be numeric data[, main_var] <- as.numeric(data[, main_var]) data[, inter_var] <- as.numeric(data[, inter_var]) #total effect is main effect + interaction data$.total.effect <- rowSums(data[, c(main_var, inter_var)]) #Get cumulative totals for the ribbon data$.cumulative.main.effect <- cumsum(data[, main_var]) data$.cumulative.total.effect <- cumsum(data$.total.effect) #A quirk of ggplot2 means we need x coords of bars data$.numeric.param <- as.numeric(data[, param_var]) #The other upper bound #.maximum = 1 - main effects not included data$.maximum <- c(1 - rev(cumsum(rev(data[, main_var])))[-1], 1) #Resample data #This is needed otherwise some ribbon points don't quite match up resampled <- approx(data[, param_var], data$.cumulative.main.effect, n = n) data_interpolated <- data.frame( x = resampled$x, ymin = resampled$y) data_interpolated$ymax <- approx(data[, param_var], data$.cumulative.total.effect, n = n)$y data_interpolated$maximum <- approx(data[, param_var], data$.maximum, n = n)$y data_interpolated$valid.ymax <- with(data_interpolated, pmin(maximum, ymax)) mdata[, param_var] <- factor(mdata[, param_var], levels = data[, param_var]) list(data = data, mdata = mdata, data_interpolated = data_interpolated) }

The plot is essentially a bar chart plus a cumulative frequency ribbon, with from tweaks to make it prettier.

lowry_plot <- function(data, param_var = "Parameter", main_var = "Main.Effect", inter_var = "Interaction", x_lab = "Parameters", y_lab = "Total Effects (= Main Effects + Interactions)", ribbon_alpha = 0.5) { #Fortify data and dump contents into plot function environment data_list <- fortify_lowry_data(data, param_var, main_var, inter_var) list2env(data_list, envir = sys.frame(sys.nframe())) p <- ggplot(data) + geom_bar(aes_string(x = param_var, y = "value", fill = "variable"), data = mdata) + geom_ribbon(aes(x = x, ymin = ymin, ymax = valid.ymax), data = data_interpolated, alpha = ribbon_alpha) + xlab(x_lab) + ylab(y_lab) + scale_y_continuous(formatter = "percent") + opts(axis.text.x = theme_text(angle = 25, hjust = 1)) + scale_fill_grey(end = 0.5) + opts(legend.position = "top", legend.title = theme_blank(), legend.direction = "horizontal" ) p } m_xylene_lowry <- lowry_plot(m_xylene_data)

*Dear legal folk, this is a personal blog and I’m not officially speaking on behalf of my employer. That said, a substantial chunk of the people who care about these things love PBPK.

Tagged: data-viz, r

**leave a comment**for the author, please follow the link and comment on his blog:

**4D Pie Charts » R**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...