Forestplot ❤️ dplyr
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Since initial publishing my forestplot package, dplyr and tidyverse have become evermore dominant in how we think about data and data management. I have therefore just published a 2.0 version that uses many of the awesome select & group features that make tidyverse such a compelling tool.
Basic change
Below is how you can work with some standard meta-analysis data. I assume that you have a base dataset with all your studies and then one additional with the summary data. We can easily append rows for header or empty rows without all the manual fuzz the previous solution required:
# Cochrane data base_data <- tibble(mean = c(0.578, 0.165, 0.246, 0.700, 0.348, 0.139, 1.017), lower = c(0.372, 0.018, 0.072, 0.333, 0.083, 0.016, 0.365), upper = c(0.898, 1.517, 0.833, 1.474, 1.455, 1.209, 2.831), study = c("Auckland", "Block", "Doran", "Gamsu", "Morrison", "Papageorgiou", "Tauesch"), deaths_steroid = c("36", "1", "4", "14", "3", "1", "8"), deaths_placebo = c("60", "5", "11", "20", "7", "7", "10"), OR = c("0.58", "0.16", "0.25", "0.70", "0.35", "0.14", "1.02")) summary <- tibble(mean = 0.531, lower = 0.386, upper = 0.731, study = "Summary", OR = "0.53", summary = TRUE) header <- tibble(study = c("", "Study"), deaths_steroid = c("Deaths", "(steroid)"), deaths_placebo = c("Deaths", "(placebo)"), OR = c("", "OR"), summary = TRUE) empty_row <- tibble(mean = NA_real_) cochrane_output_df <- bind_rows(header, base_data, empty_row, summary) cochrane_output_df %>% forestplot(labeltext = c(study, deaths_steroid, deaths_placebo, OR), is.summary = summary, clip = c(0.1, 2.5), hrzl_lines = list("3" = gpar(lty = 2), "11" = gpar(lwd = 1, columns = 1:4, col = "#000044")), xlog = TRUE, col = fpColors(box = "royalblue", line = "darkblue", summary = "royalblue", hrz_lines = "#444444"))
Just as a comparison you can see the old code for achieving the same thing:
cochrane_from_rmeta <- structure(list(mean = c(NA, NA, 0.578, 0.165, 0.246, 0.700, 0.348, 0.139, 1.017, NA, 0.531), lower = c(NA, NA, 0.372, 0.018, 0.072, 0.333, 0.083, 0.016, 0.365, NA, 0.386), upper = c(NA, NA, 0.898, 1.517, 0.833, 1.474, 1.455, 1.209, 2.831, NA, 0.731)), .Names = c("mean", "lower", "upper"), row.names = c(NA, -11L), class = "data.frame") tabletext <- cbind(c("", "Study", "Auckland", "Block", "Doran", "Gamsu", "Morrison", "Papageorgiou", "Tauesch", NA, "Summary"), c("Deaths", "(steroid)", "36", "1", "4", "14", "3", "1", "8", NA, NA), c("Deaths", "(placebo)", "60", "5", "11", "20", "7", "7", "10", NA, NA), c("", "OR", "0.58", "0.16", "0.25", "0.70", "0.35", "0.14", "1.02", NA, "0.53")) forestplot(tabletext, hrzl_lines = list("3" = gpar(lty = 2), "11" = gpar(lwd = 1, columns = 1:4, col = "#000044")), cochrane_from_rmeta, is.summary = c(TRUE,TRUE,rep(FALSE,8),TRUE), clip = c(0.1,2.5), xlog = TRUE, col = fpColors(box = "royalblue", line = "darkblue", summary = "royalblue", hrz_lines = "#444444"))
The difference perhaps doesn’t seem huge here but the ability to work directly with data.frame
objects instead of matrices or lists makes life much easier.
Grouped coefficients
One of the features that was rather unique when I started working on this package was the ability to have multiple bands per row. This allows us to package data more densely and thus makes it easier to compare results.
data(dfHRQoL) dfHRQoL %>% group_by(group) %>% forestplot(legend_args = fpLegend(pos = list(x = .25, y = 0.95), gp = gpar(col = "#CCCCCC", fill = "#F9F9F9")), fn.ci_norm = c(fpDrawNormalCI, fpDrawCircleCI), boxsize = .25, # We set the box size to better visualize the type line.margin = .2, # We need to add this to avoid crowding clip = c(-.125, 0.075), col = fpColors(box = c("blue", "darkred")), xlab = "EQ-5D index")
Do I have to change everything?
The major version number change is due to the API change, although most of you should not have to change anything with this update. The core things to look for are:
- In loops/functions, you have to manually call print (or plot) on the returned object. This as the 2.0 returns an object that is only plotted once print is called on it – the same way ggplot works. This should be automatic unless you’ve encapsulated the forestplot function.
- If you have provided a data.frame as your first input argument then there may be issues as this will result that the
tidyselect
syntax is used.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.