The Pirate Plot (2.0) – The RDI plotting choice of R pirates

ndphillips

6 years ago

[This article was first published on R – Nathaniel D. Phillips, PhD, and kindly contributed to R-bloggers.]


Plain vanilla barplots are as uninformative (and ugly) as they are popular. And boy, are they popular. From the floors of Congress to our latest scientific articles, barplots surround us. Barplots are popular because they are simple and easy to understand. However, that simplicity carries costs: barplots can mask important patterns in data, like multiple modes and skewness.
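To make that masking concrete, here is a minimal base R sketch. The two groups are simulated, hypothetical data (not from the pirates survey), and the names one_mode and two_modes are made up for illustration. Both groups are forced to have the same mean, so their bars come out the same height even though the histograms look nothing alike.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical data: one group with a single mode, one with two modes
one_mode <- rnorm(200, mean = 50, sd = 5)
two_modes <- c(rnorm(100, mean = 35, sd = 3),
               rnorm(100, mean = 65, sd = 3))

# Shift the second group so both means match (up to floating-point error)
two_modes <- two_modes - mean(two_modes) + mean(one_mode)

group_means <- c(one_mode = mean(one_mode), two_modes = mean(two_modes))

# Left: the barplot shows two equal bars. Middle and right: the raw
# distributions are clearly different.
op <- par(mfrow = c(1, 3))
barplot(group_means, main = "Barplot of means")
hist(one_mode, main = "One mode", xlab = "Value")
hist(two_modes, main = "Two modes", xlab = "Value")
par(op)
```

The barplot alone gives no hint that the second group is really two clusters.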

Instead of barplots, we should be using RDI plots, where RDI stands for Raw (data), Description and Inference. Specifically, an RDI plot should present complete raw data (including smoothed densities), descriptive statistics (like means and medians), and inferential statistics (like a Bayesian 95% Highest Density Interval, or HDI). The R community already has access to many great plots that come close to the RDI trifecta. For example, beanplots, created by the beanplot() function, show complete raw data and smoothed distributions (Kampstra, 2008).

Today, the R community has access to a new RDI plot: the pirate plot. I discovered the original code underlying the pirate plot during a late night swim on the Bodensee in Konstanz, Germany. The pirate plot function was written in an archaic German pirate dialect on an old beer bottle and is unfortunately unusable. However, I have taken the time to painstakingly translate the original pirate code into a new R function called pirateplot(). The latest version (now 2.0) of the translation is stored in the yarrr package on GitHub at www.github.com/ndphillips/yarrr. To install the package and access the pirateplot() function within R, run the following code:


install.packages("devtools")
library("devtools")
install_github("ndphillips/yarrr")

Once you’ve installed the yarrr package, load it with the library() command:


library("yarrr")

Now you’re ready to make some pirate plots! Let’s create a pirate plot from the pirates dataset in the yarrr package. This dataset contains results from a survey of several pirates at the Bodensee in Konstanz. We’ll create a pirate plot showing the distribution of ages of pirates based on their favorite pirate:


pirateplot(formula = age ~ favorite.pirate,
           data = pirates,
           xlab = "Favorite Pirate",
           ylab = "Age",
           main = "My First Pirate Plot!")

The arguments for pirateplot() are very similar to those of other plotting functions like barplot() and beanplot(). The key arguments are formula, where you specify one (or two) categorical variable(s) for the x-axis and a numerical variable for the y-axis, and data, the dataframe containing those variables.
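If the formula notation is new to you, it may help to see the same kind of formula used by a base R function. This is only a sketch of how a formula of the form numeric ~ categorical splits data into groups. It uses base R's aggregate() and R's built-in ChickWeight dataframe (so it runs without yarrr), and it is not how pirateplot() works internally.

```r
# weight is the numerical variable (y-axis),
# Diet is the categorical variable (x-axis groups)
mean_by_diet <- aggregate(weight ~ Diet, data = ChickWeight, FUN = mean)
print(mean_by_diet)

# Two grouping variables on the right-hand side, separated by +
# (Time is numeric in ChickWeight, but aggregate() treats each
# distinct value as its own group)
mean_by_diet_time <- aggregate(weight ~ Diet + Time,
                               data = ChickWeight,
                               FUN = mean)
head(mean_by_diet_time)
```

Each row of the result corresponds to one group, which is the same grouping a plot built from that formula would show along its x-axis.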

In addition to the data arguments, there are arguments that dictate the opacity of the 5 key elements of a pirate plot: bar.o, the opacity of the bars; bean.o, the opacity of the beans; point.o, the opacity of the points; line.o, the opacity of the average lines at the top of the bars; and hdi.o, the opacity of the 95% Bayesian Highest Density Interval (HDI). The HDIs are calculated using the BEST package (Kruschke, 2013). Because calculating HDIs can be time-consuming, they are turned off by default (i.e., hdi.o = 0). In the next plots, I’ll turn them on so you can see them.
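For readers wondering what a Highest Density Interval actually is, here is a rough base R sketch. It is not the BEST package's code: the function name hdi_from_draws is hypothetical, and the "posterior draws" are simulated from a skewed gamma distribution purely for illustration. The idea is that, given draws from a posterior, the 95% HDI is the shortest interval that contains 95% of them.

```r
# Hypothetical helper: shortest interval containing `mass` of the draws
hdi_from_draws <- function(draws, mass = 0.95) {
  sorted <- sort(draws)
  n <- length(sorted)
  n_in <- ceiling(mass * n)             # draws the interval must contain
  starts <- seq_len(n - n_in + 1)       # every possible starting index
  widths <- sorted[starts + n_in - 1] - sorted[starts]
  best <- which.min(widths)             # narrowest candidate wins
  c(lower = sorted[best], upper = sorted[best + n_in - 1])
}

set.seed(2)  # arbitrary seed so the sketch is reproducible

# Stand-in for posterior draws (an assumption, not real pirate data)
draws <- rgamma(10000, shape = 2, rate = 1)

hdi <- hdi_from_draws(draws)
equal_tailed <- quantile(draws, c(0.025, 0.975))

print(hdi)
print(equal_tailed)
```

For a skewed distribution like this one, the HDI sits closer to the peak than the usual equal-tailed 2.5% to 97.5% interval, and it is never wider.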

The pirateplot() function has built-in color arguments. You can control the overall color palette of the plot with pal, and the color of the plot background with back.col. Let’s change a few of these arguments. I’ll also include the 95% Highest Density Intervals (HDIs) by setting hdi.o = .7.


pirateplot(formula = age ~ favorite.pirate,
           data = pirates,
           xlab = "Favorite Pirate",
           ylab = "Age",
           main = "Black and White Pirate Plot",
           pal = "black",
           hdi.o = .7,
           line.o = 1,
           bar.o = .1,
           bean.o = .1,
           point.o = .1,
           point.pch = 16,
           back.col = gray(.97))

As you can see, the entire plot is now grayscale, and different elements of the plot have been emphasised by changing the opacity arguments. For example, now that we’ve set the opacity of the HDI to .7 (the default is 0), we can see the Bayesian 95% Highest Density Interval for the mean of each group.
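If you want a feel for what opacity values between 0 and 1 do to a color, the base R sketch below may help. It uses adjustcolor() and col2rgb() from grDevices. Whether pirateplot() applies opacity in exactly this way is an assumption on my part, and the labels (hidden, faint, strong, solid) are made up for illustration.

```r
# Opacity values like those passed to bar.o, bean.o, point.o, line.o and hdi.o
opacities <- c(hidden = 0, faint = 0.1, strong = 0.7, solid = 1)

# adjustcolor() scales the alpha (transparency) channel of a base color
shades <- sapply(opacities, function(o) adjustcolor("black", alpha.f = o))

# Read the alpha channel back on a 0-255 scale
alpha_channel <- col2rgb(shades, alpha = TRUE)["alpha", ]
print(alpha_channel)
```

An opacity of 0 makes an element fully transparent, which is why setting an argument like hdi.o to 0 effectively switches that element off.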

Hopefully it’s clear how much better RDI plots are than standard bar plots. Now, in addition to just seeing one piece of information (the mean) of each group, we can see all the raw data, a smoothed density curve of the data (helpful for detecting multiple modes and skewness), as well as Bayesian inference.

Oh, and just for comparison purposes, we can create a standard barplot within the pirateplot() function by adjusting the opacity arguments:



pirateplot(formula = age ~ favorite.pirate,
           data = pirates,
           xlab = "Favorite Pirate",
           ylab = "Age",
           main = "Black and White Pirate Plot",
           pal = "black",
           hdi.o = 0,
           line.o = 0,
           bar.o = 1,
           bean.o = 0,
           point.o = 0)

Now how awful does that barplot look in comparison to the far superior pirate plot?!

You can also include multiple independent variables in the formula passed to pirateplot(). For example, I can plot the pirates’ beard lengths separated by sex and the college each pirate went to. For this plot, I’ll use the southpark palette and emphasize the HDI by turning its opacity up to .6:


pirateplot(formula = beard.length ~ sex + college,
           data = pirates,
           main = "Beard lengths",
           pal = "southpark",
           xlab = "",
           ylab = "Beard Length",
           point.pch = 16,
           point.o = .2,
           hdi.o = .6,
           bar.o = .1,
           line.o = .5)

As you can see, it’s very easy to customise the look and focus of your pirate plot. Here are 6 different plots of the weights of chickens given one of 4 diets (from the ChickWeight dataframe in R). You can see the code for each by accessing the help menu for the pirateplot() function within R.
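As a starting point for your own experiments, here is a minimal call on the ChickWeight data. It is not one of the examples from the help menu, just a sketch that reuses only the arguments already shown in this post, and the titles and labels are my own choices. It runs only if the yarrr package is installed.

```r
# ChickWeight ships with R; weight is numeric and Diet is a factor
if (requireNamespace("yarrr", quietly = TRUE)) {
  yarrr::pirateplot(formula = weight ~ Diet,
                    data = ChickWeight,
                    xlab = "Diet",
                    ylab = "Weight",
                    main = "Chicken weights by diet")
}
```

From there, you can add the opacity and palette arguments one at a time to see what each does.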

Have fun creating your own pirate plots! If you have suggestions for further improvements, don’t hesitate to write my squire at yarrr.book@gmail.com.

References

Kampstra, P. (2008) Beanplot: A Boxplot Alternative for Visual Comparison of Distributions. Journal of Statistical Software, Code Snippets, 28(1), 1-9. URL http://www.jstatsoft.org/v28/c01/

Kruschke, J. K. (2013) Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General, 142(2), 573-603. doi: 10.1037/a0029146
