Generalized Pairs Plot: It’s about time!

March 28, 2013

(This article was first published on BioStatMatt » R, and kindly contributed to R-bloggers)

JW Emerson, WA Green, B Schloerke, J Crowley, D Cook, H Hofmann, H Wickham (2013) The Generalized Pairs Plot. Journal of Computational and Graphical Statistics 22(1). Here’s a free preprint version.

Until this new paper and implementation by Emerson et al., there seemed to be no widely available pairs plots that accommodated both numerical and categorical fields. A browse through the R Graph Gallery (as of 1/30/2013) turned up none; see also a post on the Quick-R blog. ***Update 3/29/2013: ggpairs in the GGally package has been around since 2010.*** I had been working on such a plot when I discovered the above article, so I’m using this post to share my work, which I will probably abandon in favor of theirs.

Any number of statistical graphics might be used instead of a scatterplot for numeric/numeric pairs, for example a hexbin plot. A sieve plot or an association plot might be used as an alternative to the mosaic plot for factor/factor pairs. A beeswarm plot might be used in place of side-by-side boxplots for numeric/factor pairs.
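As a rough illustration of those alternatives, the sketch below draws one alternative panel per pair type using only functions that ship with base R. Binned counts drawn with image() stand in for a hexbin plot (which needs the hexbin package), assocplot() draws an association plot, and a jittered stripchart() stands in for a beeswarm plot (which needs the beeswarm package). The data are simulated, and the name alt_panels is made up for this example; none of it comes from the paper.

```r
# Hypothetical helper: one alternative panel per pair type, base R only.
alt_panels <- function(n = 500, bins = 20, seed = 1) {
    set.seed(seed)
    x <- rnorm(n)
    y <- x + rnorm(n)
    f <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
    g <- factor(sample(c("u", "v"), n, replace = TRUE))

    op <- par(mfrow = c(1, 3))
    on.exit(par(op))

    # numeric/numeric: counts in rectangular bins (a stand-in for hexbin)
    bx <- seq(min(x), max(x), length.out = bins + 1)
    by <- seq(min(y), max(y), length.out = bins + 1)
    counts <- table(cut(x, bx, include.lowest = TRUE),
                    cut(y, by, include.lowest = TRUE))
    z <- matrix(as.numeric(counts), nrow = bins)
    image(bx, by, replace(z, z == 0, NA), xlab = "x", ylab = "y")

    # factor/factor: association plot instead of a mosaic plot
    assocplot(table(f, g), xlab = "f", ylab = "g")

    # numeric/factor: jittered strip chart (a stand-in for beeswarm)
    stripchart(y ~ f, method = "jitter", vertical = TRUE, pch = 16,
               xlab = "f", ylab = "y")

    # Return the bin counts invisibly so they can be inspected
    invisible(z)
}

alt_panels()
```

Any of these calls could replace the corresponding panel in a pairs-style grid; the choice mostly depends on how many observations and factor levels there are.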

Here was my provisional version of the generalized pairs plot, which I had called an ‘association matrix plot’:

pairsdf <- function(df, abbr = TRUE, abbr.len = 4) {
    # One panel per (row, col) pair of fields; restore the layout on exit
    op <- par(mfrow = rep(length(df), 2))
    on.exit(par(op))
    for (row in seq_along(df)) {
        xr <- df[[row]]
        # Treat character and logical fields as factors
        if (is.character(xr) || is.logical(xr)) 
            xr <- as.factor(xr)
        # Optionally shorten factor levels so axis labels fit
        if (is.factor(xr) && abbr) 
            levels(xr) <- abbreviate(levels(xr), abbr.len)
        for (col in seq_along(df)) {
            xc <- df[[col]]
            if (is.character(xc) || is.logical(xc)) 
                xc <- as.factor(xc)
            if (is.factor(xc) && abbr) 
                levels(xc) <- abbreviate(levels(xc), abbr.len)
            cnm <- names(df)[col]
            rnm <- names(df)[row]
            if (col == row) {
                # Diagonal: empty panel showing the field name
                plot(c(0, 1), c(0, 1), type = "n", xaxt = "n", 
                  yaxt = "n", bty = "n", xlab = "", ylab = "", 
                  main = "")
                text(x = 0.5, y = 0.5, labels = cnm, adj = c(0.5, 
                  0.5), cex = 2)
            } else {
                iscf <- is.factor(xc)
                iscn <- is.numeric(xc)
                isrf <- is.factor(xr)
                isrn <- is.numeric(xr)
                if (isrf && iscf) {
                  # factor/factor: mosaic plot
                  mosaicplot(table(xc, xr), xlab = cnm, ylab = rnm, 
                    main = "", las = 2, color = TRUE, cex = 1.1)
                } else if (isrn && iscn) {
                  # numeric/numeric: scatterplot
                  plot(xc, xr, xlab = cnm, ylab = rnm, main = "", 
                    las = 2, cex = 1.1)
                } else if (isrn && iscf) {
                  # numeric row, factor column: side-by-side boxplots
                  boxplot(xr ~ xc, xlab = cnm, ylab = rnm, main = "", 
                    las = 2, cex = 1.1)
                } else if (isrf && iscn) {
                  # factor row, numeric column: horizontal boxplots
                  boxplot(xc ~ factor(xr, levels = rev(levels(xr))), 
                    xlab = cnm, ylab = rnm, main = "", las = 2, 
                    cex = 1.1, horizontal = TRUE)
                } else stop("unrecognized variable type")
            }
        }
    }
}
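To preview which kind of panel pairsdf would draw in each cell without plotting anything, the type checks can be mirrored in a small helper. This is only a sketch: panel_types and the toy data frame are made up for illustration, and the labels it returns are my own shorthand.

```r
# Hypothetical helper mirroring pairsdf's type checks, without drawing.
panel_types <- function(df) {
    # Character and logical fields count as factors, as in pairsdf
    kind <- vapply(df, function(x) {
        if (is.factor(x) || is.character(x) || is.logical(x)) "factor"
        else if (is.numeric(x)) "numeric"
        else "other"
    }, character(1))
    n <- length(df)
    out <- matrix("", n, n, dimnames = list(names(df), names(df)))
    for (row in seq_len(n)) {
        for (col in seq_len(n)) {
            r <- kind[[row]]
            k <- kind[[col]]
            out[row, col] <-
                if (row == col) "label"
                else if (r == "factor" && k == "factor") "mosaic"
                else if (r == "numeric" && k == "numeric") "scatter"
                else if (r == "numeric" && k == "factor") "boxplot"
                else if (r == "factor" && k == "numeric") "boxplot (horizontal)"
                else "unsupported"
        }
    }
    out
}

toy <- data.frame(dose = c(1, 2, 4, 8),
                  group = c("a", "b", "a", "b"),
                  treated = c(TRUE, FALSE, TRUE, FALSE))
panel_types(toy)
```

Cells marked "unsupported" correspond to the cases where pairsdf stops with an error, such as date columns.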

Below are several association matrix plots generated by the above function (i.e., pairsdf) for data sets found in the MASS package. When there are many fields, I recommend using three to four square inches per plot.
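The sizing advice can be turned into device dimensions with a little arithmetic: at four square inches per plot, each panel is 2 inches on a side, so a data frame with n fields needs a square device of about 2n inches. The sketch below assumes a PDF device and a temporary output file; device_side is a made-up helper name, and the pairsdf call only runs if that function has already been defined in the session.

```r
# Hypothetical helper: side length (in inches) of a square device for
# n_fields fields, given a target area per panel in square inches.
device_side <- function(n_fields, sq_in_per_plot = 4) {
    n_fields * sqrt(sq_in_per_plot)
}

if (requireNamespace("MASS", quietly = TRUE)) {
    rabbit <- MASS::Rabbit
    side <- device_side(ncol(rabbit))
    pdf(file.path(tempdir(), "rabbit-pairs.pdf"), width = side, height = side)
    # pairsdf is the function from this post; skipped if it is not defined
    if (exists("pairsdf")) pairsdf(rabbit)
    dev.off()
}
```

Passing sq_in_per_plot = 3 gives the lower end of the recommended range.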

It’s easy to see that the coop data set describes a simple factorial experiment.

However, the Rabbit data clearly arose from a more complicated experiment.

The fields of the farms data set are all of the factor class.
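The factorial structure noted for coop shows up in the mosaic panels as equally sized tiles. The same thing can be checked numerically by cross-tabulating the factor columns and confirming that every cell has the same count. The helper below is a sketch with a made-up name (is_balanced), demonstrated on a toy design rather than on the MASS data.

```r
# Hypothetical helper: TRUE when every combination of the given factors
# occurs equally often (a balanced, fully crossed design).
is_balanced <- function(df, factors) {
    counts <- table(df[factors])
    length(unique(as.vector(counts))) == 1 && all(counts > 0)
}

toy <- expand.grid(lab = c("L1", "L2"), batch = c("B1", "B2", "B3"))
toy <- rbind(toy, toy)                     # two replicates per cell
is_balanced(toy, c("lab", "batch"))        # balanced toy design
is_balanced(toy[-1, ], c("lab", "batch"))  # one observation removed
```

For the post's data, the call would look something like is_balanced(MASS::coop, c("Lab", "Spc", "Bat")), assuming those are the factor column names; check names(MASS::coop) first.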
