Customising vegan’s ordination plots

April 11, 2012
By

(This article was first published on From the bottom of the heap » R, and kindly contributed to R-bloggers)

As a developer on the vegan package for R, one of the most FAQs is how to customise ordination diagrams, usually to colour the sample points according to an external grouping variable. Now, just because we get asked how to do this a lot is not really a reflection on the quality of the plot() methods available in vegan.

Ordination diagrams are difficult beasts to handle with plotting code without having an unwieldy number of arguments etc. There are potentially five sets of scores that need to be plotted so the number of arguments could quickly get out of hand if we allowed the user to pass all the relevant graphical parameters to the various sets of scores. We’ve provided a basic plot() method that is set up with useful defaults that works for all the ordination methods in vegan. This method is really there to allow the quick visualisation of the fitted ordination; I use vegan on almost a daily basis and none of my presentation or publication figures use the default plot method at all. However, we have provided all the tools needed to draw custom ordination plots within vegan. How you use them also provides a useful guide to building up base graphics plots from lower-level plotting functions. In this post I intend to show two examples of building up a simple PCA biplot from the basic building blocks available in vegan and R’s base graphics.

To get going, start R and load the vegan package. For this example I will use the classic Dutch Dune Meadow data set which I also load. A simple PCA of the species data is then fitted and stored in mod. To finish the basic example, I use the basic plot() method to plot the PCA biplot. Note that I’m using symmetric scaling here; I tend to prefer this scaling for general diagrams as it preserves many of the features of biplots without focusing on one or other sets of scores. (Here I’m also using it to illustrate how to select a scaling when you are building the plot from scratch.)

## load vegan
require("vegan")

## load the Dune data
data(dune, dune.env)

## PCA of the Dune data
mod <- rda(dune, scale = TRUE)

## plot the PCA
plot(mod, scaling = 3)

The resulting PCA biplot is shown below

Basic plot() method for ordinations in vegan

The result of a call to the plot.cca() method results in a simple biplot of the Dune Meadows PCA but is not very customisable.

Building a biplot using vegan methods

The first example of a customised biplot I will show uses low-level plotting methods provided by vegan. These include points() and text() methods for objects of class "cca". (The "cca" object is one of the base ordination classes in vegan; its name is a bit unfortunate as it is the base representation for PCA, CA, RDA etc… objects — read more about this object via ?cca.object.) I am going to plot the basic PCA biplot, but colour the sites according to the land-use variable dune.env$Use, which has three levels

> with(dune.env, levels(Use))
[1] "Hayfield" "Haypastu" "Pasture"

Start by defining an object holding the desired ordination scaling and a vector of colours with which to do the plotting

scl <- 3 ## scaling = 3
colvec <- c("red2", "green4", "mediumblue")

The basic plot() method can be coerced into setting up the basic plot axes, limits etc for us by using type = "n", so we use that short cut and pass along our desired scaling

plot(mod, type = "n", scaling = scl)

The next step is to add the site scores. Here I use the points() method rather than draw the site scores using their sample ID (as is the default in the plot() method).

with(dune.env, points(mod, display = "sites", col = colvec[Use],
                      scaling = scl, pch = 21, bg = colvec[Use]))

The key point to note in the code chunk above is how I colour each site according to its land-use. I index into the vector of colours created earlier using the factor Use. Use is essentially a vector of 1s, 2s, and 3s (there are three levels remember). The colvec[Use] evaluates to a vector the same length as the number of sites, where each element is one of the pre-specified colours

 > head(with(dune.env, colvec[Use]))
[1] "green4"     "green4"     "green4"     "mediumblue"
[5] "green4"     "green4"

The display argument selects the type of scores to plot. The remainder of the arguments are the scaling for the scores (so they match the base plot) and arguments to style the plotted points.

Next I add the species scores, but this time I want to label them with (abbreviated) species names. For this I use the text() method with argument display = "species"

text(mod, display = "species", scaling = scl, cex = 0.8, col = "darkcyan")

To finish the plot I add a legend. It is important to get the ordering and labelling of the colours correct here. When I drew the site scores I used the Use factor to index the vector of plotting colours. To ensure we get the correct ordering in the legend, it is best to extract the levels as the lables, which is what I do below

with(dune.env, legend("topright", legend = levels(Use), bty = "n",
                      col = colvec, pch = 21, pt.bg = colvec))

If you want to provide custom labels, look at the levels of the factor and provide them to legend() in the correct order.

The biplot should now look like the one below

Customised ordination built from vegan lower-level plot methods

A customised ordination built from vegan lower-level plot methods

Building a biplot using base graphics directly

If you want the ultimate level of control over the plots then you will want to build the plot up from scratch using lower-level plotting functions provided by base graphics. In this second example I’ll recreate the plot I made above but from the ground up using basic plotting functions.

First, start by extracting the scores needed from the ordination object. Here the scaling required for the plot needs to be provided and the sets of scores specified

scrs <- scores(mod, display = c("sites", "species"), scaling = scl)

This results in a list with two components; sites and species

str(scrs, max = 1)

Each component is a matrix with two columns containing the scores on the first and second principal components respectively. If you want to extract scores on different axes provide the choices argument to the scores() function with a numeric vector of the axes you wish to extract. Do note that the code below assumes only two axes are extracted.

Next compute the axis limits, which need to cover the range of the site and species scores on the first (x-axis) and second (y-axis) principal components

xlim <- with(scrs, range(species[,1], sites[,1]))
ylim <- with(scrs, range(species[,2], sites[,2]))

Now everything is ready to do some actual plotting. Start preparing the plot device be starting a new plot with plot.new() and then set up the coordination system via a call to plot.window() supplying the axis limits created above. A crucial aspect of the call to plot.window() is the graphical parameter asp which controls the aspect ratio of the plot. Here we set the aspect ratio equal to 1 to preserve the relationships between scores on the different axes and the distance interpretation of the biplot

plot.new()
plot.window(xlim = xlim, ylim = ylim, asp = 1)

The reference guides (dotted lines going through the point (0,0) are added first so as to not lie on top of any of the site or species scores. Two calls to the abline() function are used

abline(h = 0, lty = "dotted")
abline(v = 0, lty = "dotted")

The next two lines of code use the default methods for points() and text() rather than the "cca" methods used above

with(dune.env, points(scrs$sites, col = colvec[Use],
                      pch = 21, bg = colvec[Use]))
with(scrs, text(species, labels = rownames(species),
                col = "darkcyan", cex = 0.8))

The legend is added in same way as before

with(dune.env, legend("topright", legend = levels(Use), bty = "n",
                      col = colvec, pch = 21, pt.bg = colvec))

All that remains is to add the plot furniture; the axes, axis labels and the plot frame

axis(side = 1)
axis(side = 2)
title(xlab = "PC 1", ylab = "PC 2")
box()

The fruits of our labours are shown below

PCA ordination built up directly from base graphics elements

The customised ordination built directly from base graphics elements

I don’t claim these are perfect; many of the labels lie on top of one another for example. Vegan has some functions to help with this but I’ll leave exemplifying those for another post.

The full code for the two examples is shown below

## load vegan
require("vegan")

## load the Dune data
data(dune, dune.env)

## PCA of the Dune data
mod <- rda(dune, scale = TRUE)

## plot the PCA
plot(mod, scaling = 3)

## build the plot up via vegan methods
scl <- 3 ## scaling == 3
colvec <- c("red2", "green4", "mediumblue")
plot(mod, type = "n", scaling = scl)
with(dune.env, points(mod, display = "sites", col = colvec[Use],
                      scaling = scl, pch = 21, bg = colvec[Use]))
text(mod, display = "species", scaling = scl, cex = 0.8, col = "darkcyan")
with(dune.env, legend("topright", legend = levels(Use), bty = "n",
                      col = colvec, pch = 21, pt.bg = colvec))

## or via base graphics methods
scrs <- scores(mod, display = c("sites", "species"), scaling = scl)
str(scrs, max = 1)

xlim <- with(scrs, range(species[,1], sites[,1]))
ylim <- with(scrs, range(species[,2], sites[,2]))

plot.new()
plot.window(xlim = xlim, ylim = ylim, asp = 1)
abline(h = 0, lty = "dotted")
abline(v = 0, lty = "dotted")
with(dune.env, points(scrs$sites, col = colvec[Use],
                      pch = 21, bg = colvec[Use]))
with(scrs, text(species, labels = rownames(species),
                col = "darkcyan", cex = 0.8))
axis(1)
axis(2)
title(xlab = "PC 1", ylab = "PC 2")
with(dune.env, legend("topright", legend = levels(Use), bty = "n",
                      col = colvec, pch = 21, pt.bg = colvec))
box()

To leave a comment for the author, please follow the link and comment on his blog: From the bottom of the heap » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.