R for Ecologists: Phylogenies in R

[This article was first published on Climate Change Ecology » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’ve only recently begun working from an evolutionary perspective, and I can’t imagine why I haven’t thought about it much before. After all, it comes up in just about everything that we do in ecology. For example, I’m currently feeding an herbivore many different plant species. It is likely that the herbivore feeds on plants depending on 1) plant traits which are likely phylogenetically dependent and 2) coevolutionary history with the plant. As a result, it’s become imperative for me to start thinking about phylogenetic relationships among plants and, less frequently, herbivores.

I think phylogenetic trees are an easy way for me to visualize relationships among plant species. However, building phylogenetic trees is no easy task. Fortunately, R can do just about everything, when asked. Here I detail the process of building a plant phylogeny with R.

Let’s assume I want to build a phylogeny for Rosa multiflora (invasive), Rubus allegheniensis (native), Rubus phoenicolasius (invasive), Acer rubrum (native), and Vitis vulpina (native). Here’s how it’s done:

Step 1: Get the Phylogeny

The hard work here has already been done for us. Phylomatic is an online tool that allows us to enter species and build an entire phylogeny based on the APG III framework. Go to the page and you’ll see a box. The instructions are pretty self explanatory, so we’ll put into the box:

rosaceae/rosa/rosa_multiflora
rosaceae/rubus/rubus_allegheniensis
rosaceae/rubus/rubus_phoenicolasius
sapindaceae/acer/acer_rubrum
vitaceae/vitis/vitis_vulpina

I like the Newick output format. You can look up details elsewhere, but its essentially a text map of the phylogeny. Once you submit, you’ll get a string of text:

((((rosa_multiflora,(rubus_allegheniensis,rubus_phoenicolasius)rubus)rosaceae,acer_rubrum),vitis_vulpin)rosids)euphyllophyte;

Paste that into your favorite text editor and save it as a plain text file (I called mine newick_phylo.txt). R is funky, it doesn’t like some of the parts of that. Trim off the very first parenthesis, and everything to the right of and including the last parenthesis but LEAVE the semicolon (in this case, delete ‘)euphyllophyte’). You have no idea how long that took me to figure out. The final text looks like:

(((rosa_multiflora,(rubus_allegheniensis,rubus_phoenicolasius)rubus)rosaceae,acer_rubrum),vitis_vulpin)rosids;

Step 2: Work it with R

Now, load up R and call the ‘ape‘ package. Use the read.tree( ) command to read in the phylogeny. Then plot the phylogeny file.

rosa.tree <- read.tree('newick_phylo.txt')
plot(rosa.tree)

Voilà! Your phylogeny should look like this:

Basic phylogeny

** Note that the branch lengths assume constant age of divergence. We can fix that if we have estimates of divergence times for each node (i.e. branch occurrence) and run the phylogeny through Phylocom using the bladj command. I leave that one up to you. I recommend starting with Wikstrom et al. (2001) as a starting point for estimating divergence times.

But wait? None of the species names are capitalized? Easy fix! Load the ‘Hmisc‘ package that has a nifty capitalize( ) command and run the following:

tree$tip.label <- capitalize(tree$tip.label)
plot(tree)

Now we want to make the branches for invasive species red. This is trickier. Each ‘edge’ has a vector associated with it, so we can assign colors to all the edges. Finding out which edge is which is the hard part. I still haven’t figured it out just yet on a reliable basis, but it seems to order the edges going down and then going back up:

It looks easy, but try figuring it out on a larger tree.

We want edges 3 and 6 to be red, so the easiest way is to create 1×8 vector of black colors and then rename entries 3 and 6 to red:

edgecol <- rep('black', 8)
edgecol[c(3,6)] <- 'red'
plot(tree, edge.color=edgecol)

I also don’t like that the tip labels are so close to the edges, almost overlapping. We can add some space in using label.offset. Note that the value of label.offset will change for every phylogeny depending on whether you have ages/distances for your edges.

plot(tree, edge.color=edgecol, label.offset = 0.1)

The not-so-real deal.

And there’s your (basic) phylogeny! If you’ve run your Newick tree through bladj with ages for the nodes, you’ll get a nice phylogeny with variable branch lengths depending on the age of split. Also, be sure to check your species names. Notice that for some reason, Phylomatic output Vitis vulpin instead of Vitis vulpina. You can fix that at any step, in the newick text file or in R, but be sure to check your species names. Now, running a phylogenetic analysis like PGLS is a whole different story…


To leave a comment for the author, please follow the link and comment on their blog: Climate Change Ecology » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)