Introducing fishtree and fishtreeoflife.org

[This article was first published on Jonathan Chang, and kindly contributed to R-bloggers.]

In our recent publication (Rabosky et al. 2018) we assembled a huge phylogeny of ray-finned fishes: the most comprehensive to date! While all of our data are accessible via Dryad, we felt like we could go the extra mile to make it easy to repurpose and reuse our work. I’m pleased to report that this effort has resulted in two resources for the community: the Fish Tree of Life website, and the fishtree R package. The package is available on CRAN now, and you can install it with:

install.packages("fishtree")

The source is on GitHub in the repository jonchang/fishtree. The manuscript describing these resources has been published in Methods in Ecology and Evolution (Chang et al. 2019).

Figure S1 from our manuscript, showing areas of the fish tree of life that may be affected by long branch attraction.

Website: fishtreeoflife.org

The Fish Tree of Life website is intended to serve as a quick resource for when you need to look up information about ray-finned fishes. There are two primary types of pages on the website: taxonomy pages and fossil pages.

A portion of the backbone phylogeny leading to the taxonomy pages.

For example, the taxonomy page for Acanthuridae, the surgeonfishes, indicates that our phylogeny sampled most of the species in this family, and that one fossil calibration was used to date this group. You’ll also see that all associated taxonomic ranks (both more inclusive and less inclusive, if applicable) are listed.

The download links lead you to subsetted versions of the phylogeny and character matrix constructed for this group. If you’re only interested in the surgeonfishes, you don’t have to download the entire phylogeny.

The fossil section links to a single species, Proacanthurus tenuis, that was used to calibrate the crown age of Acanthuridae. Every fossil page lists the taxon it calibrates, as well as the minimum age that the fossil informs, the computed maximum age, the placement authority reference, the age authority reference, and the fossil locality.

The computed maximum age is based on the WHETA algorithm and the outgroup sequence listed. If you’re interested in the details, consult the Methods § Fossil Calibration page.

One thing that we’re especially proud of is that the website is completely static and nearly all text, and loads lightning fast. This is to support researchers working in areas where fast Internet is not available. Our only concession to vanity is on the home page, derived from Figure 3 of the Nature manuscript. The fishes were illustrated by Julie Johnson; clicking on the different fish will take you to the appropriate taxon page.

R package: fishtree

In addition to the website, we’ve also developed an R package that interfaces with the underlying data.

library(fishtree)
library(ape)
# Download the pre-computed phylogeny for the surgeonfish family
phy <- fishtree_phylogeny(rank = "Acanthuridae")
phy
#> 
#> Phylogenetic tree with 67 tips and 66 internal nodes.
#> 
#> Tip labels:
#>  Acanthurus_mata, Acanthurus_blochii, Acanthurus_xanthopterus, Acanthurus_bariene, Acanthurus_dussumieri, Acanthurus_leucocheilus, …
#> 
#> Rooted; includes branch lengths.
# Stack two plots: the tree on top, lineages through time below
par(mfrow = c(2, 1))
plot(phy, show.tip.label = FALSE)
ltt.plot(phy)

A phylogeny of Acanthuridae with a lineage-through-time plot.

The R package permits easy access to downloads of the phylogeny, sequence alignments, and taxonomic information for the ray-finned fishes. Not only are the pre-computed per-taxon subsets available, but the relevant functions also accept a list of species and will subset the larger dataset to return a sequence matrix or phylogeny including only those species.
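As a rough sketch of the species-list route described above, the snippet below requests a tree and a sequence matrix for a handful of surgeonfishes taken from the tip labels printed earlier. It assumes the `species` argument of `fishtree_phylogeny()` and `fishtree_alignment()` accepts binomials written with spaces, and that you have a working Internet connection; check `?fishtree_phylogeny` for the authoritative argument list.

```r
library(fishtree)
library(ape)

# A small, arbitrary set of species chosen for illustration only
spp <- c("Acanthurus mata", "Acanthurus blochii",
         "Acanthurus xanthopterus", "Acanthurus bariene")

# Subset the full phylogeny down to just these species
sub_phy <- fishtree_phylogeny(species = spp)
print(sub_phy$tip.label)

# Retrieve the matching rows of the sequence alignment
# (assumed here to come back as an ape DNAbin object)
sub_aln <- fishtree_alignment(species = spp)
print(class(sub_aln))
```

Species that are not in the phylogeny are dropped from the result, so the returned tree may have fewer tips than the list you passed in.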

Our intent with the R package is to enable more complex analyses with this broad dataset. One example that we showcase in our manuscript reanalyzes portions of the fish tree of life with RAxML, CONSEL, and other programs to search for areas that might have been affected by long branch attraction. The code and data for this analysis are available on Dryad.

Two other analyses are presented in the supplement of the manuscript; a more accessible web version is available in the Vignettes section on CRAN. These cover a comparative analysis that replicates a previous experiment in the tetraodontid fishes (Santini et al. 2013), and a community phylogenetics analysis looking at community structure in reef-associated fishes.

You may have noticed that each taxonomy page on the website links to a JSON API file. The fishtree package consumes these JSON files under the hood; you are welcome to use these directly, but we can’t guarantee that they won’t change in the future.
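If you do want to poke at the JSON directly, something like the following should work as a starting point. Note that the URL pattern here is my assumption and is not stated anywhere in this post; copy the actual link from the taxonomy page you care about. The snippet only inspects the top-level structure and makes no claims about which fields are present.

```r
library(jsonlite)

# ASSUMPTION: this URL pattern is a guess for illustration;
# use the JSON link shown on the taxonomy page instead.
url <- "https://fishtreeoflife.org/api/taxonomy/family/Acanthuridae.json"

# Parse the JSON document into an R list
acanthuridae <- fromJSON(url)

# Look at the top-level structure without assuming any field names
str(acanthuridae, max.level = 1)
```

For anything beyond a quick look, the package functions are the safer route, since they are the interface we intend to keep stable.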

The future

There’s still plenty to work on for both the R package and the website, as well as ray-finned fish phylogenetics in general. Our plans for the R package are to extend its functionality to include the fossil data we have on the website.

The website is essentially feature-complete, though there’s a lot of polish that could be done, other data sources we could incorporate, and new ways we could slice the data for easy consumption. The most important feature on our list is the ability to switch between alternate topologies; for example, how would the taxon pages look if we used older fish phylogenies (e.g., Rabosky et al. 2013) instead?

These and other features will have to wait. I have a ton of papers to write and jobs to apply for. My promise to the community is that this website and R package will continue to be maintained as long as I’m involved in scientific research. If you’re interested in helping out, pull requests are welcome on either of the GitHub repositories!

Feedback

If you encounter any problems with the R package, please open an issue on GitHub. If you spot any bugs with the website, please open an issue on GitHub for the website. Feature requests are welcome as well, but I can’t guarantee I’ll get around to implementing your suggestions.

Finally, if you spot errors in the phylogeny (e.g., a rogue taxon or something like that), please report them in this Google Form so I can collate these kinds of fixes in a future update.

I’m excited to see what kinds of research these resources will enable! If you publish a paper or other analysis with this, please send me an email or tweet at me so I can check it out!
