Using R: When weird errors occur in packages that used to work, check that you’re not feeding them a tibble

[This article was first published on R – On unicorns and genes, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

There are some things that are great about the tidyverse family of R packages and the style they encourage. There are also a few gotchas. Here’s a reminder to myself about this phenomenon: tidyverse-style data frames (”tibbles”) do not simplify to vectors upon extracting a single column with hard bracket indexing.

Because some packages rely on specific data.frame behaviours that tibbles don’t show, functions that work nicely with data frames, and normally have nice interpretable error messages, may mysteriously collapse in all kinds of ways when fed a tibble.

Here’s an example with MCMCglmm. This is not to pick on MCMCglmm; it just happened to be one of the handful of packages where I’ve run into this issue. Here, we use readr, the tidyverse alternative to the read.table family of functions to read some simulated data. The base function is called read.csv, and the readr alternative is read_csv.

Reading in tabular data is a surprisingly hard problem: tables can be formatted in any variety of obnoxious ways, and the reading function also needs to be fast enough to deal with large files. Using readr certainly isn’t always painless, but it reduces the friction a lot compared to read.table. One of the improvements is that read_csv will return a data.frame with the class tbl_df, affectionately called ”tibble

After reading the data, we centre and scale the trait, set up some priors and run an animal model. Unfortunately, MCMCglmm will choke on the tibble, and deliver a confusing error message.

library(MCMCglmm)
library(readr)

ped <- read_csv("sim_ped.csv")
pheno <- read_csv("sim_pheno.csv")

pheno$scaled <- scale(pheno$pheno)

prior_gamma <- list(R = list(V = 1, nu = 1),
                    G = list(G1 = list(V = 1, nu = 1)))

model <- MCMCglmm(scaled ~ 1,
                  random = ~ animal,
                  family = "gaussian",
                  prior = prior_gamma,
                  pedigree = ped,
                  data = pheno,
                  nitt = 100000,
                  burnin = 10000,
                  thin = 10)



Error in inverseA(pedigree = pedigree, scale = scale, nodes = nodes) : 
  individuals appearing as dams but not in pedigree
In addition: Warning message:
In if (attr(pedigree, "class") == "phylo") { :
  the condition has length > 1 and only the first element will be used

In this pedigree, it is not the case that there are individuals appearing as dams but not listed. If we turn the data and pedigree into vanilla data frames instead, it will work:

ped <- as.data.frame(ped)
pheno <- as.data.frame(pheno)

model <- MCMCglmm(scaled ~ 1,
                  random = ~ animal,
                  family = "gaussian",
                  prior = prior_gamma,
                  pedigree = ped,
                  data = pheno,
                  nitt = 100000,
                  burnin = 10000,
                  thin = 10)



                       MCMC iteration = 0

                       MCMC iteration = 1000

                       MCMC iteration = 2000

To leave a comment for the author, please follow the link and comment on their blog: R – On unicorns and genes.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)