Using R: Animal model with simulated data

October 13, 2019
By

[This article was first published on R – On unicorns and genes, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Last week’s post just happened to use MCMCglmm as an example of an R package that can get confused by tibble-style data frames. To make that example, I simulated some pedigree and trait data. Just for fun, let’s look at the simulation code, and use MCMCglmm and AnimalINLA to get heritability estimates.

First, here is some AlphaSimR code that creates a small random mating population, and collects trait and pedigree:

library(AlphaSimR)

## Founder population
FOUNDERPOP <- runMacs(nInd = 100,
                      nChr = 20,
                      inbred = FALSE,
                      species = "GENERIC")

## Simulation parameters 
SIMPARAM <- SimParam$new(FOUNDERPOP)
SIMPARAM$addTraitA(nQtlPerChr = 100,
                   mean = 100,
                   var = 10)
SIMPARAM$setGender("yes_sys")
SIMPARAM$setVarE(h2 = 0.3)
 
## Random mating for 9 more generations
generations <- vector(mode = "list", length = 10) 
generations[[1]] <- newPop(FOUNDERPOP,
                           simParam = SIMPARAM)


for (gen in 2:10) {

    generations[[gen]] <- randCross(generations[[gen - 1]],
                                    nCrosses = 10,
                                    nProgeny = 10,
                                    simParam = SIMPARAM)

}

## Put them all together
combined <- Reduce(c, generations)


## Extract phentoypes
pheno <- data.frame(animal = [email protected],
                    pheno = [email protected][,1])

## Extract pedigree
ped <- data.frame(id = [email protected],
                  dam = [email protected],
                  sire [email protected])
ped$dam[ped$dam == 0] <- NA
ped$sire[ped$sire == 0] <- NA

## Write out the files
write.csv(pheno,
          file = "sim_pheno.csv",
          row.names = FALSE,
          quote = FALSE)

write.csv(ped,
          file = "sim_ped.csv",
          row.names = FALSE,
          quote = FALSE)

In turn, we:

  1. Set up a founder population with a AlphaSimR’s generic livestock-like population history, and 20 chromosomes.
  2. Choose simulation parameters: we have an organism with separate sexes, a quantitative trait with an additive polygenic architecture, and we want an environmental variance to give us a heritability of 0.3.
  3. We store away the founders as the first generation, then run a loop to give us nine additional generations of random mating.
  4. Combine the resulting generations into one population.
  5. Extract phenotypes and pedigree into their own data frames.
  6. Optionally, save the latter data frames to files (for the last post).

Now that we have some data, we can fit a quantitative genetic pedigree model (”animal model”) to estimate genetic parameters. We’re going to try two methods to fit it: Markov Chain Monte Carlo and (the unfortunately named) Integrated Nested Laplace Approximation. MCMC explores the posterior distribution by sampling; I’m not sure where I heard it described as ”exploring a mountain by random teleportation”. INLA makes approximations to the posterior that can be integrated numerically; I guess it’s more like building a sculpture of the mountain.

First, a Gaussian animal model in MCMCglmm:

library(MCMCglmm)

## Gamma priors for variances
prior_gamma <- list(R = list(V = 1, nu = 1),
                    G = list(G1 = list(V = 1, nu = 1)))
    
## Fit the model
model_mcmc  <- MCMCglmm(scaled ~ 1,
                        random = ~ animal,
                        family = "gaussian",
                        prior = prior_gamma,
                        pedigree = ped,
                        data = pheno,
                        nitt = 100000,
                        burnin = 10000,
                        thin = 10)

## Calculate heritability for heritability from variance components
h2_mcmc_object  <- model_mcmc$VCV[, "animal"] /
    (model_mcmc$VCV[, "animal"] + model_mcmc$VCV[, "units"])
 
## Summarise results from that posterior
h2_mcmc  <- data.frame(mean = mean(h2_mcmc_object),
                       lower = quantile(h2_mcmc_object, 0.025),
                       upper = quantile(h2_mcmc_object, 0.975),
                       method = "MCMC",
                       stringsAsFactors = FALSE)

And here is a similar animal model in AnimalINLA:

library(AnimalINLA)

## Format pedigree to AnimalINLA's tastes
ped_inla <- ped
ped_inla$id  <- as.numeric(ped_inla$id)
ped_inla$dam  <- as.numeric(ped_inla$dam)
ped_inla$dam[is.na(ped_inla$dam)] <- 0
ped_inla$sire  <- as.numeric(ped_inla$sire)
ped_inla$sire[is.na(ped_inla$sire)] <- 0
    
## Turn to relationship matrix
A_inv <- compute.Ainverse(ped_inla)
    
## Fit the model
model_inla  <- animal.inla(response = scaled,
                           genetic = "animal",
                           Ainverse = A_inv,
                           type.data = "gaussian",
                           data = pheno,
                           verbose = TRUE)

## Pull out summaries from the model object
summary_inla  <- summary(model_inla)

## Summarise results
h2_inla  <- data.frame(mean = summary_inla$summary.hyperparam["Heritability", "mean"],
                       lower = summary_inla$summary.hyperparam["Heritability", "0.025quant"],
                       upper = summary_inla$summary.hyperparam["Heritability", "0.975quant"],
                       method = "INLA",
                       stringsAsFactors = FALSE)

If we wrap this all in a loop, we can see how the estimation methods do on replicate data (full script on GitHub). Here are estimates and intervals from ten replicates (black dots show the actual heritability in the first generation):

As you can see, the MCMC and INLA estimates agree pretty well and mostly hit the mark. In the one replicate dataset where they falter, they falter together.

To leave a comment for the author, please follow the link and comment on their blog: R – On unicorns and genes.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)