INLA: Bayes goes to Norway
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
INLA is not the Norwegian answer to ABBA; that would probably be a-ha. INLA is the answer to ‘Why do I have enough time to cook a three-course meal while running MCMC analyses?”.
Integrated Nested Laplace Approximations (INLA) is based on direct numerical integration (rather than simulation as in MCMC) which, according to people ‘in the know’, allows:
- the estimation of marginal posteriors for all parameters,
- marginal posteriors for each random effect and
- estimation of the posterior for linear combinations of random effects.
Rather than going to the usual univariate randomized complete block or split-plot designs that I have analyzed before (here using REML and here using MCMC), I’ll go for some analyses that motivated me to look for INLA. I was having a look at some reproductive output for Drosophila data here at the university, and wanted to fit a logistic model using MCMCglmm. Unfortunately, I was running into the millions (~3M) of iterations to get a good idea of the posterior and, therefore, leaving the computer running overnight. Almost by accident I came across INLA and started playing with it. The idea is that Sol—a Ph.D. student—had a cool experiment with a bunch of flies using different mating strategies over several generations, to check the effect on breeding success. Therefore we have to keep track of the pedigree too.
# Set working directory containing data and code setwd('~/Dropbox/research/2010/flies') # Packages needed for analysis # This code requires the latest (and updated) version of INLA require(INLA) # This loads INLA require(pedigreemm) # pedigree(), relfactor(), Tinv, D, ... ####### Pedigree and assessment files # Reads pedigree file ped <- read.csv('recodedped.csv', header=FALSE) names(ped) <- c('id', 'mum', 'dad') # Reads data file dat <- read.csv('ggdatall.csv', header=TRUE) dat$cross <- factor(dat$cross) # Pedigree object for pedigreemm functions pedPMM <- with(ped, pedigreemm::pedigree(sire=dad, dam=mum, label=id)) # Pedigree precision matrix (A inverse) # T^{-1} in A^{-1} = (T^{-1})' D^{-1} T^{-1} Tinv <- as(pedPMM, "sparseMatrix") D <- Diagonal(x=Dmat(pedPMM)) # D in A = TDT' Dinv <- solve(D) Ainv <- t(Tinv) %*% Dinv %*% Tinv
Up to this point we have read the response data, the pedigree and constructed the inverse of the pedigree matrix. We also needed to build a contrast matrix to compare the mean response between the different mating strategies. I was struggling there and contacted Gregor Gorjanc, who kindly emailed me the proper way to do it.
# Define contrasts to compare cross types. Thanks to Gregor Gorjanc # for coding contrasts k <- nlevels(dat$cross) tmp <- matrix(nrow=(k-1)*k/2, ncol=k) ## 1 2 3 4 5 6 tmp[ 1, ] <- c( 1, -1, NA, NA, NA, NA) ## c1-c2 tmp[ 2, ] <- c( 1, NA, -1, NA, NA, NA) ## -c3 tmp[ 3, ] <- c( 1, NA, NA, -1, NA, NA) ## -c4 tmp[ 4, ] <- c( 1, NA, NA, NA, -1, NA) ## -c5 tmp[ 5, ] <- c( 1, NA, NA, NA, NA, -1) ## -c6 tmp[ 6, ] <- c( NA, 1, -1, NA, NA, NA) ## c2-c3 tmp[ 7, ] <- c( NA, 1, NA, -1, NA, NA) ## -c4 tmp[ 8, ] <- c( NA, 1, NA, NA, -1, NA) ## -c5 tmp[ 9, ] <- c( NA, 1, NA, NA, NA, -1) ## -c6 tmp[10, ] <- c( NA, NA, 1, -1, NA, NA) ## c3-c4 tmp[11, ] <- c( NA, NA, 1, NA, -1, NA) ## -c5 tmp[12, ] <- c( NA, NA, 1, NA, NA, -1) ## -c6 tmp[13, ] <- c( NA, NA, NA, 1, -1, NA) ## c4-c5 tmp[14, ] <- c( NA, NA, NA, 1, NA, -1) ## -c6 tmp[15, ] <- c( NA, NA, NA, NA, 1, -1) ## c5-c6 # Make Linear Combinations LC <- inla.make.lincombs(cross=tmp) # Assign names to combinations t <- 0 for(i in 1:(k-1)) { for(j in (i+1):k) { t <- t + 1 names(LC)[t] <- paste("c", i, "-", "c", j, sep="") } }
There is another related package (Animal INLA) that takes care of i- giving details about the priors and ii- “easily” fitting models that include a term with a pedigree (an animal model in quantitative genetics speak). However, I wanted the assumptions to be clear so read the source of Animal INLA and shamelessly copied the useful bits (read the source, Luke!).
###### Analysis for for binomial traits ####### Plain-vanilla INLA Version # Feeling more comfortable with *explicit* statement of assumptions # (rather than hidden behind animal.inla()) # Function to backconvert logits to probabilities back.conv <- function(values){ return(1/(1+exp(-(values)))) } # Function to get posterior of the odds # Thanks to Helena Moltchanova inla.marginal.summary <- function(x){ m1 <- inla.emarginal(function(z) exp(z), marginal=x) odds <- inla.marginal.transform(function(x) exp(x), x) q <- inla.qmarginal(p=c(0.025, 0.975), marginal=odds) c("0.025quant"=q[1], "0.5quant"=m1, "0.975quant"=q[2]) } # Model for pupae/eggs # Drops a few observations with no reproductive output (trips INLA) no0eggs <- subset(dat, eggs>0 & pupae <= eggs) # Actual model mpueg <- pupae ~ f(cross, model='iid', constr=TRUE, hyper=list(theta=list(initial=-10, fixed=TRUE))) + f(id, model='generic0', constr=TRUE, Cmatrix=Ainv, hyper=list(theta=list(param=c(0.5,0.5), fixed=FALSE))) # INLA call fpueg <- inla(formula=mpueg, family='binomial', data=no0eggs, lincomb=LC, Ntrials=eggs, control.compute=list(dic=FALSE)) # Results summary(fpueg) # Call: # c("inla(formula = mpueg, family = \"binomial\", data = no0eggs, # Ntrials = eggs, ", " lincomb = LC, control.compute = list(dic = FALSE))") # # Time used: # Pre-processing Running inla Post-processing Total # 0.2712612 1.1172159 2.0439510 3.4324281 # # Fixed effects: # mean sd 0.025quant 0.5quant 0.975quant kld # (Intercept) 1.772438 0.1830827 1.417413 1.770863 2.136389 0.5833235 # # Linear combinations (derived): # ID mean sd 0.025quant 0.5quant 0.975quant kld # c1-c2 0 -0.26653572 0.7066540 -1.6558225 -0.26573011 1.11859967 0 # c1-c3 1 0.04150999 0.7554753 -1.4401435 0.04104020 1.52622856 0 # c1-c4 2 -0.08777325 0.6450669 -1.3557501 -0.08713005 1.17693349 0 # c1-c5 3 -1.36702960 0.6583121 -2.6615604 -1.36618274 -0.07690788 0 # c1-c6 4 -1.82037714 0.8193280 -3.4338294 -1.81848244 -0.21714431 0 # c2-c3 5 0.30804735 0.7826815 -1.2248185 0.30677279 1.84852340 0 # c2-c4 6 0.17876229 0.5321948 -0.8654273 0.17859036 1.22421409 0 # c2-c5 7 -1.10049385 0.7466979 -2.5663142 -1.10046590 0.36558211 0 # c2-c6 8 -1.55383673 0.8188321 -3.1640965 -1.55276603 0.05084282 0 # c3-c4 9 -0.12928419 0.7475196 -1.5996080 -0.12817855 1.33522000 0 # c3-c5 10 -1.40854298 0.6016539 -2.5930656 -1.40723901 -0.23103707 0 # c3-c6 11 -1.86189314 0.8595760 -3.5555571 -1.85954031 -0.18100418 0 # c4-c5 12 -1.27925604 0.6998640 -2.6536362 -1.27905616 0.09438701 0 # c4-c6 13 -1.73259977 0.7764105 -3.2600936 -1.73134961 -0.21171790 0 # c5-c6 14 -0.45334267 0.8179794 -2.0618730 -0.45229981 1.14976690 0 # # Random effects: # Name Model Max KLD # cross IID model # id Generic0 model # # Model hyperparameters: # mean sd 0.025quant 0.5quant 0.975quant # Precision for id 0.08308 0.01076 0.06381 0.08244 0.10604 # # Expected number of effective parameters(std dev): 223.95(0.7513) # Number of equivalent replicates : 1.121 # # Marginal Likelihood: -1427.59 fpueg$summary.random$cross # ID mean sd 0.025quant 0.5quant 0.975quant kld # 1 1 -0.5843466 0.4536668 -1.47561024 -0.5840804 0.3056632 0.0178780930 # 2 2 -0.3178102 0.4595676 -1.21808638 -0.3184925 0.5865565 0.0009666916 # 3 3 -0.6258600 0.4978254 -1.60536281 -0.6250077 0.3491075 0.0247426578 # 4 4 -0.4965763 0.4071715 -1.29571071 -0.4966277 0.3030747 0.0008791629 # 5 5 0.7826817 0.4389003 -0.07756805 0.7821937 1.6459253 0.0077476806 # 6 6 1.2360387 0.5768462 0.10897529 1.2340813 2.3744368 0.0451357379 # Backtransforms point estimates and credible intervals for odds -> prob for(name in names(fpueg$marginals.lincomb.derived)){ summa = inla.marginal.summary(eval(parse(text=paste("fpueg$marginals.lincomb.derived$\'", name, "\'", sep='')))) cat(name, summa, '\n') } # c1-c2 0.1894451 0.9831839 3.019878 # c1-c3 0.2338952 1.387551 4.534581 # c1-c4 0.256858 1.127751 3.204961 # c1-c5 0.0695406 0.3164847 0.9145132 # c1-c6 0.03157478 0.2264027 0.792517 # c2-c3 0.289088 1.850719 6.255175 # c2-c4 0.4213069 1.377848 3.366947 # c2-c5 0.0759222 0.4398384 1.420934 # c2-c6 0.04135211 0.2955985 1.035951 # c3-c4 0.1996085 1.16168 3.747526 # c3-c5 0.0746894 0.2929174 0.7847903 # c3-c6 0.02774805 0.2245797 0.821099 # c4-c5 0.06988459 0.355529 1.084414 # c4-c6 0.03780307 0.2389529 0.7974092 # c5-c6 0.1245211 0.8878682 3.108852
A quick look at the time taken by INLA shows that it is in the order of seconds (versus overnight using MCMC). I have tried a few examples and the MCMCglmm and INLA results tend to be very close; however, figuring out how to code models has been very tricky for me. INLA follows the glorious tradition of not having a ‘proper’ manual, but a number of examples with code. In fact, they reimplement BUGS‘s examples. Personally, I struggle with that approach towards documentation, but you may be the right type of person for that. Note for letter to Santa: real documentation for INLA.
I was talking with a student about using Norwegian software and he mentioned Norwegian Black Metal. That got me thinking about how the developers of the package would look like; would they look like Gaahl of Gorgoroth (see interview here)?
Talk about disappointment! In fact Håvard Rue, INLA mastermind, looks like a nice, clean, non-black-metal statistician. To be fair, it would be quite hard to code in any language wearing those spikes…
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.