Some of my fellow scientists have it easy. They use predefined methods like linear regression and ANOVA to test simple hypotheses; they live in the innocent world of bivariate plots and lm(). Sometimes they notice that the data have odd histograms and they use glm(). The more educated ones use generalized linear mixed effect models.
A complete workflow from the initial data massage to model fitting and the output of the results can easily fit one page of an R script. Even if more is needed, it rarely takes more than a second to run.
Now here is a warning: Do not go beyond that if you plan to be happy. Stay with lm() and glm(), and use mixed-effect models only when the referees bully you too much. If the referees of your paper start to use buzzwords such as ‘hierarchical’, ‘prior’ or ‘uncertainty’, withdraw and promptly send the manuscript to another journal. You will publish faster and more, and you will get cited more because more people will actually understand your results.
I was a fool to go beyond that. Initially, I just wanted to understand what mixed effect models REALLY are. Maybe you know the confusion: What the hell is the difference between random and fixed effects? How on earth am I supposed to ever confidently specify a correct mixed effect model structure using the bloody formula representation in lme4 and nlme libraries in R? Why is it so difficult to get expert advice on that? Why are all the relevant textbooks written for statisticians and not for humans? How should I interpret the results?
I ended up crunching through book after book. And I found myself studying probability distributions, figuring out what exactly their role is in statistical models. I had understood the difference between data and models, I had finally got a grip on likelihood, and I realized that there are at least two definitions of probability itself.
At a certain moment I had a satori. There was a little click and it all unfolded backwards. And I started to see things with new eyes. It was like being on a trip. I saw t-test, I saw ANOVA, I saw linear regression, Poisson log-linear regression, logistic regression, and finally I saw what some call mixed-effect models. But I saw them as arbitrary and modifiable categories, building blocks, templates. They were all just probability distributions connected by algebra.
The world of my ideas started to beg for an unrestricted intercourse with the world of the data on my screen. I felt like I was liberated and able to translate any hypothesis into an exact formal representation, and that these representations can be properly parameterized and fit to the data because there has been MCMC.
Once I was able to see probability as the intensity of belief I no longer saw any controversy in the usage of informative priors. So straightforward! P-values, test statistics, null hypotheses, randomization tests, standard errors and bootstrap ended up in garbage – not for being incorrect, they just seemed too ad hoc. Machine learning, neural networks, regression trees, random forests, copulas and data mining seemed like primitive (and dangerous) black boxes used by those who had not seen the light yet.
It was the time of the joy of boundless possibilities.
This was all several manuscripts, four conferences, heaps of papers and thousands of lines of code ago. Almost two years of reality ago.
Recently, I have realized that:
I spend days massaging the data so that they fit the OpenBUGS or JAGS requirements. I spend weeks to months thinking about how exactly I should implement my models. I spend days to weeks trying to make OpenBUGS or JAGS run without crashes – JAGS at least gives me a hint what went wrong, OpenBUGS is almost impossible to debug. It costs me more effort to explain my models and results to people. In manuscripts, where I used “I fitted a generalized linear model (poisson family, log link) to the data using glm() in R“, I now have a page of description of model structure, an extra paragraph describing how my chains converged, and I plague my manuscripts with equations. Even if the model is not that complex, it puts readers off, including my co-authors. Referees who never used latent variables and hierarchical models have a hard time seeing through it. I have to spend a lot of energy and time explaining my methods in responses to referees. I am generally more defensive in my responses. Even a simple re-run or correction of my analyses can take days or weeks. As a consequence, the publishing process is slower, dissemination of my results is less effective, and I expect to be less cited. Oh, and the usage of informative priors seems suspicious to almost everybody.
But don’t get me wrong. I still love it! The joy of seeing my little posterior distributions popping out is enormous. I still think that it is all worth it: it is the price that I pay for having all exactly defined (hopefully) and transparent (hopefully), and with the uncertainty properly quantified (hopefully). And since I have always been an idealist, it comforts me that I have at least ideological superiority. Pragmatically and practically speaking, it has been a martyrdom.
I guess that all emerging technologies and paradigms are like that. The starts are clumsy and bumpy. When computers appeared they were one big pain to work with: you had to translate the code into holes in a punched card, you mailed (not e-mailed!) it to a computing center, and in several weeks you would receive an envelope with a punched card that represented the results (perhaps a least-squares solution of a simple linear regression). Imagine debugging that! And you know what computers are today. With Bayesian hierarchical modelling it seems similar. STAN is a promising next step. I believe that the martyrdom is only temporary, and that it will pay off in the long run.