Yesterday morning at 7 am I was outside walking the dog before getting a taxi to go to the airport to catch a plane to travel from Christchurch to Blenheim (now I can breath after reading without a pause). It was raining cats and dogs while I was thinking of a post idea for Quantum Forest; something that I could work on without a computer. Then I remembered that I told Tal Galili that I would ‘mention r-bloggers’ in a future post. Well, Tal, this is it.
I started this blog on 4th October and I thought ‘I could write a few things and see how it goes. I may even get 10 people a day reading this thing’. I reached that number almost immediately and I was thinking that, if I kept going at it, I could eventually reach 50 people. Then I came across R-bloggers and, after some hesitation, I submitted this blog to Tal’s web site. I jumped to over 200 people a day visiting the blog (see below) and even got comments! I repeat: I was writing about mixed models and got comments!
As I point out in the ‘About’ page, this is not an ‘R blog’ but a blog about statistics and data analysis in general, that mostly uses R as a vehicle to express ideas. There will be some Python (my favorite language) and, maybe, other tools; however, the ideas are more important than the syntax. Nevertheless, R has democratized the practice of statistics, as well as facilitated the production of some very interesting visuals. Once thing was clear to me: I did not want to write about ‘data visualization’, which is receiving a lot (I dare to say too much) attention in the R world. Most infographics produce the same reaction on me as choirs and mimes; which is to say, bore me to tears (Apologies if you are a choir-singer mime in your spare time). I want to write a bit about analysis and models and ‘bread and butter’ issues, because I think that they are many times ignored by people chasing the latest smoke and mirrors.
I have to say that I am learning a lot from comments, particularly people suggesting packages that I didn’t even know that existed. I am also learning about spam comments and I wish there is a horrible place where spammers go and suffer for eternity (together with pedophiles, torturers and other not-so-nice seedy characters). I am thankful to Tal mostly for putting the work to create a repository of R blogs; if I have some free time I some times wander around looking for some interesting explanation to a problem that is bugging me.
Most of my professional life I have dealt with, let’s say, ‘agricultural’ statistics; that is, designed experiments, linear mixed models (mostly frequentist, but sometimes Bayesian), often with pedigrees. Many statisticians—particularly in mathematics departments—tend to look down on ‘bread and butter’ work. They seem to forget that experiments and breeding/genetics have been the basis of many theoretical developments and that we deal with heavily multivariate data, longitudinal data, spatial data, models with sometimes hundreds of covariance components or, if you are into genomics, models with tens of thousands of random effects.
There is also the politics of software, where in the R community there is a strong bias against any package that is not free (sensu both speech and beer). At some level I share the ideal of having free access to tools, which make the practice of statistics/data analyses available to a large number of people. At another level, it seems counterproductive to work with substandard tools and to go for lesser models only because ‘package X can’t deal with the model that I would like to fit’. This is just another example of software defining our field, which was (and still is) a common accusation pointed towards SAS. In this blog you will see several references to ASReml-R a commercial mixed models package, which I tend to use because there is nothing at the moment (as far as I know) that comes close to its functionality. If I pay for my computer running MS Windows or OS X (far from being paradigms of openness), how can I justify being dogmatic about using a commercial R package (particularly if you can access academic or nonprofit pricing)?
Sex and alcohol
As I pointed out above, most of my work has dealt with forestry/agriculture. Nevertheless, this year I have become more interested in the relationship between statistics and public policy issues; for example, minimum wage and unemployment, which I covered as a simple example in this blog. I have to thank my colleague Eric Crampton (in Canterbury’s Department of Economics) for igniting my interest on the use and misuse of statistics, and their interaction with economics, to justify all sort of restrictions and interventions in society. There are very interesting datasets in this area available in, for example, Statistics New Zealand that could be analyzed and would make very interesting case studies for teaching stats. There is a quote by H.G. Wells that comes to mind:
Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.
Rock & roll? This is getting too long, so I will leave music for another time.