John D. Cook gave a great talk about ‘Why and how people use R’. The talk resonated with me and highlighted why R is such a great tool for end user computing, a topic that has become increasingly important in the European insurance industry.
John’s main point on why people use R is that R gets the job done, and I think he is spot on. Of course, that is sometimes also the trouble with R, or, to quote Bo again:
“The best thing about R is that it was developed by statisticians.
The worst thing about R is that it was developed by statisticians.”
Bo Cowgill, Google
Indeed, R is frequently used by individuals who commission their own work, rather than by professional programmers who develop tools for others. In other words, R is mainly used for end user computing. More often than not, R users don’t use the software for big monster projects but to find answers to their own questions; one answer leads to another question, which eventually leads to insight.
John also points out that R is often not learned like other programming languages, by reading a book about the language definition, control structures, etc., but by learning statistics and using R as a teaching tool. Indeed, wasn’t that one of the original motivations of Ross Ihaka and Robert Gentleman, the creators of R, apart from getting their own research done? Probably very few R users have ever read the R Language Definition from cover to cover.
To some extent the same arguments are true for spreadsheet software as well, in particular that it gets the job done and that most people learn it by using it. Yet there is one fundamental difference between spreadsheets and R, or other programming languages: programs written in languages like R are plain text files. That is quite a big deal if you want to manage end user computing, because plain text can be put under version control, compared line by line and reviewed by someone else.
I realised the power of text files when I started my first R package. Suddenly I had to think about version control, documentation and testing, and as a result I was forced to think a little bit like a professional programmer. Still, what I was doing was end user computing; I was doing all of this to get my own work done. However, I realised that I had to manage my code better. In the past I thought documentation was for people without talent. That’s actually quite arrogant. I have heard others say that they don’t like documenting their work because it takes away their magic. But here is the deal: over time even I turn into another person, and then the documentation becomes incredibly helpful, not to mention version control.
Funnily enough, the experience I gained in building packages was quite helpful for my job as well. The insurance industry in Europe is going through a huge transition. A new regulatory regime called Solvency II is being rolled out, which changes the way insurance companies have to assess their capital requirements. A lot of work is required around data management and end user computing to ensure that certain standards and audit criteria are met. Most of this is actually good practice anyhow; see Solvency II Data Audit Appendix 2, Table 5.1.
Open source communities have already had to overcome these challenges: How do you organise work across multiple teams? How do you define interfaces? How do you deal with security, incident management, documentation, testing, roll-out, etc.?
The R manual ‘Writing R Extensions’ answers those questions and offers a blueprint and framework for end user computing. Over 3800 packages on CRAN demonstrate the success of this approach, and I believe that most of these packages are the result of end user computing. So maybe this is actually the biggest deal about R: that it has been built successfully on end user computing.