R’s formula interface is sweet but sometimes confusing. ANOVA is seldom sweet and almost always confusing. And random (a.k.a. mixed) versus fixed effects decisions seem to hurt peoples’ heads too. So, let’s dive into the intersection of these three. I’m aware that there are lots of packages for running ANOVA models that make things nicer

As mentioned in the Appendix of Modern Actuarial Risk Theory, “R (and S) is the ‘lingua franca’ of data analysis and statistical computing, used in academia, climate research, computer science, bioinformatics, pharmaceutical industry, customer analytics, data mining, finance and by some insurers. Apart from being stable, fast, always up-to-date and very versatile, the chief advantage of R is that...

The go-to bible for this data scientist and many others is The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. Each of the authors is an expert in machine learning / prediction, and in some cases invented the techniques we turn to today to make sense of big data: ensemble...

A recent post on the PirateGrunt blog on claims reserving inspired me to look into the paper Regression models based on log-incremental payments by Stavros Christofides , published as part of the Claims Reserving Manual (Version 2) of the Institute of Actuaries.The paper is available together with a spread sheet model, illustrating the calculations. It...

Students in any basic statistics class are taught linear regression, which is one of the simplest forms of a statistical model. The basic idea is that a ‘response’ variable can be mathematically related to one or any number of ‘explanatory’ variables through a linear equation and a normally distributed error term. With any statistical tool,

This blog post by Sean Taylor generated quite a stir. He discussed the signals one sends by using certain software packages and seems to think that R users are more competent. The reactions ranged from amusement to bashing. In defense of hard to learn statistical tools, i.e. #rstats prsm.tc/gyTBRK <- pretty funny 'who uses what I encourage you...

Weighted averaging regression and calibration is the most widely used method for developing a palaeolimnological transfer function. Such models are used to reconstruct properties of the past lake environment such as pH, total phosphorus, and water temperature with, it has to be said, varying degrees of success and usefulness. In simple weighted averaging (WA) there is little to specify other...