the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-se...

the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-se...

Adjusting standard errors for clustering can be a very important part of any statistical analysis. For example, duplicating a data set will reduce the standard errors dramatically despite there being no new information. I have previously dealt with this topic with reference to the linear regression model. However, in many cases one would like to

R’s formula interface is sweet but sometimes confusing. ANOVA is seldom sweet and almost always confusing. And random (a.k.a. mixed) versus fixed effects decisions seem to hurt peoples’ heads too. So, let’s dive into the intersection of these three. I’m aware that there are lots of packages for running ANOVA models that make things nicer

As mentioned in the Appendix of Modern Actuarial Risk Theory, “R (and S) is the ‘lingua franca’ of data analysis and statistical computing, used in academia, climate research, computer science, bioinformatics, pharmaceutical industry, customer analytics, data mining, finance and by some insurers. Apart from being stable, fast, always up-to-date and very versatile, the chief advantage of R is that...

The go-to bible for this data scientist and many others is The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. Each of the authors is an expert in machine learning / prediction, and in some cases invented the techniques we turn to today to make sense of big data: ensemble...

A recent post on the PirateGrunt blog on claims reserving inspired me to look into the paper Regression models based on log-incremental payments by Stavros Christofides , published as part of the Claims Reserving Manual (Version 2) of the Institute of Actuaries.The paper is available together with a spread sheet model, illustrating the calculations. It...

Students in any basic statistics class are taught linear regression, which is one of the simplest forms of a statistical model. The basic idea is that a ‘response’ variable can be mathematically related to one or any number of ‘explanatory’ variables through a linear equation and a normally distributed error term. With any statistical tool,

This blog post by Sean Taylor generated quite a stir. He discussed the signals one sends by using certain software packages and seems to think that R users are more competent. The reactions ranged from amusement to bashing. In defense of hard to learn statistical tools, i.e. #rstats prsm.tc/gyTBRK <- pretty funny 'who uses what I encourage you...