Spring is at hand, and it is a time for renewal, March Madness, and settling scores in the NHL. There are many scores to be settled: Flyers vs. Penguins, Blackhawks vs. Red Wings, Leafs vs. Habs and pretty much everyone else vs. the Bruins. L...

(This article was first published on Digithead's Lab Notebook, and kindly contributed to R-bloggers) Now that I'm ridiculously behind in the Stanford Online Statistical Learning class, I thought it would be fun to try to reproduce the figure on page 36 of the slides from chapter 3 or page 81 of the book. The result is a curvaceous surface...

Continuing the discussion of multicollinearity in regression: first, it is necessary to introduce how to calculate the VIF and the condition number in software such as R, which is straightforward. The vif() function in the car package and kappa() can be used to calculate the VIF and the condition number, respectively. Consider the data from …
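As a minimal sketch of the two diagnostics named above, the following uses simulated data (the variables x1, x2, x3 and the helper vif_manual are hypothetical, not from the original post). It computes the VIF by hand from its definition, VIF_j = 1 / (1 - R_j^2), so that it runs in base R; on a fitted lm object, vif() from car would be expected to report the same quantities. The condition number comes from base R's kappa().

```r
# Simulated data (hypothetical): x2 is nearly a linear function of x1
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
y  <- 1 + x1 + x2 + x3 + rnorm(n)
dat <- data.frame(y, x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from
# regressing predictor j on all the other predictors
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(dat, c("x1", "x2", "x3"))
print(vifs)

# Condition number (2-norm) of the centered and scaled predictor matrix
X <- scale(as.matrix(dat[, c("x1", "x2", "x3")]))
cond <- kappa(X, exact = TRUE)
print(cond)
```

In this simulated setup the VIFs for x1 and x2 should be large while the VIF for x3 stays close to 1, which is the pattern one would look for when diagnosing collinearity.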

Dealing with proportion data on the interval $[0,1]$ is tricky. I realized this while trying to explain variation in vegetation cover. Unfortunately this is a true proportion, and can’t be made into a binary response. Further, true 0’s and 1’s rule out beta regression. You could arcsine square root transform the data (but shouldn’t; Warton and Hui 2011)....

Including a series of dummy variables in a regression in R is very simple. For example:
    ols <- lm(weight ~ Time + Diet, data = ChickWeight)
    summary(ols)
The above regression automatically includes a dummy variable for all but the first level of the Diet factor.
    Call:
    lm(formula = weight ~ Time...
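To see which dummy variables R builds for the regression above, one option is to inspect the design matrix directly. This is a small sketch using the same ChickWeight data; the object names mm, cw and ols4 are introduced here for illustration only.

```r
# ChickWeight ships with R's built-in datasets package
mm <- model.matrix(~ Time + Diet, data = ChickWeight)
dummy_cols <- colnames(mm)
print(dummy_cols)                 # one column per non-reference Diet level
print(levels(ChickWeight$Diet))   # the first level is the reference category

# The fitted model carries coefficients with the same names
ols <- lm(weight ~ Time + Diet, data = ChickWeight)
coef_names <- names(coef(ols))

# To use a different reference level, relevel the factor before fitting
cw <- as.data.frame(ChickWeight)
cw$Diet <- relevel(cw$Diet, ref = "4")
ols4 <- lm(weight ~ Time + Diet, data = cw)
print(names(coef(ols4)))
```

With the default treatment contrasts, the omitted level is absorbed into the intercept, so each Diet coefficient is read as a difference from that reference diet.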

In multiple regression analysis, multicollinearity is a common phenomenon in which two or more predictor variables are highly correlated. If there is an exact linear relationship (perfect multicollinearity) among the independent variables, the rank of X is less than k+1 (where k is the number of predictor variables), and the matrix X'X will not be invertible. So the strong correlations …
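The rank argument above can be checked numerically. The sketch below uses simulated data (all variable names are hypothetical) in which x3 is an exact linear combination of x1 and x2, so the design matrix with an intercept has k+1 = 4 columns but rank 3.

```r
set.seed(42)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- 2 * x1 - x2              # exact linear relationship among predictors

X <- cbind(1, x1, x2, x3)      # intercept plus k = 3 predictors
rank_X <- qr(X)$rank
print(rank_X)                  # smaller than ncol(X)

# X'X is singular, so inverting it is expected to fail
# (or, at best, to give a numerically meaningless result)
XtX <- crossprod(X)
inv_attempt <- tryCatch(solve(XtX), error = function(e) NULL)
print(is.null(inv_attempt))

# lm() detects the deficiency and reports NA for the aliased coefficient
y   <- 1 + x1 + x2 + rnorm(n)
fit <- lm(y ~ x1 + x2 + x3)
print(coef(fit))
```

Note that lm() does not stop with an error here; it drops the redundant column and marks its coefficient as NA, which is often the first visible symptom of perfect multicollinearity.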

If you're new to the R language but keen to get started with linear modeling or logistic regression in the language, take a look at this "Introduction to R" PDF, by Princeton's Germán Rodríguez. (There's also a browsable HTML version.) In a crisp 35 pages it begins by taking you through the basics of R: simple objects, importing data,...

Day Eight: LASSO Regression. TL;DR: LASSO regression (least absolute shrinkage and selection operator) is a modified form of least squares regression that penalizes model complexity via a regularization parameter. It does so by including a term proportional to $\|\beta\|_{\ell_1}$ in the objective function, which shrinks coefficients towards zero and can even eliminate them entirely. In that light, LASSO is a...
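To illustrate how the $\ell_1$ penalty shrinks and eliminates coefficients, here is a minimal base-R sketch of cyclic coordinate descent with soft-thresholding. It is not taken from the original post: the functions soft_threshold and lasso_cd, the simulated data, and the objective scaling $\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_{\ell_1}$ are assumptions chosen for illustration. In practice a dedicated package such as glmnet would normally be used.

```r
# Soft-thresholding operator: shrinks z towards zero by lambda
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Cyclic coordinate descent for
#   (1 / (2n)) * ||y - X beta||^2 + lambda * ||beta||_1
# Assumes y is centered and the columns of X are centered (no intercept).
lasso_cd <- function(X, y, lambda, n_iter = 200) {
  n <- nrow(X)
  p <- ncol(X)
  beta <- numeric(p)
  for (it in seq_len(n_iter)) {
    for (j in seq_len(p)) {
      # Partial residual: remove the fit of every coordinate except j
      r_j <- y - X[, -j, drop = FALSE] %*% beta[-j]
      z   <- sum(X[, j] * r_j) / n
      beta[j] <- soft_threshold(z, lambda) / (sum(X[, j]^2) / n)
    }
  }
  beta
}

# Simulated data (hypothetical): only the first two predictors matter
set.seed(123)
n <- 200
p <- 5
X <- scale(matrix(rnorm(n * p), n, p))
beta_true <- c(3, -2, 0, 0, 0)
y <- as.numeric(X %*% beta_true + rnorm(n))
y <- y - mean(y)

b_small <- lasso_cd(X, y, lambda = 0.01)  # close to least squares
b_large <- lasso_cd(X, y, lambda = 0.5)   # visibly shrunk
b_huge  <- lasso_cd(X, y, lambda = 10)    # penalty dominates
print(round(rbind(b_small, b_large, b_huge), 3))
```

Comparing the three rows shows the behavior described above: as the regularization parameter grows, the coefficient estimates move towards zero, and for a large enough value they are set exactly to zero.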