## Regression with Multicollinearity Yields Multiple Sets of Equally Good Coefficients

July 6, 2015
The multiple regression equation represents the linear combination of the predictors with the smallest mean-squared error. That linear combination is a factorization of the predictors with the factors equal to the regression weights. You may see the wo...

## Heteroscedasticity in Regression — It Matters!

June 7, 2015
R’s main linear and nonlinear regression functions, lm() and nls(), report standard errors for parameter estimates under the assumption of homoscedasticity, a fancy word for a situation that rarely occurs in practice. The assumption is that the (conditional) variance of the response variable is the same at any set of values of the predictor variables.
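A minimal sketch of what that assumption means in practice. The data are simulated (an assumption, not taken from the post), and the hand-computed HC0 "sandwich" covariance is one common heteroscedasticity-robust alternative, not necessarily the approach the post itself takes.

```r
# Simulated data whose error standard deviation grows with x (hypothetical example)
set.seed(1)
n <- 200
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = x)

fit <- lm(y ~ x)

# Classical standard errors, as reported by summary(fit);
# these assume constant error variance
se_classical <- sqrt(diag(vcov(fit)))

# HC0 sandwich estimator: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}
X <- model.matrix(fit)
e <- residuals(fit)
bread <- solve(crossprod(X))
meat <- crossprod(X * e)   # each row of X scaled by its residual
vcov_hc0 <- bread %*% meat %*% bread
se_robust <- sqrt(diag(vcov_hc0))

# Side-by-side comparison of the two sets of standard errors
cbind(se_classical, se_robust)
```

When the variance really is constant, the two columns should be close; a clear gap between them is a sign that the default standard errors may be misleading.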

## Simulation-based power analysis using proportional odds logistic regression

May 22, 2015
Consider planning a clinical trial where patients are randomized in permuted blocks of size four to either a 'control' or 'treatment' group. The outcome is measured on an 11-point ordinal scale (e.g., the numerical rating scale for pain). It may be reasonable to evaluate the results of this trial using a proportional odds cumulative logit

## Scale back or transform back multiple linear regression coefficients: Arbitrary case with ridge regression

April 10, 2015
Summary: In data science and machine learning applications, different features or predictors commonly come on different scales. This can make the resulting coefficients of a linear regression difficult to interpret, such as one featur...

## A Speed Comparison Between Flexible Linear Regression Alternatives in R

March 25, 2015
Everybody loves speed comparisons! Is R faster than Python? Is dplyr faster than data.table? Is STAN faster than JAGS? It has been said that speed comparisons are utterly meaningless, and in general I agree, especially when you are comparing apples and oranges which is what I’m going to do here. I’m going to compare a couple of alternatives to...

## Regression Models, It’s Not Only About Interpretation

March 22, 2015
Yesterday, I uploaded a post where I tried to show that “standard” regression models were not performing badly, at least if you include splines (multivariate splines) to take into account joint effects and nonlinearities. So far, I have not discussed the possibly high number of features (but with bootstrap procedures, it is possible to assess something related to...

## Machine Learning: Definition of %Var(y) in R’s randomForest package’s regression method

March 13, 2015
The second column is simply the first column divided by the variance of the responses that have been OOB up to that point (20 trees), times 100. Source: https://stat.ethz.ch/pipermail/r-help/2008-July/167748.html
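Written as a formula, assuming the first column is the running out-of-bag mean squared error (as in the trace output that randomForest prints for regression) and the variance is taken over the responses that have been out-of-bag so far:

$$
\%\mathrm{Var}(y) = 100 \times \frac{\mathrm{MSE}_{\mathrm{OOB}}}{\widehat{\mathrm{Var}}(y_{\mathrm{OOB}})}
$$

For illustration with made-up numbers: an OOB MSE of 2.5 and a response variance of 10 would give $100 \times 2.5 / 10 = 25$.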

## SAS PROC MCMC example in R: Nonlinear Poisson Regression Multilevel Random-Effects Model

March 8, 2015
I am slowly working my way through the PROC MCMC examples. Regarding these data, the SAS manual says: 'This example uses the pump failure data of Gaver and O’Muircheartaigh (1987) to illustrate how to fit a multilevel random-effects model with PROC MCMC. The number of failures and the time of operation ...

## More 3D Graphics (rgl) for Classification with Local Logistic Regression and Kernel Density Estimates (from The Elements of Statistical Learning)

February 7, 2015
This post builds on a previous post, but can be read and understood independently. As part of my course on statistical learning, we created 3D graphics to foster a more intuitive understanding of the various methods that are used to relax the assumption of linearity (in the predictors) in regression and classification methods. The authors

## Inequalities and Quantile Regression

February 6, 2015
In the course on inequality measures, we've seen how to compute various (standard) inequality indices, based on some sample of incomes (that can be binned, in various categories). On Thursday, we discussed the fact that incomes can be related to different variables (e.g. experience), and that comparing income inequalities between countries can be biased, if they have very different...