Blog Archives

Don’t use correlation to track prediction performance

February 22, 2013
By
Don’t use correlation to track prediction performance

Using correlation to track model performance is “a mistake that nobody would ever make” combined with a vague “what would be wrong if I did do that” feeling. I hope after reading this feel a least a small urge to double check your work and presentations to make sure you have not reported correlation where Related posts:

Read more »

Please stop using Excel-like formats to exchange data

December 7, 2012
By
Please stop using Excel-like formats to exchange data

I know “officially” data scientists all always work in “big data” environments with data in a remote database, streaming store or key-value system. But in day to day work Excel files and Excel export files get used a lot and cause a disproportionate amount of pain. I would like to make a plea to my Related posts:

Read more »

Level fit summaries can be tricky in R

October 1, 2012
By
Level fit summaries can be tricky in R

Model level fit summaries can be tricky in R. A quick read of model fit summary data for factor levels can be misleading. We describe the issue and demonstrate techniques for dealing with them.When modeling you often encounter what are commonly called categorical variables, which are called factors in R. Possible values of categorical variables Related posts:

Read more »

Newton-Raphson can compute an average

August 28, 2012
By
Newton-Raphson can compute an average

In our article How robust is logistic regression? we pointed out some basic yet deep limitations of the traditional full-step Newton-Raphson or Iteratively Reweighted Least Squares methods of solving logistic regression problems (such as in R‘s standard glm() implementation). In fact in the comments we exhibit a well posed data fitting problem that can not Related posts:

Read more »

How robust is logistic regression?

August 23, 2012
By
How robust is logistic regression?

Logistic Regression is a popular and effective technique for modeling categorical outcomes as a function of both continuous and categorical variables. The question is: how robust is it? Or: how robust are the common implementations? (note: we are using robust in a more standard English sense of performs well for all inputs, not in the Related posts:

Read more »

What does a generalized linear model do?

August 15, 2012
By
What does a generalized linear model do?

What does a generalized linear model do? R supplies a modeling function called glm() that fits generalized linear models (abbreviated as GLMs). A natural question is what does it do and what problem is it solving for you? We work some examples and place generalized linear models in context with other techniques.For predicting a categorical Related posts:

Read more »

Modeling Trick: Masked Variables

July 1, 2012
By
Modeling Trick: Masked Variables

A primary problem data scientists face again and again is: how to properly adapt or treat variables so they are best possible components of a regression. Some analysts at this point delegate control to a shape choosing system like neural nets. I feel such a choice gives up far too much statistical rigor, transparency and Related posts:

Read more »

How to outrun a crashing alien spaceship

June 11, 2012
By
How to outrun a crashing alien spaceship

Hollywood movies are obsessed with outrunning explosions and outrunning crashing alien spaceships. For explosions the movies give the optimal (but unusable) solution: run straight away. For crashing alien spaceships they give the same advice, but in this case it is wrong. We demonstrate the correct angle to flee. Running from a crashing alien spaceship, Prometheus Related posts:

Read more »

Selection in R

June 1, 2012
By

The design of the statistical programming language R sits in a slightly uncomfortable place between the functional programming and object oriented paradigms. The upside is you get a lot of the expressive power of both programming paradigms. A downside of this is: the not always useful variability of the language’s list and object extraction operators. Related posts:

Read more »

How to remember point shape codes in R

April 24, 2012
By
How to remember point shape codes in R

I suspect I am not unique in not being able to remember how to control the point shapes in R. Part of this is a documentation problem: no package ever seems to write the shapes down. All packages just use the “usual set” that derives from S-Plus and was carried through base-graphics, to grid, lattice Related posts:

Read more »