I promise this is my last post on the now week and a half old π pay! Building on the last post, I figured I could show how convergence actually works in the estimation algorithm. If you’ll recall, we plotted … Continue reading →

The recently announced Revolution Analytics / Zementis partnership goes a long way towards demonstrating how R fits into big-league production environments. A frequent complaint against R is that although R is fine prototyping tool it is not able to handle production environments. Well, that’s just not true. In fact, it is straightforward to build a model in R, translate...

While I had mentioned in my last post that I will cover logistic regression in my next post, I decided that a quick interlude in working with bubble plots would be fun. Bubble plots have become pretty popular recently, especially with all of the Visualization Challenges I've seen around the internet (by the way, I...

Here are the results of a "Curse of Dimensionality" homework assignment for Terran Lane's Introduction to Machine Learning class. Pretty pictures, interesting results, and a good exercise in explicit parallelism with R. It's neat to see distance scaling linearly with standard deviation, and linearly with the Lth-root...

Heritage Health and Kaggle have teamed up to create the biggest data science competition thus far: the Heritage Health Prize, which challenges competitors to build a statistical model to predict the number of days a person is likely to spend in hospital over the next year, based on (anonymized) factors such as demographics, medical visits and treatments, and other...

In my last two posts I talked about Ordinary Least Squares, then extended this discussion to the multiple predictor case and briefly talked about some of the problems that may arise. These problems can include omitted variable bias, heteroskedasticity, non-normality, and multicollinearity. Most of these problems are relatively minor in practice and have easy fixes,...