Blog Archives

The NYC Marathon

November 8, 2010

New York’s annual marathon took place yesterday. Watching a bit of it on television with my friends, I was struck by how much earlier the women started than the men. Specifically, professional women started running yesterday at 9:10 AM, while professional men started running at 9:40 AM. (This information comes from the runner’s handbook.) I

Read more »

The Answer Depends on the Question

November 3, 2010

To quote from the preface to the first edition in Jeffreys (1961): ‘It is sometimes considered a paradox that the answer depends not only on the observations but on the question; it should be a platitude.’ [1] Generalized Linear Models: P. ...

Read more »

Promising R Packages

October 21, 2010

As a quick note, here are two R packages that were mentioned to me recently and that look promising: reldist and mixtools.

Read more »

EM and Regression Mixture Modeling

October 19, 2010

Last night, Drew Conway showed me a fascinating graph that he made from the R package data we’ve recently collected from CRAN. That graph will be posted and described in the near future, because it has some really interesting implications for the structure of the R package world. But for the moment I want to

Read more »

R Recommendation Contest Launches on Kaggle

October 10, 2010

The R Recommendation Engine contest is now live on Kaggle. Please head over there and start submitting your predictions for the test data set. Once you do, you can check the leaderboard to see how your algorithm compares with other people’s work. We know that there’s still plenty of progress that can be made, because

Read more »

Build a Recommendation System for R Packages

October 7, 2010

On Dataists, a new collaborative blog for data hackers that I’m contributing to, we’ve just announced a data contest that’s custom made for R users. To win the contest, you need to build a recommendation system for R packages. To find out more, check out the official announcement on Dataists. Then go to GitHub to

Read more »

ProjectTemplate Version 0.1-3 Released

October 2, 2010

I’ve just released the newest version of ProjectTemplate. The primary change is a completely redesigned mechanism for automatically loading data. ProjectTemplate can now read compressed CSV files, access CSV data files over HTTP, read Stata, SPSS and RData binary files and even load MySQL database tables automatically. For my own projects, this is a big

Read more »

Three-Quarter Truths: Correlation Is Not Causation

October 1, 2010

Other than our culture’s implicit association between lies, damned lies and statistics, I think no idea has stifled the growth of statistical literacy as much as the endless repetition of the words correlation is not causation. This phrase seems to be primarily used to suppress intellectual inquiry by encouraging the unspoken assumption that correlational knowledge

Read more »

Two New R Packages: log4r and SortableHTMLTables

September 25, 2010

I’ve just released two new packages for R: log4r and SortableHTMLTables. log4r is a minimal logging utility for R that’s inspired by the log4j family of logging tools. It has substantially fewer features than other logging tools for R, but it’s hopefully easier to use. SortableHTMLTables uses brew and the jQuery Tablesorter plugin to provide

Read more »

Higher Order Functions in R

September 23, 2010

Introduction

Because R is, in part, a functional programming language, the ‘base’ package contains several higher order functions. By higher order functions, I mean functions that take another function as an argument and then do something with that function. If you want to know more about the usefulness of writing higher order functions in general,

Read more »