Which of the sqldf, plyr, doBy and aggregate functions/packages is fastest for applying functions to groups of rows? I was wondering about this earlier in this post. It seems sqldf would be the fastest according to a post in manipulatr m...

Heritage Health and Kaggle have teamed up to create the biggest data science competition thus far: the Heritage Health Prize, which challenges competitors to build a statistical model to predict the number of days a person is likely to spend in hospital over the next year, based on (anonymized) factors such as demographics, medical visits and treatments, and other...

In my last two posts I talked about Ordinary Least Squares, then extended this discussion to the multiple predictor case and briefly talked about some of the problems that may arise. These problems can include omitted variable bias, heteroskedasticity, non-normality, and multicollinearity. Most of these problems are relatively minor in practice and have easy fixes,...

A quick reminder that Revolution Analytics' CTO David Champagne will be hosting a live webinar tomorrow (March 16) on Integrating R into 3rd Party and Web Applications Using RevoDeployR. Designed for application developers, this webinar will cover publishing R scripts to the RevoDeployR server, and integrating their results into Web applications, Microsoft Excel, JasperReports Server and more. Complete details...

As you may know, today is Pi Day, when all good nerds take a moment to thank the geeks of antiquity for their painstaking work in estimating this marvelous mathematical constant. It is also a great opportunity to thank contemporary geeks for the wonders of modern computing, which allow us to estimate pi to near