Okay, so we want to forecast GDP. How do we even begin such a burdensome ordeal? Well, each time series has four components that we wish to deal with: seasonality, trend, cyclicality and error. If we deal with seasonally adjusted data we d...
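To make the components concrete, here is a minimal sketch of a classical additive decomposition in plain Python. It is not the post's own method: the quarterly series is synthetic (not real GDP), the function names are made up for illustration, and the cyclical component is assumed to stay folded into the trend, as it does in most textbook decompositions.

```python
def centered_moving_average(series, period):
    """Estimate the trend with a centered moving average.

    For an even period the two end points of the window get half weight
    (the usual "2 x period" moving average). Edges are left as None.
    """
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    return trend


def decompose_additive(series, period):
    """Split a series into trend, seasonal and remainder (error) parts."""
    trend = centered_moving_average(series, period)

    # Average the detrended values for each position in the seasonal cycle.
    buckets = [[] for _ in range(period)]
    for t, (y, m) in enumerate(zip(series, trend)):
        if m is not None:
            buckets[t % period].append(y - m)
    raw = [sum(b) / len(b) for b in buckets]

    # Center the seasonal pattern so that it sums to zero over one cycle.
    mean_raw = sum(raw) / period
    pattern = [s - mean_raw for s in raw]
    seasonal = [pattern[t % period] for t in range(len(series))]

    # Whatever trend and seasonality do not explain is the remainder.
    remainder = [
        None if m is None else y - m - s
        for y, m, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, remainder


# Synthetic quarterly data: linear trend plus a fixed seasonal pattern.
true_pattern = [2.0, -1.0, -3.0, 2.0]
series = [100.0 + 0.5 * t + true_pattern[t % 4] for t in range(24)]

trend, seasonal, remainder = decompose_additive(series, period=4)
print("seasonal pattern:", [round(s, 6) for s in seasonal[:4]])
```

Because the synthetic series has no noise, the remainder comes out as essentially zero; on real data it is the part a forecasting model still has to explain.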

The title says “things” but conferences are mainly about people. Some of it can be serendipitous. For example, one day I sat next to Jonathan Rougier at lunch because I had a question for him about climate models. When Jonathan left, I started a conversation with the person on my other side. That was most …

In a tongue-in-cheek post at the Information Management blog, Steve Miller shares his "frustration" with R: package developers keep on releasing new functionality for R that makes his own work obsolete. For example, there's now pre-packaged functionality in R for enhanced dotplots, Economist-style graphics, additive regression models and more, which all obviate the need for Steve to implement such...

Time series data are widely seen in analytics. Some examples are stock indexes/prices, currency exchange rates and electrocardiograms (ECG). Traditional time series analysis focuses on smoothing, decomposition and forecasting, and there are many R functions and packages available for those …

I have been waiting for the KDD conference to come to California, and I was ecstatic to see it held in San Diego this year. AdMeld did an awesome job displaying KDD ads on the sites that I visit, sometimes multiple times per page. That’s good targeting!

Mining and Learning on Graphs Workshop 2011

I had originally planned to attend the...

For a quick recap, Pierre and I supervised a team project at Ensae last year, on a statistical critique of the abstract painting 1024 Colours by painter Gerhard Richter. The four students, Clémence Bonniot, Anne Degrave, Guillaume Roussellet and Astrid Tricaud, did an outstanding job. Here is a selection of graphs and results they produced.

Over at the ExploringDataBlog, Ron Pearson just wrote a post about the cases when means are useless. In fact, it’s possible to calculate a whole load of stats on your data and still not really understand it. The canonical dataset for demonstrating this (spoiler alert: if you are doing an intro to stats course, you

As I stand here at Heathrow waiting for my flight back to the States, I thought I'd dash off a few quick reflections on the useR! 2011 conference at the University of Warwick. It was an outstanding event. There's something about a conference of just a few hundred attendees (there were about 450) that creates a sense of camaraderie and common...

The RevoScaleR package isn’t open source, but it is free for academic users. Collecting and storing data has outpaced our ability to analyze it. Can R cope with this challenge? The RevoScaleR package is part of Revolution R Enterprise. It provides data management and data analysis, uses multiple cores and should scale. Scalability