Blog Archives

Hierarchical forecasting with hts v4.0

February 12, 2014

A new version of my hts package for R is now on CRAN. It was completely re-written from scratch. Not a single line of code survived. There are some minor syntax changes, but the biggest changes are in speed and scope. This version is many times faster than the previous version and can handle hundreds of thousands of time series...

Detecting seasonality

February 7, 2014

I occasionally get email asking how to detect whether seasonality is present in a data set. Sometimes the period of the potential seasonality is known, but in other cases it is not. I’ve discussed before how to estimate an unknown seasonal period, and how to measure the strength of the seasonality. In this post, I want to look at...

Feedback on OTexts covers please

February 5, 2014

We are currently selecting the cover design for OTexts books. The first one to go into print will be Forecasting: principles and practice. We have narrowed the choice to the two designs below, although changes are still possible. I thought it would be useful to get some feedback on these designs from readers of this blog (and from people...

Interview for the Capital of Statistics

February 4, 2014

Earo Wang recently interviewed me for the Chinese website Capital of Statistics. The English transcript of the interview is on Earo’s personal website. This is the third interview I’ve done in the last 18 months. The others were for Data Mining Research (republished in Amstat News) and DecisionStats.

Computational Actuarial Science with R

February 2, 2014

I recently co-authored a chapter on “Prospective Life Tables” for this book, edited by Arthur Charpentier. R code to reproduce the figures and to complete the exercises for our chapter is now available on GitHub. Code for the other chapters should also be available soon. The book can be pre-ordered on Amazon.

Automatic time series forecasting in Granada

January 30, 2014

In two weeks I am presenting a workshop at the University of Granada (Spain) on Automatic Time Series Forecasting. Unlike most of my talks, this is not intended to be primarily about my own research. Rather it is to provide a state-of-the-art overview of the topic (at a level suitable for Masters students in Computer Science). I thought I’d provide...

Free books on statistical learning

January 29, 2014

Hastie, Tibshirani and Friedman’s Elements of Statistical Learning first appeared in 2001 and is already a classic. It is my go-to book when I need a quick refresher on a machine learning algorithm. I like it because it is written using the language and perspective of statistics, and provides a very useful entry point into the literature of machine...

Time series data in R

January 28, 2014

There is no shortage of time series data available on the web for use in student projects, or self-learning, or to test out new forecasting algorithms. It is now relatively easy to access these data sets directly in R.

M Competition data: The 1001 series from the M-competition and the 3003 series from the M3-competition are available as part...

New in forecast 5.0

January 26, 2014

Last week, version 5.0 of the forecast package for R was released. There are a few new functions and changes made to the package, which is why I increased the version number to 5.0. Thanks to Earo Wang for helping with this new version.

Handling missing values and outliers: Data cleaning is often the first step that data scientists...

Thoughts on the Ljung-Box test

January 23, 2014

It is common to use a Ljung-Box test to check that the residuals from a time series model resemble white noise. However, there is very little practical advice around about how to choose the number of lags for the test. The Ljung-Box test was proposed by Ljung and Box (Biometrika, 1978) and is based on the statistic    ...
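
As a rough illustration of what the test computes, here is a minimal sketch in plain Python (not R, and not code from the post) of the textbook Ljung-Box statistic, Q = n(n+2) Σ r_k² / (n − k) summed over lags k = 1, ..., h, where r_k is the lag-k sample autocorrelation of the residuals. The function names `autocorrelation` and `ljung_box_q` and the example residuals are hypothetical, chosen only for this sketch.

```python
def autocorrelation(x, k):
    """Lag-k sample autocorrelation of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    # Sum of squared deviations (the lag-0 autocovariance times n).
    denom = sum((v - mean) ** 2 for v in x)
    # Sum of cross-products of deviations k steps apart.
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box_q(residuals, h):
    """Ljung-Box Q statistic using lags 1..h (textbook form)."""
    n = len(residuals)
    if not 1 <= h < n:
        raise ValueError("h must satisfy 1 <= h < len(residuals)")
    weighted = sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, h + 1)
    )
    return n * (n + 2) * weighted


# Made-up residuals, purely for illustration.
example_residuals = [0.3, -0.1, 0.4, -0.6, 0.2, 0.1, -0.3, 0.5, -0.2, -0.4]
q = ljung_box_q(example_residuals, h=3)
```

Large values of Q indicate residual autocorrelation. In practice the statistic is compared with a chi-squared distribution whose degrees of freedom are h minus the number of fitted model parameters. In R, `Box.test(x, lag = h, type = "Ljung-Box", fitdf = ...)` performs the test directly.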
