# Time series cross-validation using fable

**R on Rob J Hyndman**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Time series cross-validation is handled in the `fable`

package using the `stretch_tsibble()`

function to generate the data folds. In this post I will give two examples of how to use it, one without covariates and one with covariates.

## Quarterly Australian beer production

Here is a simple example using quarterly Australian beer production from 1956 Q1 to 2010 Q2. First we create a data object containing many training sets starting with 3 years (12 observations), and adding one quarter at a time until all data are included.

library(fpp3) beer <- aus_production %>% select(Beer) %>% stretch_tsibble(.init = 12, .step=1) beer ## # A tsibble: 23,805 x 3 [1Q] ## # Key: .id [207] ## Beer Quarter .id ## <dbl> <qtr> <int> ## 1 284 1956 Q1 1 ## 2 213 1956 Q2 1 ## 3 227 1956 Q3 1 ## 4 308 1956 Q4 1 ## 5 262 1957 Q1 1 ## 6 228 1957 Q2 1 ## 7 236 1957 Q3 1 ## 8 320 1957 Q4 1 ## 9 272 1958 Q1 1 ## 10 233 1958 Q2 1 ## # … with 23,795 more rows

This gives 207 training sets of increasing size. We fit an ETS model to each training set and produce one year of forecasts from each model. Because I want to compute RMSE for each forecast horizon, I will add the horizon `h`

to the resulting object.

fc <- beer %>% model(ETS(Beer)) %>% forecast(h = "1 year") %>% group_by(.id) %>% mutate(h = row_number()) %>% ungroup()

Finally, we compare the forecasts against the actual values and average over the folds.

fc %>% accuracy(aus_production, by="h") %>% select(h, RMSE) ## # A tibble: 4 x 2 ## h RMSE ## <int> <dbl> ## 1 1 17.1 ## 2 2 16.7 ## 3 3 18.1 ## 4 4 19.2

Forecasts of 1 and 2 quarters ahead both have about the same accuracy here, but then the error increases for horizons 3 and 4.

## Insurance quotes with TV advertisements

Things are a little more complicated when we want to use covariates in the model. Here is an example of monthly quotations issued by a US insurance company modelled as a function of the TV advertising expenditure in the same month.

The first step is the same, where we stretch the tsibble. This time we will start with one year of data.

stretch <- insurance %>% stretch_tsibble(.step=1, .init=12) stretch ## # A tsibble: 754 x 4 [1M] ## # Key: .id [29] ## Month Quotes TVadverts .id ## <mth> <dbl> <dbl> <int> ## 1 2002 Jan 13.0 7.21 1 ## 2 2002 Feb 15.4 9.44 1 ## 3 2002 Mar 13.2 7.53 1 ## 4 2002 Apr 13.0 7.21 1 ## 5 2002 May 15.4 9.44 1 ## 6 2002 Jun 11.7 6.42 1 ## 7 2002 Jul 10.1 5.81 1 ## 8 2002 Aug 10.8 6.20 1 ## 9 2002 Sep 13.3 7.59 1 ## 10 2002 Oct 14.6 8.00 1 ## # … with 744 more rows

Next we fit a regression model with AR(1) errors to each fold.

fit <- stretch %>% model(ARIMA(Quotes ~ 1 + pdq(1,0,0) + TVadverts))

Before we forecast, we need to provide the advertising expenditure for the future periods. We will forecast up to 3 steps ahead, so the test data needs to have 3 observations per fold.

test <- new_data(stretch, n=3) %>% # Add in covariates from corresponding month left_join(insurance, by="Month") test ## # A tsibble: 87 x 4 [1M] ## # Key: .id [29] ## Month .id Quotes TVadverts ## <mth> <int> <dbl> <dbl> ## 1 2003 Jan 1 17.0 9.53 ## 2 2003 Feb 1 16.9 9.39 ## 3 2003 Mar 1 16.5 8.92 ## 4 2003 Feb 2 16.9 9.39 ## 5 2003 Mar 2 16.5 8.92 ## 6 2003 Apr 2 15.3 8.37 ## 7 2003 Mar 3 16.5 8.92 ## 8 2003 Apr 3 15.3 8.37 ## 9 2003 May 3 15.9 9.84 ## 10 2003 Apr 4 15.3 8.37 ## # … with 77 more rows

The actual value in each month is also included, but that will be ignored when forecasting.

fc <- forecast(fit, new_data = test) %>% group_by(.id) %>% mutate(h = row_number()) %>% ungroup()

Finally, we can compare the forecasts against the actual values, averaged across each forecast horizon.

fc %>% accuracy(insurance, by="h") %>% select(h, RMSE) ## # A tibble: 3 x 2 ## h RMSE ## <int> <dbl> ## 1 1 0.805 ## 2 2 1.24 ## 3 3 1.33

**leave a comment**for the author, please follow the link and comment on their blog:

**R on Rob J Hyndman**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.