ahead forecasting (v0.10.0): fast time series model calibration and Python plots


ahead is an R, Python and Julia package for univariate and multivariate time series forecasting with uncertainty quantification (including predictive simulation). Matlab is next. The aim is always to keep these implementations of ahead as similar as possible, but that's a lot of work, as you might guess. Do not hesitate to submit a pull request to these repositories if you want to help and contribute. You can also report bugs (constructively).

The main changes in ahead’s v0.10.0 are:

  1. Fast calibration of ridge2f using a Leave-One-Out Cross-Validation (LOOCV) trick (R for now)
  2. Univariate forecasting with ridge2f with external regressors
  3. Plotting functionality in Python version

Read on for more details. You can also use this notebook if you want to reproduce the experiments.

1. Fast calibration of ridge2f:

The fast calibration of ridge2f uses a remarkable result available for linear models’ LOOCV:

\[\text{LOOCV error} = \frac{1}{n} \sum_{i=1}^n\left(y_i-\hat{f}_{-i}\left(\mathbf{z}_i\right)\right)^2 = \frac{1}{n} \sum_{i=1}^n\left(\frac{y_i-\hat{f}\left(\mathbf{z}_i\right)}{1-\mathbf{S}_{i i}}\right)^2\]

Where \(\hat{f}_{-i}\left(\mathbf{z}_i\right)\) is the statistical learning model \(f\)'s prediction obtained without the \(i\)-th observation in the training set, \(\mathbf{z}_i\) is the \(i\)-th observation, and \(\mathbf{S}_{i i}\) is the \(i\)-th diagonal element of the hat matrix (a.k.a. the smoother, a matrix such that \(\hat{y} = \mathbf{S} y\)). Keep in mind that time series cross-validation will give different results.
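To make the shortcut concrete, here is a small numeric check (an illustration on a plain linear model, not ahead's internal code): the scaled-residual formula matches the brute-force leave-one-out computation.

# Numeric check of the LOOCV shortcut for an ordinary linear model
set.seed(123)
n <- 50
X <- cbind(1, matrix(rnorm(n * 3), n, 3))
y <- drop(X %*% c(1, 2, -1, 0.5)) + rnorm(n)

S <- X %*% solve(crossprod(X)) %*% t(X)   # hat matrix: y_hat = S y
y_hat <- drop(S %*% y)

# closed form: residuals rescaled by (1 - S_ii)
loocv_shortcut <- mean(((y - y_hat) / (1 - diag(S)))^2)

# brute force: refit n times, leaving one observation out each time
loocv_brute <- mean(sapply(seq_len(n), function(i) {
  beta_i <- solve(crossprod(X[-i, ]), crossprod(X[-i, ], y[-i]))
  (y[i] - drop(X[i, ] %*% beta_i))^2
}))

all.equal(loocv_shortcut, loocv_brute)  # TRUE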

The LOOCV result can be extended further and approximated by Generalized Cross-Validation (GCV), which still has a closed-form formula. GCV is what is used in ahead::ridge2f. Indeed, even though ridge2 is not, strictly speaking, linear, it has the structure of a ridge regression model with two regularization parameters: one for the original explanatory variables, and one for new, engineered features. These engineered features are what turn the linear model into a non-linear one.
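For illustration only (a single-penalty ridge regression, not ahead's two-penalty setup), and reusing X and y from the sketch above: GCV replaces each individual \(1-\mathbf{S}_{ii}\) by the average \(1-\mathrm{tr}(\mathbf{S})/n\), and the penalty can then be chosen by minimizing the resulting closed-form criterion.

# GCV closed form for a single-penalty ridge regression (illustration only)
ridge_gcv <- function(X, y, lambda) {
  n <- nrow(X)
  S <- X %*% solve(crossprod(X) + lambda * diag(ncol(X)), t(X))  # smoother matrix
  y_hat <- drop(S %*% y)
  mean((y - y_hat)^2) / (1 - sum(diag(S)) / n)^2
}

# pick the penalty minimizing GCV on a log-spaced grid
lambdas <- 10^seq(-4, 4, length.out = 50)
gcv_values <- sapply(lambdas, function(l) ridge_gcv(X, y, l))
lambdas[which.min(gcv_values)]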

In the following R example, it's worth mentioning that only the two regularization parameters are calibrated. Other hyperparameters, such as the number of time series lags or the number of nodes in the hidden layer, are kept at their default values. Feel free to try and share other experiments including them (see the hedged sketch at the end of this section).

%load_ext rpy2.ipython

%%R

options(repos = c(
    techtonique = 'https://techtonique.r-universe.dev',
    CRAN = 'https://cloud.r-project.org'))

install.packages("ahead")
install.packages("ggplot2")
install.packages("forecast")
install.packages("dfoptim")

%%R

library(ahead)
library(dfoptim)

# Import the Python data frame `df` into R (df is the log-differenced
# macrodata frame built in the "Import data" step of section 3 below)
%R -i df

%R df_ts <- as.ts(df)

%%R

objective_function <- function(xx)
{
    ahead::loocvridge2f(df_ts,
                        h = 20,
                        type_pi = "blockbootstrap",
                        lambda_1 = 10^xx[1],
                        lambda_2 = 10^xx[2],
                        show_progress = FALSE)$loocv
}
start <- proc.time()[3]
(opt <- dfoptim::nmkb(fn=objective_function, lower=c(-10,-10), upper=c(10,10), par=c(0.1, 0.1)))
print(proc.time()[3]-start)

elapsed 
  6.657 

%%R

start <- proc.time()[3]
res <- ahead::ridge2f(df_ts, h = 20,
                      type_pi="blockbootstrap",
                      lambda_1=10^opt$par[1], # 'optimal' parameters
                      lambda_2=10^opt$par[2]) # 'optimal' parameters
print(proc.time()[3]-start)

  |======================================================================| 100%
elapsed 
  0.805 

%%R

par(mfrow=c(2, 2))
plot(res, "realgovt", type = "sims", main="50 predictive simulations \n of realgovt")
plot(res, "realgovt", type = "dist", main="predictive distribution \n of realgovt")
plot(res, "realgdp", type = "sims", main="50 predictive simulations \n of realgdp")
plot(res, "realgdp", type = "dist", main="predictive distribution \n of realgovt")

(Figure: 50 predictive simulations and predictive distributions of realgovt and realgdp)
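As mentioned above, other hyperparameters could be calibrated as well. Below is a hedged sketch (not from the original post) of an extended objective; it assumes ahead::loocvridge2f accepts the same lags and nb_hidden arguments as ahead::ridge2f, and it handles these integer-valued hyperparameters by rounding inside the objective (a common heuristic, which makes the objective piecewise constant in those coordinates).

# Hedged sketch: calibrating lambda_1 and lambda_2 plus, under assumed
# argument names, the number of lags and of hidden nodes
objective_function_ext <- function(xx)
{
    ahead::loocvridge2f(df_ts,
                        h = 20,
                        type_pi = "blockbootstrap",
                        lambda_1 = 10^xx[1],
                        lambda_2 = 10^xx[2],
                        lags = round(xx[3]),      # assumed argument name
                        nb_hidden = round(xx[4]), # assumed argument name
                        show_progress = FALSE)$loocv
}
(opt_ext <- dfoptim::nmkb(fn = objective_function_ext,
                          lower = c(-10, -10, 1, 2),
                          upper = c(10, 10, 25, 100),
                          par = c(0.1, 0.1, 2, 10)))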

2. Univariate forecasting with ridge2f and external regressors:

ridge2f was initially designed for multivariate time series forecasting. Nothing prevents it from being used for univariate forecasting. This functionality is now available in ahead (R for now). In addition, as in the multivariate case, it is possible to use external regressors in the forecasting process.

%%R

library(ggplot2)
library(forecast)

x <- fdeaths
xreg <- ahead::createtrendseason(x)
(z <- ahead::ridge2f(x, xreg = xreg, h=20))
autoplot(z)

(Figure: ridge2f forecast of fdeaths, using trend and seasonality external regressors)

3. Plotting functionality in Python version:

Install and import packages

!pip install ahead

import numpy as np
import pandas as pd
import statsmodels.api as sm

from time import time
from ahead import Ridge2Regressor # may take some time to run, ONLY the 1st time it's run
from statsmodels.tsa.base.datetools import dates_from_str

Import data

# some example data
mdata = sm.datasets.macrodata.load_pandas().data

# prepare the dates index
dates = mdata[['year', 'quarter']].astype(int).astype(str)

quarterly = dates["year"] + "Q" + dates["quarter"]

quarterly = dates_from_str(quarterly)

mdata = mdata[['realgovt', 'tbilrate', 'cpi', 'realgdp']]

mdata.index = pd.DatetimeIndex(quarterly)

df = np.log(mdata).diff().dropna()

display(df)
realgovt tbilrate cpi realgdp
1959-06-30 0.023664 0.088193 0.005849 0.024942
1959-09-30 0.020481 0.215321 0.006838 -0.001193
1959-12-31 -0.014781 0.125317 0.000681 0.003495
1960-03-31 -0.046197 -0.212805 0.005772 0.022190
1960-06-30 -0.003900 -0.266946 0.000338 -0.004685
... ... ... ... ...
2008-09-30 0.031005 -0.396881 -0.007904 -0.006781
2008-12-31 0.015732 -2.277267 -0.021979 -0.013805
2009-03-31 -0.010967 0.606136 0.002340 -0.016612
2009-06-30 0.026975 -0.200671 0.008419 -0.001851
2009-09-30 0.019888 -0.405465 0.008894 0.006862

202 rows × 4 columns

regr = Ridge2Regressor(h = 20, date_formatting = "original",
                       type_pi="blockbootstrap", # type of simulations for predictive inference
                       B=50, # number of simulations
                       seed=1)

start = time()
regr.forecast(df)
print(f"Elapsed: {time() - start} s")

  |======================================================================| 100%
Elapsed: 0.49063634872436523 s

regr2 = Ridge2Regressor(h = 20, date_formatting = "original",
                       type_pi="gaussian", # Gaussian prediction intervals
                       seed=1)

start = time()
regr2.forecast(df)
print(f"Elapsed: {time() - start} s")

Elapsed: 0.038446664810180664 s

regr.plot("realgovt")

(Figure: regr forecast of realgovt, block bootstrap simulations)

regr.plot("realgdp")

(Figure: regr forecast of realgdp, block bootstrap simulations)

regr2.plot("realgovt")

(Figure: regr2 forecast of realgovt, Gaussian prediction intervals)

regr2.plot("realgdp")

(Figure: regr2 forecast of realgdp, Gaussian prediction intervals)
