Forecasting financial markets, such as the STOXX Europe 600 Index, presents a classic Machine Learning challenge: the data is inherently noisy, non-stationary, and highly susceptible to sudden market events. To tackle this, we turn to Automated Machine Learning (AutoML)—specifically the powerful, scalable framework provided by H2O.ai and integrated into the R modeltime ecosystem.
This article dissects a full MLOps workflow, from data acquisition and feature engineering to model training and evaluation, revealing how a high-performance, low-variance model triumphed over the market’s volatility.
1. The Forecasting Pipeline: Building a Feature-Rich Model
The core strategy involved converting the univariate time series problem into a supervised regression problem by generating powerful explanatory variables.
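To see what this conversion means in practice, here is a toy, self-contained sketch with made-up prices (not the STOXX data): once lagged copies of the series are added as columns, each row becomes an ordinary regression example.
#Toy example: a univariate series recast as a regression table (made-up values)
tibble::tibble(price = c(100, 102, 101, 103, 105)) |>
  dplyr::mutate(
    lag1 = dplyr::lag(price, 1), #yesterday's price as a predictor
    lag2 = dplyr::lag(price, 2)  #the price two days back
  )
#Target: price; predictors: lag1, lag2 (leading NA rows are later dropped)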
A. Data & Splitting
- Target: STOXX Europe 600 Index closing price.
- Time Frame: 12 months of daily data (ending 2025-10-31).
- Validation: A rigorous cumulative time series split was used, with the last 15 days reserved for testing (assess = "15 days"). This mimics a real-world backtesting scenario.
#Install Development Version of modeltime.h2o
devtools::install_github("business-science/modeltime.h2o", force = TRUE)

library(tidymodels)
library(modeltime.h2o)
library(tidyverse)
library(tidyquant) #provides tq_get()
library(timetk)

#STOXX Europe 600: daily closing prices, last 12 months
df_stoxx <-
  tq_get("^STOXX", to = "2025-10-31") %>%
  select(date, stoxx = close) %>%
  mutate(id = "id") %>%
  filter(date >= last(date) - months(12)) %>%
  drop_na()
#Train/Test Splitting
splits <-
df_stoxx %>%
time_series_split(
assess = "15 days",
cumulative = TRUE
)
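Before moving on, it is worth eyeballing the split. The snippet below is an optional, minimal sketch using the same splits object; plot_time_series_cv_plan() comes from timetk, which is already loaded.
#Visual sanity check of the train/test split (optional sketch)
splits %>%
  tk_time_series_cv_plan() %>%
  plot_time_series_cv_plan(date, stoxx, .interactive = FALSE)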
B. Feature Engineering (The Recipe)
A robust feature recipe (rec_spec) was designed to capture both time dependence and seasonality:
- Autoregressive (AR) Lags: step_lag(stoxx, lag = 1:2) explicitly included the closing price of the previous one and two days. This is the most crucial feature for capturing market momentum and inertia, a choice informed by the diagnostic analysis.
- Seasonality: step_fourier(date, period = 365.25, K = 1) added a single sine/cosine pair to capture subtle annual cyclical effects.
- Calendar Effects: step_timeseries_signature(date) generated features like dayofweek, which can be essential for capturing known market anomalies (e.g., the "Monday effect").
#Preprocessed data/Feature engineering
rec_spec <-
  recipe(stoxx ~ date, data = training(splits)) %>%
  step_timeseries_signature(date) %>%
  step_lag(stoxx, lag = 1:2) %>%
  step_fourier(date, period = 365.25, K = 1) %>%
  step_dummy(all_nominal_predictors(), one_hot = TRUE) %>%
  step_zv(all_predictors()) %>%
  step_naomit(all_predictors())

#Train
train_tbl <-
  rec_spec %>%
  prep() %>%
  bake(training(splits))

#Test
test_tbl <-
  rec_spec %>%
  prep() %>%
  bake(testing(splits))
2. AutoML Execution: The Race Against the Clock
We initiated the H2O AutoML process using automl_reg() under strict resource constraints to quickly identify the most promising model type:
| Parameter | Value | Rationale |
|---|---|---|
| max_runtime_secs | 5 | Time limit (in seconds) for the entire AutoML run. |
| max_models | 3 | Limit on the number of base models to train. |
| exclude_algos | "DeepLearning" | Excludes computationally expensive algorithms for rapid prototyping. |
#Initialize H2O
h2o.init(
nthreads = -1,
ip = 'localhost',
port = 54321
)
#Model specification and fitting
model_spec <- automl_reg(mode = 'regression') %>%
set_engine(
engine = 'h2o',
max_runtime_secs = 5,
max_runtime_secs_per_model = 3,
max_models = 3,
nfolds = 5,
exclude_algos = c("DeepLearning"),
verbosity = NULL,
seed = 98765
)
model_fitted <-
model_spec %>%
fit(stoxx ~ ., data = train_tbl)
These tight constraints resulted in a leaderboard featuring only the fastest and highest-performing base algorithms:
| Rank | Model ID | Algorithm | Cross-Validation RMSE |
|---|---|---|---|
| 1 | DRF_1_AutoML… | Distributed Random Forest | 3.99 |
| 2 | GBM_2_AutoML… | Gradient Boosting Machine | 4.20 |
| 3 | GLM_1_AutoML… | Generalized Linear Model | 5.50 |
#Evaluation
model_fitted %>% automl_leaderboard()
3. The Winner: Distributed Random Forest (DRF)
The Distributed Random Forest (DRF) emerged as the leader in the cross-validation phase, demonstrating superior generalization ability with the lowest Root Mean Squared Error (RMSE) of 3.99.
Why DRF Won: The Low Variance Advantage
The DRF model’s victory over the generally higher-accuracy Gradient Boosting Machine (GBM) is a powerful illustration of the Bias-Variance Trade-off in noisy data:
- Financial Volatility Invites High Variance: The daily STOXX index is inherently choppy and noisy; flexible models that chase this noise end up with high variance.
- DRF’s Low-Variance Mechanism: DRF relies on Bagging (Bootstrap Aggregating). It trains hundreds of decision trees on random subsets of the data and features. Crucially, it then averages their individual predictions.
- This averaging process effectively cancels out the random errors (noise) learned by individual trees; the toy simulation after this list makes the effect concrete.
- By prioritizing low variance, DRF achieved a highly stable and reliable fit, which was essential for taming the market's noise. The modest increase in bias (introduced by the averaging and smoothing) was a small price to pay for the massive reduction in error-inducing variance.
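To make the variance-reduction argument concrete, here is a minimal toy simulation with synthetic numbers (not the STOXX data), using plain averaging of independent noisy estimators rather than an actual forest:
#Toy simulation: why averaging tames variance (synthetic, illustrative only)
set.seed(98765)
truth <- 100
one_tree <- function() truth + rnorm(1, sd = 4)        #a single noisy "tree"
bagged   <- function(B) mean(replicate(B, one_tree())) #average of B "trees"

sd(replicate(1000, one_tree()))   #~4.0: one tree is noisy
sd(replicate(1000, bagged(100)))  #~0.4: the ensemble is far more stable
In a real forest the trees are fit on overlapping bootstrap samples and are therefore correlated, so the reduction is less dramatic than this independent-case factor of sqrt(B), but the principle is the same.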
Test Set Performance
Calibrating the leading DRF model on the final 15-day test set confirmed its strong performance:
| Metric | DRF Test Set Value | Interpretation |
|---|---|---|
| RMSE | 10.9 | A jump from the cross-validation RMSE (3.99), typical of non-stationary financial data, but still a strong result for market prediction. |
| R-Squared | 0.537 | The model explains over 53% of the variance in the unseen test data. |
#Modeltime Table
model_tbl <-
modeltime_table(
model_fitted
)
#Calibration to test data
calib_tbl <-
model_tbl %>%
modeltime_calibrate(
new_data = test_tbl
)
#Measure Test Accuracy
calib_tbl %>%
modeltime_accuracy()
Finally, we can construct prediction intervals, which in this context serve a role loosely analogous to a Relative Strength Index (RSI): prices pressing against the interval bounds hint at overbought or oversold conditions.
#Prediction Intervals
calib_tbl %>%
modeltime_forecast(
new_data = test_tbl,
actual_data = test_tbl
) %>%
plot_modeltime_forecast(
.interactive = FALSE,
.line_size = 1.5
) +
labs(title = "Modeling with Automated ML for the STOXX Europe 600",
subtitle = "<span style = 'color:dimgrey;'>Predictive Intervals</span> of <span style = 'color:red;'>Distributed Random Forest</span> Model",
y = "",
x = "") +
scale_y_continuous(labels = scales::label_currency(prefix = "€")) +
scale_x_date(labels = scales::label_date("%b %d"),
date_breaks = "2 days") +
theme_minimal(base_family = "Roboto Slab", base_size = 16) +
theme(plot.title = element_text(face = "bold", size = 16),
plot.subtitle = ggtext::element_markdown(face = "bold"),
plot.background = element_rect(fill = "azure", color = "azure"),
panel.background = element_rect(fill = "snow", color = "snow"),
axis.text = element_text(face = "bold"),
axis.text.x = element_text(angle = 45,
hjust = 1,
vjust = 1),
legend.position = "none")
NOTE: This article was generated with the support of an AI assistant. The final content and structure were reviewed and approved by the author.
