# Multistep horizon loss optimized forecasting with ADAM

*by* **Peter Laurinec**

Recently, while working on a forecasting task at PowereX, I stumbled upon an interesting improvement in time series forecasting modeling. In regression modeling, we generally use evaluation/loss metrics based on a one-step-ahead (single-value) horizon. But in many forecasting applications we are interested in multistep horizon forecasting (e.g. a whole day ahead), so it would be great to optimize some multistep loss during the training of forecasting models. Except for some RNN architectures, this wasn't easily possible in the commonly used libraries/frameworks in the past.

So, I was very happy to see multistep losses in the `smooth` package and its `adam()` wrapper, and to try them. I have already used them in one particular task at work, and it went successfully to production!

My motivation, and what I will show you in this blog post, is to:

- Pay tribute to the `smooth` package and its `adam()` wrapper, especially its multistep horizon forecasting losses.
- Try the mentioned losses for forecasting a household consumption time series -> very noisy, with daily/weekly seasonality and no trend.
- Show model diagnostic methods with `ggplot2` visualizations.

### Household consumption data

Firstly, let’s load all the needed packages for forecasting.
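The exact package list is my assumption, reconstructed from the tools used later in the post; something like:

```r
# Forecasting
library(smooth)      # adam() and its multistep losses
library(forecast)    # STL+ARIMA / STL+ETS benchmarks
# Data handling and visualization
library(data.table)
library(ggplot2)
library(lubridate)
```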

Next, I will read the electricity consumption load data of one household. The data run from `2023-01-01` to `2023-08-31`. This household heats space and water with electricity, so there are multiple seasonalities that vary during the year.
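Reading the data could look roughly like this (the file name and column names here are hypothetical; adjust to your own data):

```r
# Hypothetical file and column names -- 15-minute load data
dt <- fread("household_consumption.csv")  # columns: date_time, value
dt[, date_time := as.POSIXct(date_time, tz = "UTC")]
```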

Let’s visualize consumption for one month.
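A minimal sketch of such a plot, assuming the `dt` table above and picking February as the example month:

```r
# Line plot of 15-minute consumption for one month
ggplot(dt[date_time >= as.POSIXct("2023-02-01", tz = "UTC") &
            date_time < as.POSIXct("2023-03-01", tz = "UTC")],
       aes(date_time, value)) +
  geom_line(linewidth = 0.4) +
  labs(x = NULL, y = "Consumption (kW)") +
  theme_bw()
```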

We can see that the time series is very noisy, but with a clear daily pattern.

Let's check in more detail what we will face during forecasting: I will show you the weekly average pattern (+/- standard deviation) by month.
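One way to compute this (a sketch, assuming the `dt` table above): index each 15-minute slot within the week (1..672 = 7 days x 96 periods), aggregate by month, then draw a ribbon for +/- one standard deviation.

```r
# Slot within the week: Monday 00:00 = 1, ..., Sunday 23:45 = 672
dt[, month := lubridate::month(date_time, label = TRUE)]
dt[, period_week := (lubridate::wday(date_time, week_start = 1) - 1) * 96 +
     lubridate::hour(date_time) * 4 + lubridate::minute(date_time) / 15 + 1]
dt_week <- dt[, .(mean_value = mean(value), sd_value = sd(value)),
              by = .(month, period_week)]

ggplot(dt_week, aes(period_week, mean_value)) +
  geom_ribbon(aes(ymin = mean_value - sd_value, ymax = mean_value + sd_value),
              alpha = 0.3) +
  geom_line() +
  facet_wrap(~month) +
  theme_bw()
```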

We can see that consumption patterns change over months…

I will also show you the average daily pattern by month.

It's obvious that electricity consumption behaves differently during the winter months, with much more electricity consumed because of heating needs.

### smooth and ADAM

The main goal of this blog post is to experiment with multistep losses in the ETS model and compare them against classical one-step losses (MSE and MAE).

In the `adam` function we have several possibilities for multistep losses; check them here: https://openforecast.org/adam/multistepLosses.html.

I will use these in experiments:

- MSEh, MAEh
- TMSE – Trace MSE
- GTMSE – Geometric Trace MSE
- MSCE – Mean Squared Cumulative Error
- GPL – General Predictive Likelihood

Let's try `adam` on a 4-week sample of the data.

Firstly, I will try only the additive ETS model with the classic MSE loss.
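A minimal call could look like this (a sketch; `dt_train` is a hypothetical 4-week training table). For 15-minute data the daily lag is 96 and the weekly lag is 672, and `model = "XXX"` restricts selection to additive components only:

```r
# Additive-only ETS selection with a classic one-step MSE loss,
# forecasting one day (96 periods) ahead
mod_mse <- adam(dt_train$value, model = "XXX",
                lags = c(96, 672), loss = "MSE", h = 96)
summary(mod_mse)
```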

We can see that the **ANA** model is selected: additive level and season components, with no trend component.

Let's try a multistep loss: GPL.
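Switching to a multistep loss only requires changing the `loss` argument (same hypothetical setup as above):

```r
# Identical setup, only the loss changes to the multistep GPL
mod_gpl <- adam(dt_train$value, model = "XXX",
                lags = c(96, 672), loss = "GPL", h = 96)
summary(mod_gpl)
```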

Again, ANA was selected. The multistep loss gives better AICc and MAE, but RMSE is worse than in the MSE case. Let's see how it behaves on the whole dataset.

### Prediction simulations

I will roll predictions day by day with a whole 1-day prediction horizon and a training set of 28 days (`win_days`) on our household consumption data -> so 214 predictions will be made.
I will use all the multistep losses mentioned above, plus MAE, MSE, and the STL+ARIMA and STL+ETS methods from the `forecast` package as benchmarks (as used also in my previous blog post about bootstrapping).
You can check the R script showing how to simulate and compute this in parallel in my GitHub repository.
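The serial core of such a simulation might look like this (a sketch under the assumptions above; the real script runs the origins in parallel, and `fit`/`fc_list` are illustrative names):

```r
# Day-by-day rolling simulation: 28-day training window,
# one-day (96-period) forecast horizon on 15-minute data
win_days <- 28
per_day  <- 96
n_fcasts <- nrow(dt) %/% per_day - win_days  # 214 origins in our data

fc_list <- lapply(seq_len(n_fcasts), function(i) {
  train <- dt$value[((i - 1) * per_day + 1):((i - 1 + win_days) * per_day)]
  fit <- adam(train, model = "XXX", lags = c(96, 672),
              loss = "GPL", h = per_day)
  as.numeric(forecast(fit, h = per_day)$mean)
})
```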

Now, let’s see generated predictions.

I will compute forecasting evaluation metrics: RMSE, MAE, and MAAPE.
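These metrics are straightforward to define in base R; MAAPE (mean arctangent absolute percentage error) stays bounded even when real values approach zero, which matters for low-load periods:

```r
# Point-forecast accuracy metrics over vectors of real and predicted values
rmse  <- function(real, pred) sqrt(mean((real - pred)^2))
mae   <- function(real, pred) mean(abs(real - pred))
maape <- function(real, pred) mean(atan(abs((real - pred) / real)))
```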

We can see that all multistep losses beat the benchmark methods! Those are very nice results!

For further analysis, we will check only the two best multistep losses and two benchmarks.

Let’s see all generated prediction graphs with selected models:

The variance of predictions is much lower with multistep-loss ETS than with the benchmarks, which is interesting, but given the nature of these losses also logical -> again, check https://openforecast.org/adam/multistepLosses.html#multistepLosses.

### Models diagnostics

Next, I will demonstrate how to compare results from multiple models and analyze them with model diagnostic methods, which are also described in the ADAM forecasting book.

Let’s create residuals and abs(residuals) columns for investigation purposes.

Let’s see the boxplot of absolute errors:

We see a narrower IQR with multistep losses, which is a good sign.

Predictions vs real values scatter plot:

In general, that is actually not very nice behavior :) there are many below-zero predictions (though fewer with multistep losses!) and no high-load predictions with multistep losses.

Residuals vs real values scatter plot:

Again, it is not a very nice plot 🙂 the conclusions are very similar to the previous one.

Heteroscedasticity check - so real values vs. absolute errors.

Our models are not homoscedastic at all: absolute errors are not distributed evenly across the range of real values.

Let’s see assumptions of normally distributed errors - Q-Q plot check:
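A Q-Q check per model can be sketched with `ggplot2` like this (assuming a hypothetical long table `dt_pred` with `residuals` and `model` columns):

```r
# Q-Q plot of residuals against the normal distribution, one facet per model
ggplot(dt_pred, aes(sample = residuals)) +
  stat_qq(alpha = 0.3) +
  stat_qq_line(color = "red") +
  facet_wrap(~model) +
  theme_bw()
```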

We can see a nicer Q-Q plot with multistep losses - the values lie much closer to the optimal line than with the benchmarks.

The last check concerns the assumption that multistep forecasting errors have zero mean (I will check it hourly, not by 15-minute periods):

All models stay quite close to zero, so that's good.

### Summary

In this blog post, I showed that multistep losses in the ETS model have a great impact on forecast behavior:

- lower variance,
- and more accurate forecasts,

in our case of 1 day ahead household electricity consumption forecasting.

In general, household electricity consumption forecasting with ETS is not satisfactory, as seen with the model diagnostic methods shown above -> higher consumption loads are not well covered, etc.

In the future, I would like to show the possibility of improving forecasting accuracy with some ensemble methods, even with an ensemble of “weak” models such as ETS.
