# Introducing Modeltime: Tidy Time Series Forecasting using Tidymodels

[This article was first published on **business-science.io**, and kindly contributed to R-bloggers.]


I’m beyond excited to introduce `modeltime`, a new time series forecasting package designed to speed up model evaluation, selection, and forecasting. `modeltime` does this by integrating the `tidymodels` machine learning ecosystem of packages into a *streamlined workflow* for `tidyverse` forecasting. Follow this article to get started with `modeltime`. If you like what you see, I have an Advanced Time Series Course coming soon (join the waitlist) where you will become a time-series expert for your organization by learning `modeltime` and `timetk`.

# Modeltime

The forecasting framework for the tidymodels ecosystem

`modeltime` is a new package designed for rapidly developing and testing time series models using machine learning models, classical models, and automated models. There are three key benefits:

- **Systematic Workflow for Forecasting.** Learn a few key functions like `modeltime_table()`, `modeltime_calibrate()`, and `modeltime_refit()` to develop and train time series models.
- **Unlocks Tidymodels for Forecasting.** Gain the benefit of all of the `parsnip` models including `boost_tree()` (XGBoost, C5.0), `linear_reg()` (GLMnet, Stan, Linear Regression), `rand_forest()` (Random Forest), and more.
- **New Time Series Boosted Models** including Boosted ARIMA (`arima_boost()`) and Boosted Prophet (`prophet_boost()`) that can improve accuracy by applying an XGBoost model to the errors.

# Getting Started

Let’s kick the tires on modeltime

Install `modeltime`.

Load the following libraries.
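A minimal setup might look like the following. It assumes installation from CRAN and loads only three packages (`tidymodels`, `modeltime`, `timetk`); the post's full library list isn't reproduced here. The `glmnet` and `randomForest` engines used for the machine learning models also need their own packages installed.

```r
# Install modeltime from CRAN only if it isn't already available
if (!requireNamespace("modeltime", quietly = TRUE)) {
  install.packages("modeltime")
}

library(tidymodels)  # parsnip, recipes, workflows, rsample, dplyr, ...
library(modeltime)   # time series model specs and the modeltime workflow
library(timetk)      # time series wrangling, plotting, and recipe steps
```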

# Get Your Data

Forecasting daily bike transactions

We’ll start with the `bike_sharing_daily` time series data set, which includes daily bike transactions. We’ll simplify the data set to a univariate time series with two columns: “date” and “value”.

Next, visualize the data set with the `plot_time_series()` function. Set `.interactive = TRUE` to get a plotly interactive plot; `FALSE` returns a static ggplot2 plot.
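A sketch of the data preparation and plot. It assumes `bike_sharing_daily` is the daily bike-sharing data set that ships with `timetk`, with dates in `dteday` and daily transaction counts in `cnt`.

```r
library(dplyr)
library(timetk)

# Keep the date and the daily transaction count, renamed to "date" and "value"
bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

# .interactive = FALSE returns a static ggplot2 plot; TRUE returns plotly
bike_transactions_tbl %>%
  plot_time_series(date, value, .interactive = FALSE)
```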

# Train / Test

Split your time series into training and testing sets

Next, use `time_series_split()` to make a train/test set.

- Setting `assess = "3 months"` tells the function to use the last 3 months of data as the testing set.
- Setting `cumulative = TRUE` tells the sampling to use all of the prior data as the training set.

Next, visualize the train/test split.

- `tk_time_series_cv_plan()`: Converts the splits object to a data frame.
- `plot_time_series_cv_plan()`: Plots the time series sampling data using the “date” and “value” columns.
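The split and its visualization can be sketched as follows, under the same data assumptions (the `bike_sharing_daily` data set from `timetk`). Passing `date_var = date` explicitly is a choice made here rather than leaving the date column to be inferred.

```r
library(dplyr)
library(rsample)  # training() and testing() accessors
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

# Last 3 months become the testing set; all earlier data is the training set
splits <- bike_transactions_tbl %>%
  time_series_split(date_var = date, assess = "3 months", cumulative = TRUE)

# Convert the split to a data frame, then plot the train/test regions
splits %>%
  tk_time_series_cv_plan() %>%
  plot_time_series_cv_plan(date, value, .interactive = FALSE)
```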

# Modeling

This is **exciting.**

Now for the fun part! Let’s make some models using functions from `modeltime` and `parsnip`.

## Automatic Models

Automatic models are generally modeling approaches that have been automated. This includes the “Auto ARIMA” and “Auto ETS” functions from `forecast` and the “Prophet” algorithm from `prophet`. These algorithms have been integrated into `modeltime`. The process is simple to set up:

- **Model Spec:** Use a specification function (e.g. `arima_reg()`, `prophet_reg()`) to initialize the algorithm and key parameters.
- **Engine:** Set an engine using one of the engines available for the Model Spec.
- **Fit Model:** Fit the model to the training data.

Let’s make several models to see this process in action.

### Auto ARIMA

Here’s the basic Auto ARIMA model fitting process.

- **Model Spec:** `arima_reg()` sets up your general model algorithm and key parameters.
- **Set Engine:** `set_engine("auto_arima")` selects the specific package-function to use; you can add any function-level arguments here.
- **Fit Model:** `fit(value ~ date, training(splits))`. All modeltime models require a date column to be a regressor.
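The three steps chain together with the pipe. To keep the sketch self-contained, it rebuilds the data and split first; the `auto_arima` engine assumes the `forecast` package is available.

```r
library(tidymodels)
library(modeltime)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

splits <- time_series_split(bike_transactions_tbl, date_var = date,
                            assess = "3 months", cumulative = TRUE)

# Model spec -> engine -> fit; the date column is passed as the regressor
model_fit_arima <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(value ~ date, data = training(splits))

model_fit_arima
```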

### Prophet

Prophet is specified just like Auto ARIMA. Note that I’ve changed to `prophet_reg()`, and I’m passing an engine-specific parameter `yearly.seasonality = TRUE` using `set_engine()`.
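A sketch of the Prophet fit, again rebuilding the data and split so it runs on its own. The only changes from Auto ARIMA are the spec function and the engine-level `yearly.seasonality` argument, which is forwarded to the `prophet` engine.

```r
library(tidymodels)
library(modeltime)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

splits <- time_series_split(bike_transactions_tbl, date_var = date,
                            assess = "3 months", cumulative = TRUE)

# Engine-specific arguments go inside set_engine()
model_fit_prophet <- prophet_reg() %>%
  set_engine("prophet", yearly.seasonality = TRUE) %>%
  fit(value ~ date, data = training(splits))

model_fit_prophet
```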

## Machine Learning Models

Machine learning models are more complex than the automated models. This complexity typically requires a **workflow** (sometimes called a *pipeline* in other languages). The general process goes like this:

- **Create Preprocessing Recipe**
- **Create Model Specifications**
- **Use Workflow to combine Model Spec and Preprocessing, and Fit Model**

### Preprocessing Recipe

First, I’ll create a preprocessing recipe using `recipe()` and add time series steps. The process uses the “date” column to create 45 new features that I’d like to model. These include time-series signature features and Fourier series.
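One recipe matching this description is sketched below. The specific steps and settings (dropping intra-day and xts-style signature columns, yearly Fourier terms with `period = 365` and `K = 5`, dummy encoding of the label factors) are assumptions chosen to illustrate the idea, and the exact number of generated columns may vary across `timetk` versions.

```r
library(tidymodels)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

splits <- time_series_split(bike_transactions_tbl, date_var = date,
                            assess = "3 months", cumulative = TRUE)

recipe_spec <- recipe(value ~ date, data = training(splits)) %>%
  # Calendar features derived from "date" (year, month, day of week, ...)
  step_timeseries_signature(date) %>%
  # Drop intra-day and xts-style columns that add nothing for daily data
  step_rm(contains("am.pm"), contains("hour"), contains("minute"),
          contains("second"), contains("xts")) %>%
  # Yearly Fourier terms (sine/cosine pairs up to order K = 5)
  step_fourier(date, period = 365, K = 5) %>%
  # Convert the month and weekday label factors to numeric dummy columns
  step_dummy(all_nominal())

# Inspect the engineered features on the training data
features_tbl <- recipe_spec %>% prep() %>% bake(new_data = NULL)
features_tbl
```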

With a recipe in hand, we can set up our machine learning pipelines.

### Elastic Net

Making an Elastic Net model is easy to do. Just set up your model spec using `linear_reg()` and `set_engine("glmnet")`. Note that, unlike in the previous steps, we have not fitted the model yet.

Next, make a fitted workflow:

- **Start** with a `workflow()`.
- **Add a Model Spec:** `add_model(model_spec_glmnet)`.
- **Add Preprocessing:** `add_recipe(recipe_spec %>% step_rm(date))`. Note that I’m removing the “date” column since machine learning algorithms don’t typically know how to deal with date or date-time features.
- **Fit the Workflow:** `fit(training(splits))`.
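Putting the spec and the workflow steps together might look like this. The `penalty = 0.01` and `mixture = 0.5` values are illustrative, not tuned (the `glmnet` engine needs a single `penalty` value to predict), and the recipe repeats assumed settings so the sketch runs on its own.

```r
library(tidymodels)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

splits <- time_series_split(bike_transactions_tbl, date_var = date,
                            assess = "3 months", cumulative = TRUE)

# Calendar and Fourier feature recipe (assumed settings)
recipe_spec <- recipe(value ~ date, data = training(splits)) %>%
  step_timeseries_signature(date) %>%
  step_rm(contains("am.pm"), contains("hour"), contains("minute"),
          contains("second"), contains("xts")) %>%
  step_fourier(date, period = 365, K = 5) %>%
  step_dummy(all_nominal())

# Model spec only: nothing is fitted at this point
model_spec_glmnet <- linear_reg(penalty = 0.01, mixture = 0.5) %>%
  set_engine("glmnet")

workflow_fit_glmnet <- workflow() %>%
  add_model(model_spec_glmnet) %>%
  # Remove the raw date column; the engineered features remain
  add_recipe(recipe_spec %>% step_rm(date)) %>%
  fit(training(splits))
```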

### Random Forest

We can fit a Random Forest using a similar process as the Elastic Net.
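A Random Forest sketch follows the same pattern. `trees = 500` and `min_n = 50` are illustrative values, `set_mode("regression")` is added because `rand_forest()` supports both regression and classification, and the `randomForest` package is assumed to be installed.

```r
library(tidymodels)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

splits <- time_series_split(bike_transactions_tbl, date_var = date,
                            assess = "3 months", cumulative = TRUE)

# Calendar and Fourier feature recipe (assumed settings)
recipe_spec <- recipe(value ~ date, data = training(splits)) %>%
  step_timeseries_signature(date) %>%
  step_rm(contains("am.pm"), contains("hour"), contains("minute"),
          contains("second"), contains("xts")) %>%
  step_fourier(date, period = 365, K = 5) %>%
  step_dummy(all_nominal())

# Illustrative hyperparameters; regression mode set explicitly
model_spec_rf <- rand_forest(trees = 500, min_n = 50) %>%
  set_engine("randomForest") %>%
  set_mode("regression")

workflow_fit_rf <- workflow() %>%
  add_model(model_spec_rf) %>%
  add_recipe(recipe_spec %>% step_rm(date)) %>%
  fit(training(splits))
```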

## New Hybrid Models

I’ve included several hybrid models (e.g. `arima_boost()` and `prophet_boost()`) that combine automated algorithms with machine learning. I’ll showcase `prophet_boost()` next!

### Prophet Boost

The **Prophet Boost algorithm** combines Prophet with XGBoost to get the best of both worlds (i.e. Prophet automation + machine learning). The algorithm works by:

- First modeling the univariate series using Prophet
- Then regressing the Prophet residuals on the regressors supplied via the preprocessing recipe (remember, our recipe generated 45 new features) using the XGBoost model

We can set the model up using a workflow just like with the machine learning algorithms.
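A sketch of the Prophet Boost workflow, assuming the same recipe settings as the other machine learning sketches. Unlike the Elastic Net and Random Forest workflows, the recipe keeps the “date” column here, because the Prophet component needs it.

```r
library(tidymodels)
library(modeltime)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

splits <- time_series_split(bike_transactions_tbl, date_var = date,
                            assess = "3 months", cumulative = TRUE)

# Calendar and Fourier feature recipe (assumed settings)
recipe_spec <- recipe(value ~ date, data = training(splits)) %>%
  step_timeseries_signature(date) %>%
  step_rm(contains("am.pm"), contains("hour"), contains("minute"),
          contains("second"), contains("xts")) %>%
  step_fourier(date, period = 365, K = 5) %>%
  step_dummy(all_nominal())

# Prophet models the series; XGBoost models the Prophet residuals
model_spec_prophet_boost <- prophet_boost() %>%
  set_engine("prophet_xgboost", yearly.seasonality = TRUE)

workflow_fit_prophet_boost <- workflow() %>%
  add_model(model_spec_prophet_boost) %>%
  add_recipe(recipe_spec) %>%  # "date" is kept for the Prophet component
  fit(training(splits))
```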

# The Modeltime Workflow

Speed up model evaluation and selection with modeltime

**The modeltime workflow** is designed to speed up model evaluation and selection. Now that we have several time series models, let’s analyze them and forecast the future with the `modeltime` workflow.

## Modeltime Table

**The Modeltime Table** organizes the models with IDs and creates generic descriptions to help us keep track of our models. Let’s add the models to a `modeltime_table()`.
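A minimal sketch of building the table. To stay self-contained it fits only the Auto ARIMA and Prophet models rather than all five; the fitted workflows can be passed to `modeltime_table()` in exactly the same way.

```r
library(tidymodels)
library(modeltime)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

splits <- time_series_split(bike_transactions_tbl, date_var = date,
                            assess = "3 months", cumulative = TRUE)

model_fit_arima <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(value ~ date, data = training(splits))

model_fit_prophet <- prophet_reg() %>%
  set_engine("prophet", yearly.seasonality = TRUE) %>%
  fit(value ~ date, data = training(splits))

# Each model gets a .model_id and an auto-generated .model_desc
model_table <- modeltime_table(
  model_fit_arima,
  model_fit_prophet
)

model_table
```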

## Calibration

**Model Calibration** is used to quantify error and estimate confidence intervals. We’ll perform model calibration on the out-of-sample data (a.k.a. the testing set) with the `modeltime_calibrate()` function. Two new columns are generated (“.type” and “.calibration_data”), the more important of which is “.calibration_data”. It includes the actual values, fitted values, and residuals for the testing set.
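Calibration is a single call on the table. This sketch again fits only Auto ARIMA and Prophet so it runs on its own, then prints the calibration data for the first model.

```r
library(tidymodels)
library(modeltime)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

splits <- time_series_split(bike_transactions_tbl, date_var = date,
                            assess = "3 months", cumulative = TRUE)

model_fit_arima <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(value ~ date, data = training(splits))

model_fit_prophet <- prophet_reg() %>%
  set_engine("prophet", yearly.seasonality = TRUE) %>%
  fit(value ~ date, data = training(splits))

# Calibrate on the out-of-sample (testing) data
calibration_table <- modeltime_table(model_fit_arima, model_fit_prophet) %>%
  modeltime_calibrate(new_data = testing(splits))

# Actuals, predictions, and residuals on the testing set for model 1
calibration_table$.calibration_data[[1]]
```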

### Forecast (Testing Set)

With calibrated data, we can visualize the testing predictions (forecast).

- Use `modeltime_forecast()` to generate the forecast data for the testing set as a tibble.
- Use `plot_modeltime_forecast()` to visualize the results in interactive and static plot formats.
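A sketch of the testing-set forecast with two models (Auto ARIMA and Prophet) for brevity. Passing `actual_data` adds the full history to the plot alongside the predictions.

```r
library(tidymodels)
library(modeltime)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

splits <- time_series_split(bike_transactions_tbl, date_var = date,
                            assess = "3 months", cumulative = TRUE)

model_fit_arima <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(value ~ date, data = training(splits))

model_fit_prophet <- prophet_reg() %>%
  set_engine("prophet", yearly.seasonality = TRUE) %>%
  fit(value ~ date, data = training(splits))

# Forecast the testing period and keep the actual values for comparison
forecast_tbl <- modeltime_table(model_fit_arima, model_fit_prophet) %>%
  modeltime_calibrate(new_data = testing(splits)) %>%
  modeltime_forecast(new_data = testing(splits),
                     actual_data = bike_transactions_tbl)

plot_modeltime_forecast(forecast_tbl, .interactive = FALSE)
```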

### Accuracy (Testing Set)

Next, calculate the testing accuracy to compare the models.

- Use `modeltime_accuracy()` to generate the out-of-sample accuracy metrics as a tibble.
- Use `table_modeltime_accuracy()` to generate interactive and static accuracy tables.

**Accuracy Table**

| .model_id | .model_desc | .type | mae | mape | mase | smape | rmse | rsq |
|---|---|---|---|---|---|---|---|---|
| 1 | ARIMA(0,1,3) WITH DRIFT | Test | 2540.11 | 474.89 | 2.74 | 46.00 | 3188.09 | 0.39 |
| 2 | PROPHET | Test | 1221.18 | 365.13 | 1.32 | 28.68 | 1764.93 | 0.44 |
| 3 | GLMNET | Test | 1197.06 | 340.57 | 1.29 | 28.44 | 1650.87 | 0.49 |
| 4 | RANDOMFOREST | Test | 1338.15 | 335.52 | 1.45 | 30.63 | 1855.21 | 0.46 |
| 5 | PROPHET W/ XGBOOST ERRORS | Test | 1189.28 | 332.44 | 1.28 | 28.48 | 1644.25 | 0.55 |
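The accuracy step can be sketched like this. It fits only Auto ARIMA and Prophet to stay self-contained, so its numbers will not match the five-model table above.

```r
library(tidymodels)
library(modeltime)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

splits <- time_series_split(bike_transactions_tbl, date_var = date,
                            assess = "3 months", cumulative = TRUE)

model_fit_arima <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(value ~ date, data = training(splits))

model_fit_prophet <- prophet_reg() %>%
  set_engine("prophet", yearly.seasonality = TRUE) %>%
  fit(value ~ date, data = training(splits))

# Out-of-sample metrics (mae, mape, mase, smape, rmse, rsq) per model
accuracy_tbl <- modeltime_table(model_fit_arima, model_fit_prophet) %>%
  modeltime_calibrate(new_data = testing(splits)) %>%
  modeltime_accuracy()

# Static table; set .interactive = TRUE for an interactive one
table_modeltime_accuracy(accuracy_tbl, .interactive = FALSE)
```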

### Analyze Results

From the accuracy measures and forecast results, we see that:

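- The Auto ARIMA model is not a good fit for this data: it has the highest error on every metric and the lowest R-squared.
- The best model is Prophet + XGBoost, which has the lowest MAE, MAPE, MASE, and RMSE and the highest R-squared on the testing set.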

Let’s exclude the Auto ARIMA from our final model, then make future forecasts with the remaining models.

## Refit and Forecast Forward

**Refitting** is a best practice before forecasting the future.

- `modeltime_refit()`: We retrain on the full data (`bike_transactions_tbl`).
- `modeltime_forecast()`: For models that only depend on the “date” feature, we can use `h` (horizon) to forecast forward. Setting `h = "12 months"` forecasts the next 12 months of data.
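A sketch of the refit-and-forecast step. To stay self-contained it fits only Auto ARIMA and Prophet, then drops the Auto ARIMA model by its `.model_id` (1 here, because it is added to the table first), so only Prophet is refit and forecast.

```r
library(tidymodels)
library(modeltime)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

splits <- time_series_split(bike_transactions_tbl, date_var = date,
                            assess = "3 months", cumulative = TRUE)

model_fit_arima <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(value ~ date, data = training(splits))

model_fit_prophet <- prophet_reg() %>%
  set_engine("prophet", yearly.seasonality = TRUE) %>%
  fit(value ~ date, data = training(splits))

refit_tbl <- modeltime_table(model_fit_arima, model_fit_prophet) %>%
  modeltime_calibrate(new_data = testing(splits)) %>%
  filter(.model_id != 1) %>%                        # exclude Auto ARIMA
  modeltime_refit(data = bike_transactions_tbl)     # retrain on all data

# Date-only models can forecast forward from a horizon alone
forecast_future_tbl <- refit_tbl %>%
  modeltime_forecast(h = "12 months", actual_data = bike_transactions_tbl)

plot_modeltime_forecast(forecast_future_tbl, .interactive = FALSE)
```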

# It gets better

You’ve just scratched the surface. Here’s what’s coming…

The `modeltime` package functionality is much more feature-rich than what we’ve covered here (I couldn’t possibly cover everything in this post). 😀

Here’s what I didn’t cover:

- **Feature engineering:** The art of time series analysis is feature engineering. Modeltime works with cutting-edge time-series preprocessing tools including those in the `recipes` and `timetk` packages.
- **Hyperparameter tuning:** ARIMA models and machine learning models can be tuned. There’s a right and a wrong way (and it’s not the same for both types).
- **Scalability:** Training multiple time series groups and automation is a major need in organizations. You need to know how to scale your analyses to thousands of time series.
- **Strengths and weaknesses:** Did you know certain machine learning models are better for trend or seasonality, but not both? Why is ARIMA way better for certain datasets? When will Random Forest and XGBoost fail?
- **Advanced machine learning and deep learning:** Recurrent Neural Networks (RNNs) have been crushing time series competitions. Will they work for business data? How can you implement them?

I teach each of these techniques and strategies so you **become the time series expert for your organization.** Here’s how. 👇

## Advanced Time Series Course

Become the time series domain expert in your organization.

Make sure you’re notified when my new **Advanced Time Series Forecasting in R course** comes out. You’ll learn `timetk` and `modeltime` plus the most powerful time series forecasting techniques available.

👉 **Get notified here: Advanced Time Series Course.**

You will learn:

- Time Series Preprocessing, Noise Reduction, & Anomaly Detection
- Feature engineering using lagged variables & external regressors
- Hyperparameter tuning
- Time series cross-validation
- Ensembling Multiple Machine Learning & Univariate Modeling Techniques (Competition Winner)
- NEW – Deep Learning with RNNs (Competition Winner)
- and more.

Sign up for the Time Series Course waitlist

# Future Work

I’m just getting started with `modeltime`. The main functionality should not change, so you can begin using it. Let me know of any issues via GitHub. Regarding future work, here’s a short list of what’s coming over the next few months.

### Ensembles and Model Stacking

**A top priority on the software roadmap is to include model ensembling**, various techniques for combining models to improve forecast results. The plan is to collaborate with the `tidymodels` team to develop ensembling tools.

### More Time Series Algorithms

It’s critical to have a diverse set of algorithms included in `modeltime` or as extensions to `modeltime` because this improves the speed of experimentation, model selection, and moving into production. To support extensibility:

- I have a Model Roadmap here for additional models.
- I also have a vignette with instructions to help developers extend `modeltime`, creating R packages that leverage the forecasting workflow.

Comment on GitHub Issue #5 to let me know what you would like to see or if you have plans to extend `modeltime`.

### Improvements

**I have several improvements forthcoming.** Probably the most important is the confidence interval calculation. I plan to use the method used by `earth::earth()`, which calculates prediction intervals by regressing the absolute errors against the predictions. This should provide a better approximation of forecast confidence.
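To illustrate the idea, here is a simplified base R sketch on simulated data. It is not `earth`'s implementation and not how `modeltime` will do it: it fits an ordinary linear model, regresses the absolute residuals on the fitted values with `lm()`, and converts the predicted mean absolute error into a standard deviation under the assumption of normally distributed errors (for normal errors, E|e| = sigma * sqrt(2 / pi)).

```r
set.seed(123)

# Simulated data whose noise grows with the level of the series (assumption)
n <- 200
x <- seq(1, 100, length.out = n)
y <- 10 + 2 * x + rnorm(n, sd = 0.2 * x)

# 1. Fit the point-prediction model
fit <- lm(y ~ x)
pred <- fitted(fit)

# 2. Regress the absolute errors on the predictions
abs_err <- abs(residuals(fit))
var_model <- lm(abs_err ~ pred)

# 3. Convert predicted mean absolute error to a standard deviation,
#    assuming normal errors; clamp at zero to avoid negative widths
sigma_hat <- pmax(fitted(var_model), 0) * sqrt(pi / 2)

# 4. 95% prediction intervals whose width changes with the prediction
z <- qnorm(0.975)
intervals <- data.frame(
  pred  = pred,
  lower = pred - z * sigma_hat,
  upper = pred + z * sigma_hat
)

head(intervals)
```

The payoff over a constant-width interval is that the band widens where the model's errors are larger, which is common in business series whose variability grows with their level.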

# Modeltime Resources

- Modeltime Documentation – Learn about the `modeltime` workflow and which models have been included.
- Modeltime GitHub Page – Give it a Star if you like it!
- Timetk Documentation – Data wrangling, visualization, and preprocessing for time series.
- Tidymodels.org – The `tidymodels` framework is a collection of packages for modeling and machine learning using `tidyverse` principles.

# Have questions about modeltime?

Make a comment in the chat below. 👇

And, if you plan on using `modeltime` for your business, it’s a no-brainer – Join my Time Series Course Waitlist (It’s coming, it’s really insane).
