Introducing Modeltime: Tidy Time Series Forecasting using Tidymodels
I’m beyond excited to introduce modeltime, a new time series forecasting package designed to speed up model evaluation, selection, and forecasting. modeltime does this by integrating the tidymodels machine learning ecosystem of packages into a streamlined workflow for tidyverse forecasting. Follow this article to get started with modeltime. If you like what you see, I have an Advanced Time Series Course coming soon (join the waitlist) where you will become a time-series expert for your organization.
The forecasting framework for the tidymodels ecosystem
modeltime is a new package designed for rapidly developing and testing time series models using machine learning models, classical models, and automated models. There are three key benefits:
- Systematic Workflow for Forecasting. Learn a few key functions like modeltime_refit() to develop and train time series models.
- Unlocks Tidymodels for Forecasting. Gain the benefit of the tidymodels model functions, including linear_reg() (GLMnet, Stan, Linear Regression), rand_forest() (Random Forest), and more.
- New Time Series Boosted Models including Boosted ARIMA (arima_boost()) and Boosted Prophet (prophet_boost()) that can improve accuracy by applying an XGBoost model to the errors.
Let’s kick the tires on modeltime
Load the following libraries.
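A minimal sketch of the setup, assuming these four packages cover every step in this walkthrough (the exact list is an assumption, and each must already be installed):

```r
# Packages assumed for the steps that follow
library(tidyverse)   # data wrangling and ggplot2
library(tidymodels)  # parsnip, recipes, workflows, rsample
library(modeltime)   # time series model specs and the modeltime workflow
library(timetk)      # time series data, plotting, and splitting helpers
```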
Get Your Data
Forecasting daily bike transactions
We’ll start with a bike_sharing_daily time series data set that includes bike transactions. We’ll simplify the data set to a univariate time series with two columns, “date” and “value”.
Next, visualize the dataset with the plot_time_series() function. Toggle .interactive = TRUE to get a plotly interactive plot. Setting it to FALSE returns a ggplot2 static plot.
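The two steps above might look like the following sketch. It assumes the dteday and cnt columns of bike_sharing_daily (shipped with timetk) hold the date and the daily transaction count; the name bike_transactions_tbl is chosen here for illustration.

```r
library(tidyverse)
library(timetk)

# Keep only the date and the daily count, renamed to "date" and "value"
bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

# Static ggplot2 plot; set .interactive = TRUE for a plotly version
series_plot <- bike_transactions_tbl %>%
  plot_time_series(date, value, .interactive = FALSE)

series_plot
```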
Train / Test
Split your time series into training and testing sets
Use time_series_split() to make a train/test set.
- assess = "3 months" tells the function to use the last 3 months of data as the testing set.
- cumulative = TRUE tells the sampling to use all of the prior data as the training set.
Next, visualize the train/test split.
- tk_time_series_cv_plan(): Converts the splits object to a data frame
- plot_time_series_cv_plan(): Plots the time series sampling data using the “date” and “value” columns.
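As a sketch, the split and its plot could be written like this. The data preparation is repeated so the block runs on its own, and the object names (splits, cv_plot) are illustrative.

```r
library(tidyverse)
library(tidymodels)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

# Last 3 months become the testing set; everything before is training
splits <- bike_transactions_tbl %>%
  time_series_split(date_var = date, assess = "3 months", cumulative = TRUE)

# Convert the split to a data frame and plot train vs. test
cv_plot <- splits %>%
  tk_time_series_cv_plan() %>%
  plot_time_series_cv_plan(date, value, .interactive = FALSE)

cv_plot
```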
This is exciting.
Now for the fun part! Let’s make some models.
Automatic models are generally modeling approaches that have been automated. This includes “Auto ARIMA” and “Auto ETS” functions from forecast and the “Prophet” algorithm from prophet. These algorithms have been integrated into modeltime. The process is simple to set up:
- Model Spec: Use a specification function (e.g. prophet_reg()) to initialize the algorithm and key parameters
- Engine: Set an engine using one of the engines available for the Model Spec.
- Fit Model: Fit the model to the training data
Let’s make several models to see this process in action.
Here’s the basic Auto ARIMA model fitting process.
- Model Spec: arima_reg() <– This sets up your general model algorithm and key parameters
- Set Engine: set_engine("auto_arima") <– This selects the specific package-function to use, and you can add any function-level arguments here.
- Fit Model: fit(value ~ date, training(splits)) <– All modeltime models require a date column to be a regressor.
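Put together, the three steps could look like the sketch below. The data and split are rebuilt so the block is self-contained, and model_fit_arima is an illustrative name.

```r
library(tidyverse)
library(tidymodels)
library(modeltime)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

splits <- bike_transactions_tbl %>%
  time_series_split(date_var = date, assess = "3 months", cumulative = TRUE)

# Model spec -> engine -> fit on the training set
model_fit_arima <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(value ~ date, data = training(splits))

model_fit_arima
```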
Prophet is specified just like Auto ARIMA. Note that I’ve changed to prophet_reg(), and I’m passing an engine-specific parameter yearly.seasonality = TRUE using set_engine().
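A hedged sketch of the Prophet fit follows. One assumption to flag: more recent modeltime releases expose yearly seasonality as the seasonality_yearly argument of prophet_reg(), and this sketch uses that form rather than passing yearly.seasonality through the engine. If your installed version predates that argument, the engine-level form described in the post is the one to try.

```r
library(tidyverse)
library(tidymodels)
library(modeltime)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

splits <- bike_transactions_tbl %>%
  time_series_split(date_var = date, assess = "3 months", cumulative = TRUE)

# Same spec -> engine -> fit pattern as Auto ARIMA
# (seasonality_yearly is assumed to be available in your modeltime version)
model_fit_prophet <- prophet_reg(seasonality_yearly = TRUE) %>%
  set_engine("prophet") %>%
  fit(value ~ date, data = training(splits))

model_fit_prophet
```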
Machine Learning Models
Machine learning models are more complex than the automated models. This complexity typically requires a workflow (sometimes called a pipeline in other languages). The general process goes like this:
- Create Preprocessing Recipe
- Create Model Specifications
- Use Workflow to combine Model Spec and Preprocessing, and Fit Model
First, I’ll create a preprocessing recipe using recipe() and adding time series steps. The process uses the “date” column to create 45 new features that I’d like to model. These include time-series signature features and Fourier series.
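The recipe steps themselves are not listed here, so the following is one plausible combination rather than the exact recipe: timetk's step_timeseries_signature() for calendar features, step_fourier() for Fourier terms, and step_dummy() to encode the labelled factors. The period, K, and the columns removed are assumptions, and the resulting feature count may differ from 45 depending on package versions.

```r
library(tidyverse)
library(tidymodels)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

splits <- bike_transactions_tbl %>%
  time_series_split(date_var = date, assess = "3 months", cumulative = TRUE)

recipe_spec <- recipe(value ~ date, data = training(splits)) %>%
  # Calendar features derived from the date (year, month, weekday, ...)
  step_timeseries_signature(date) %>%
  # Drop sub-daily and duplicate features that carry no signal for daily data
  step_rm(contains("am.pm"), contains("hour"), contains("minute"),
          contains("second"), contains("xts")) %>%
  # Fourier terms for yearly seasonality (period and K are illustrative)
  step_fourier(date, period = 365, K = 5) %>%
  # One-hot style encoding of the month and weekday labels
  step_dummy(all_nominal())

# Inspect the engineered features on the training set
features_tbl <- recipe_spec %>% prep() %>% bake(new_data = NULL)
glimpse(features_tbl)
```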
With a recipe in-hand, we can set up our machine learning pipelines.
Making an Elastic Net model is easy to do. Just set up your model spec using linear_reg() and set_engine("glmnet"). Note that we have not fitted the model yet (as we did in previous steps).
Next, make a fitted workflow:
- Start with a workflow
- Add a Model Spec
- Add Preprocessing: add_recipe(recipe_spec %>% step_rm(date)) <– Note that I’m removing the “date” column since machine learning algorithms don’t typically know how to deal with date or date-time features
- Fit the Workflow
We can fit a Random Forest using a similar process as the Elastic Net.
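Both workflows share the same recipe, so they can be sketched together. The hyperparameter values (penalty, mixture, trees, min_n), the randomForest engine, and all object names are assumptions for illustration; the glmnet and randomForest packages must be installed. The recipe is the same illustrative one used earlier in this walkthrough.

```r
library(tidyverse)
library(tidymodels)
library(modeltime)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

splits <- bike_transactions_tbl %>%
  time_series_split(date_var = date, assess = "3 months", cumulative = TRUE)

# Illustrative recipe: calendar features, Fourier terms, dummy encoding
recipe_spec <- recipe(value ~ date, data = training(splits)) %>%
  step_timeseries_signature(date) %>%
  step_rm(contains("am.pm"), contains("hour"), contains("minute"),
          contains("second"), contains("xts")) %>%
  step_fourier(date, period = 365, K = 5) %>%
  step_dummy(all_nominal())

# Elastic Net: spec only, nothing is fitted yet
model_spec_glmnet <- linear_reg(penalty = 0.01, mixture = 0.5) %>%
  set_engine("glmnet")

# Workflow = model spec + preprocessing (date removed), then fit
workflow_fit_glmnet <- workflow() %>%
  add_model(model_spec_glmnet) %>%
  add_recipe(recipe_spec %>% step_rm(date)) %>%
  fit(training(splits))

# Random Forest follows the same pattern; the mode must be set explicitly
model_spec_rf <- rand_forest(mode = "regression", trees = 500, min_n = 50) %>%
  set_engine("randomForest")

workflow_fit_rf <- workflow() %>%
  add_model(model_spec_rf) %>%
  add_recipe(recipe_spec %>% step_rm(date)) %>%
  fit(training(splits))
```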
New Hybrid Models
I’ve included several hybrid models (e.g. prophet_boost()) that combine automated algorithms with machine learning. I’ll showcase Prophet Boost here.
The Prophet Boost algorithm combines Prophet with XGBoost to get the best of both worlds (i.e. Prophet Automation + Machine Learning). The algorithm works by:
- First modeling the univariate series using Prophet
- Then regressing the Prophet residuals on the regressors supplied via the preprocessing recipe (remember, our recipe generated 45 new features) using the XGBoost model
We can set the model up using a workflow just like with the machine learning algorithms.
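A sketch of that workflow is below. It assumes the prophet_xgboost engine and, as with the Prophet example, the seasonality_yearly argument found in more recent modeltime releases. Note that the recipe keeps the “date” column this time, since Prophet needs it. The recipe steps and object names are illustrative, and the xgboost package must be installed.

```r
library(tidyverse)
library(tidymodels)
library(modeltime)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

splits <- bike_transactions_tbl %>%
  time_series_split(date_var = date, assess = "3 months", cumulative = TRUE)

# Illustrative recipe: calendar features, Fourier terms, dummy encoding
recipe_spec <- recipe(value ~ date, data = training(splits)) %>%
  step_timeseries_signature(date) %>%
  step_rm(contains("am.pm"), contains("hour"), contains("minute"),
          contains("second"), contains("xts")) %>%
  step_fourier(date, period = 365, K = 5) %>%
  step_dummy(all_nominal())

# Prophet models the series; XGBoost models Prophet's residuals
model_spec_prophet_boost <- prophet_boost(seasonality_yearly = TRUE) %>%
  set_engine("prophet_xgboost")

# The date column stays in the recipe because Prophet requires it
workflow_fit_prophet_boost <- workflow() %>%
  add_model(model_spec_prophet_boost) %>%
  add_recipe(recipe_spec) %>%
  fit(training(splits))
```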
The Modeltime Workflow
Speed up model evaluation and selection with modeltime
The modeltime workflow is designed to speed up model evaluation and selection. Now that we have several time series models, let’s analyze them and forecast the future with the modeltime workflow.
The Modeltime Table organizes the models with IDs and creates generic descriptions to help us keep track of our models. Let’s add the models to a Modeltime Table.
Model Calibration is used to quantify error and estimate confidence intervals. We’ll perform model calibration on the out-of-sample data (a.k.a. the testing set) with the modeltime_calibrate() function. Two new columns are generated (“.type” and “.calibration_data”), the most important of which is “.calibration_data”. This includes the actual values, fitted values, and residuals for the testing set.
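To keep the example short and runnable on its own, this sketch builds a Modeltime Table from just two models (Auto ARIMA and Prophet) instead of all five, then calibrates it on the testing set. The object names, and the seasonality_yearly argument, are assumptions.

```r
library(tidyverse)
library(tidymodels)
library(modeltime)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

splits <- bike_transactions_tbl %>%
  time_series_split(date_var = date, assess = "3 months", cumulative = TRUE)

model_fit_arima <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(value ~ date, data = training(splits))

model_fit_prophet <- prophet_reg(seasonality_yearly = TRUE) %>%
  set_engine("prophet") %>%
  fit(value ~ date, data = training(splits))

# Modeltime Table: assigns model IDs and generic descriptions
models_tbl <- modeltime_table(
  model_fit_arima,
  model_fit_prophet
)

# Calibration: predictions and residuals on the out-of-sample testing set
calibration_tbl <- models_tbl %>%
  modeltime_calibrate(new_data = testing(splits))

calibration_tbl
```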
Forecast (Testing Set)
With calibrated data, we can visualize the testing predictions (forecast).
- modeltime_forecast(): Generates the forecast data for the testing set as a tibble.
- plot_modeltime_forecast(): Visualizes the results in interactive and static plot formats.
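A sketch of the testing-set forecast, again using a reduced two-model table so the block stands alone (object names and the seasonality_yearly argument are assumptions):

```r
library(tidyverse)
library(tidymodels)
library(modeltime)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

splits <- bike_transactions_tbl %>%
  time_series_split(date_var = date, assess = "3 months", cumulative = TRUE)

calibration_tbl <- modeltime_table(
  arima_reg() %>%
    set_engine("auto_arima") %>%
    fit(value ~ date, data = training(splits)),
  prophet_reg(seasonality_yearly = TRUE) %>%
    set_engine("prophet") %>%
    fit(value ~ date, data = training(splits))
) %>%
  modeltime_calibrate(new_data = testing(splits))

# Forecast over the testing window, keeping the actuals for comparison
forecast_tbl <- calibration_tbl %>%
  modeltime_forecast(
    new_data    = testing(splits),
    actual_data = bike_transactions_tbl
  )

# Static plot; set .interactive = TRUE for plotly
forecast_plot <- forecast_tbl %>%
  plot_modeltime_forecast(.interactive = FALSE)

forecast_plot
```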
Accuracy (Testing Set)
Next, calculate the testing accuracy to compare the models.
- modeltime_accuracy(): Generates the out-of-sample accuracy metrics as a tibble.
- table_modeltime_accuracy(): Generates interactive and static tables.
| .model_id | .model_desc | .type | mae | mape | mase | smape | rmse | rsq |
| 1 | ARIMA(0,1,3) WITH DRIFT | Test | 2540.11 | 474.89 | 2.74 | 46.00 | 3188.09 | 0.39 |
| 5 | PROPHET W/ XGBOOST ERRORS | Test | 1189.28 | 332.44 | 1.28 | 28.48 | 1644.25 | 0.55 |
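The accuracy step can be sketched as follows, using the same reduced two-model setup as an assumption. The numbers it produces will not match the table above, which came from the full five-model run.

```r
library(tidyverse)
library(tidymodels)
library(modeltime)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

splits <- bike_transactions_tbl %>%
  time_series_split(date_var = date, assess = "3 months", cumulative = TRUE)

calibration_tbl <- modeltime_table(
  arima_reg() %>%
    set_engine("auto_arima") %>%
    fit(value ~ date, data = training(splits)),
  prophet_reg(seasonality_yearly = TRUE) %>%
    set_engine("prophet") %>%
    fit(value ~ date, data = training(splits))
) %>%
  modeltime_calibrate(new_data = testing(splits))

# Out-of-sample metrics, one row per model
accuracy_tbl <- calibration_tbl %>%
  modeltime_accuracy()

# Static table; set .interactive = TRUE for an interactive one
accuracy_tbl %>%
  table_modeltime_accuracy(.interactive = FALSE)
```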
From the accuracy measures and forecast results, we see that:
- The Auto ARIMA model is not a good fit for this data.
- The best model is Prophet + XGBoost.
Let’s exclude the Auto ARIMA from our final model, then make future forecasts with the remaining models.
Refit and Forecast Forward
Refitting is a best-practice before forecasting the future.
- modeltime_refit(): We re-train on the full data set.
- modeltime_forecast(): For models that only depend on the “date” feature, we can use h (horizon) to forecast forward. Setting h = "12 months" forecasts the next 12 months of data.
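A sketch of the refit-and-forecast step is below. It again assumes a two-model table in which Auto ARIMA has model ID 1, so filtering on .model_id drops it and leaves Prophet; object names are illustrative.

```r
library(tidyverse)
library(tidymodels)
library(modeltime)
library(timetk)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(date = dteday, value = cnt)

splits <- bike_transactions_tbl %>%
  time_series_split(date_var = date, assess = "3 months", cumulative = TRUE)

calibration_tbl <- modeltime_table(
  arima_reg() %>%
    set_engine("auto_arima") %>%
    fit(value ~ date, data = training(splits)),
  prophet_reg(seasonality_yearly = TRUE) %>%
    set_engine("prophet") %>%
    fit(value ~ date, data = training(splits))
) %>%
  modeltime_calibrate(new_data = testing(splits))

future_forecast_tbl <- calibration_tbl %>%
  # Drop the Auto ARIMA model (ID 1 in this table)
  filter(.model_id != 1) %>%
  # Re-train the remaining models on the full data set
  modeltime_refit(data = bike_transactions_tbl) %>%
  # Forecast 12 months past the end of the data
  modeltime_forecast(h = "12 months", actual_data = bike_transactions_tbl)

future_forecast_tbl %>%
  plot_modeltime_forecast(.interactive = FALSE)
```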
It gets better
You’ve just scratched the surface. Here’s what’s coming…
The modeltime package functionality is much more feature-rich than what we’ve covered here (I couldn’t possibly cover everything in this post).
Here’s what I didn’t cover:
Feature engineering: The art of time series analysis is feature engineering. Modeltime works with cutting-edge time-series preprocessing tools.
Hyperparameter tuning: ARIMA models and machine learning models can be tuned. There’s a right and a wrong way (and it’s not the same for both types).
Scalability: Training multiple time series groups and automation is a huge need in organizations. You need to know how to scale your analyses to thousands of time series.
Strengths and weaknesses: Did you know certain machine learning models are better for trend or seasonality, but not both? Why is ARIMA way better for certain datasets? When will Random Forest and XGBoost fail?
Advanced machine learning and deep learning: Recurrent Neural Networks (RNNs) have been crushing time series competitions. Will they work for business data? How can you implement them?
I teach each of these techniques and strategies so you become the time series expert for your organization. Here’s how.
Advanced Time Series Course
Become the time series domain expert in your organization.
Make sure you’re notified when my new Advanced Time Series Forecasting in R course comes out. You’ll learn modeltime plus the most powerful time series forecasting techniques available.
Get notified here: Advanced Time Series Course.
You will learn:
- Time Series Preprocessing, Noise Reduction, & Anomaly Detection
- Feature engineering using lagged variables & external regressors
- Hyperparameter tuning
- Time series cross-validation
- Ensembling Multiple Machine Learning & Univariate Modeling Techniques (Competition Winner)
- NEW - Deep Learning with RNNs (Competition Winner)
- and more.
Sign up for the Time Series Course waitlist
I’m just getting started with modeltime. The main functionality should not change, so you can begin using it. Let me know of any issues via GitHub. Regarding future work, here’s a short list of what’s coming over the next few months.
Ensembles and Model Stacking
A top priority on the software roadmap is to include model ensembling: various techniques for combining models to improve forecast results. The plan is to collaborate with the tidymodels team to develop ensembling tools.
More Time Series Algorithms
It’s critical to have a diverse set of algorithms included in modeltime or as extensions to modeltime because this improves the speed of experimentation, model selection, and moving into production. To support extensibility:
- I have a Model Roadmap here for additional models.
- I also have a vignette with instructions to help developers extend modeltime, creating R packages that leverage the forecasting workflow.
Comment on GitHub Issue #5 to let me know what you would like to see or if you have plans to extend modeltime.
I have several improvements forthcoming, the most important of which is probably the confidence interval calculation. I plan to use the method used by earth::earth(), which calculates prediction intervals by regressing the absolute errors vs the predictions. This should provide a better approximation of forecast confidence.
- Modeltime Documentation - Learn about the modeltime workflow and which models have been included
- Modeltime GitHub Page - Give it a Star if you like it!
- Timetk Documentation - Data wrangling, visualization, and preprocessing for time series.
- Tidymodels.org - The tidymodels framework is a collection of packages for modeling and machine learning using tidyverse principles.
Have questions about modeltime?
Make a comment in the chat below.
And, if you plan on using modeltime for your business, it’s a no-brainer - Join my Time Series Course Waitlist (It’s coming, it’s really insane).