Time Series in 5-Minutes, Part 5: Anomaly Detection

[This article was first published on business-science.io, and kindly contributed to R-bloggers].

Have 5 minutes? Then let’s learn time series. In this short article series, I highlight how you can get up to speed quickly on important aspects of time series analysis. Today we are focusing on analyzing anomalies in time series data.

Updates

This article has been updated. View the updated Time Series in 5-Minutes article at Business Science.

Time Series in 5-Minutes
Articles in this Series

I just released timetk 2.0.0 (read the release announcement). A ton of new functionality has been added. We’ll discuss some of the key pieces in this article series:

👉 Register for our blog to get new articles as we release them.

Have 5-Minutes?
Then let’s learn Time Series Anomaly Detection

Anomaly detection is an important part of time series analysis:

  1. Detected anomalies can signify special events
  2. Cleaning anomalies can reduce forecast error

In this short tutorial, we will cover the plot_anomaly_diagnostics() and tk_anomaly_diagnostics() functions for visualizing and automatically detecting anomalies at scale.

Advanced Time Series Course
Become the time series domain expert in your organization.

Make sure you’re notified when my new Advanced Time Series Forecasting in R course comes out. You’ll learn timetk and modeltime plus the most powerful time series forecasting techniques available. Become the time series domain expert in your organization.

👉 Get notified here: Advanced Time Series Course.

You will learn:

  • Time Series Preprocessing, Noise Reduction, & Anomaly Detection
  • Feature engineering using lagged variables & external regressors
  • Hyperparameter tuning
  • Time series cross-validation
  • Ensembling Multiple Machine Learning & Univariate Modeling Techniques (Competition Winner)
  • NEW – Deep Learning with RNNs (Competition Winner)
  • and more.

Sign up for the Time Series Course waitlist

Let’s Get Started

First, set up the libraries we’ll use:

library(tidyverse)
library(timetk)

Data

This tutorial will use the walmart_sales_weekly dataset:

  • Weekly
  • Sales spikes at various events

walmart_sales_weekly

Data Summary

Automatic Anomaly Detection

To get the anomaly data, we use tk_anomaly_diagnostics(), the preprocessing function.

The tk_anomaly_diagnostics() method for anomaly detection implements a 2-step process to detect outliers in time series.

Step 1: Detrend & Remove Seasonality using STL Decomposition

The decomposition separates the “season” and “trend” components from the “observed” values leaving the “remainder” for anomaly detection.

The user can control two parameters: .frequency and .trend.

  1. .frequency: Adjusts the “season” component that is removed from the “observed” values.
  2. .trend: Adjusts the trend window (the t.window parameter of stats::stl()) that is used.

The user may supply both .frequency and .trend as time-based durations (e.g. “6 weeks”), as numeric values (e.g. 180), or as “auto”, which determines the frequency and/or trend automatically based on the scale of the time series using tk_time_scale_template().
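The decomposition step can be sketched with base R's stats::stl(). This is a minimal sketch, not timetk's internal code: a synthetic weekly series stands in for Weekly_Sales, and the numeric frequency = 52 and t.window = 105 (both counted in observations) are assumed stand-ins for .frequency and .trend.

```r
# Minimal sketch of Step 1 (STL decomposition) in base R, not timetk's code.
# Assumptions: synthetic weekly data; frequency = 52 stands in for .frequency,
# t.window = 105 (about two years of weeks) stands in for .trend.
set.seed(123)
n_weeks <- 156                                  # three years of weekly data
t_idx   <- seq_len(n_weeks)
sales   <- 20000 + 15 * t_idx +                 # linear trend
           3000 * sin(2 * pi * t_idx / 52) +    # yearly seasonality
           rnorm(n_weeks, sd = 500)             # noise
sales[c(47, 99)] <- sales[c(47, 99)] + 12000    # two injected spikes

observed <- stats::ts(sales, frequency = 52)
fit <- stats::stl(observed, s.window = "periodic", t.window = 105, robust = TRUE)

# Observed = season + trend + remainder; the remainder is what Step 2 inspects
components <- data.frame(
  observed  = as.numeric(observed),
  season    = as.numeric(fit$time.series[, "seasonal"]),
  trend     = as.numeric(fit$time.series[, "trend"]),
  remainder = as.numeric(fit$time.series[, "remainder"])
)
head(components)
```

Setting robust = TRUE downweights large residuals so that a single spike distorts the season and trend estimates less; whether timetk uses the same setting is not stated here.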

Step 2: Anomaly Detection

Once the “trend” and “season” (seasonality) components are removed, anomaly detection is performed on the “remainder”. Anomalies are identified, and boundaries (recomposed_l1 and recomposed_l2) are determined.
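The boundaries are found on the remainder scale and then added back to the season and trend. The helper below is hypothetical (recompose_bands() is not a timetk function), the remainder-scale limits are called remainder_l1 and remainder_l2 here for illustration, and the Yes/No coding of the anomaly column is an assumption; the numbers are toy values.

```r
# Hypothetical helper (not a timetk function): adds remainder-scale limits
# back onto season + trend and flags points that fall outside the band.
recompose_bands <- function(season, trend, remainder, remainder_l1, remainder_l2) {
  observed      <- season + trend + remainder
  recomposed_l1 <- season + trend + remainder_l1   # lower boundary
  recomposed_l2 <- season + trend + remainder_l2   # upper boundary
  # Outside the remainder limits is equivalent to outside the recomposed band
  anomaly <- ifelse(remainder < remainder_l1 | remainder > remainder_l2, "Yes", "No")
  data.frame(observed, recomposed_l1, recomposed_l2, anomaly,
             stringsAsFactors = FALSE)
}

# Toy components (illustrative numbers only); remainder limits of -10 and 10
bands <- recompose_bands(
  season       = c(10, -5, 0, 5),
  trend        = c(100, 101, 102, 103),
  remainder    = c(1, -2, 30, 0),
  remainder_l1 = -10,
  remainder_l2 = 10
)
bands   # only the third point (remainder = 30) is flagged "Yes"
```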

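The anomaly detection method uses the interquartile range (IQR): the spread between the 25th and 75th percentiles of the remainder, i.e. the middle 50% of values (+/-25% around the median).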

IQR Adjustment, alpha parameter

With the default alpha = 0.05, the limits are established by expanding the 25th/75th percentile baseline outward by an IQR Factor of 3 (3X the IQR on each side). The IQR Factor = 0.15 / alpha (hence 3X with alpha = 0.05):

  • To increase the IQR Factor and widen the limits, decrease alpha, which makes it more difficult for a point to be flagged as an outlier.
  • Increase alpha to narrow the limits and make it easier for a point to be flagged as an outlier.
  • The IQR outlier detection method is used in forecast::tsoutliers().
  • A similar outlier detection method is used by Twitter’s AnomalyDetection package.
  • Both Twitter and Forecast tsoutliers methods have been implemented in Business Science’s anomalize package.
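The IQR rule and the effect of alpha can be sketched in base R. iqr_limits() and flag_outliers() are hypothetical helpers written for illustration, not timetk or anomalize functions, and they omit any cap on the share of points that may be flagged.

```r
# Minimal sketch of the IQR rule (base R). iqr_limits() and flag_outliers()
# are hypothetical helpers, not timetk or anomalize functions.
iqr_limits <- function(remainder, alpha = 0.05) {
  # 25th and 75th percentiles of the remainder (unnamed numeric vector)
  q <- stats::quantile(remainder, probs = c(0.25, 0.75),
                       na.rm = TRUE, names = FALSE)
  iq_range   <- q[2] - q[1]
  iqr_factor <- 0.15 / alpha          # 3 when alpha = 0.05, 1.5 when alpha = 0.10
  c(lower = q[1] - iqr_factor * iq_range,
    upper = q[2] + iqr_factor * iq_range)
}

# TRUE where the remainder falls outside the IQR limits
flag_outliers <- function(remainder, alpha = 0.05) {
  lim <- iqr_limits(remainder, alpha)
  remainder < lim[["lower"]] | remainder > lim[["upper"]]
}

# Toy remainder values (illustrative only): quartiles are -1 and 2, so IQR = 3
remainder <- c(-7, -2, -1, 0, 0, 1, 2, 3, 25)

iqr_limits(remainder, alpha = 0.05)            # roughly -10 and 11
which(flag_outliers(remainder, alpha = 0.05))  # only the spike (position 9)
which(flag_outliers(remainder, alpha = 0.10))  # positions 1 and 9
```

Because the factor is 0.15 / alpha, halving alpha doubles the distance from each quartile to its limit, which is why lowering alpha makes it harder to be an outlier.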

Run the detection for every Store and Dept combination at once:

walmart_sales_weekly %>%
  group_by(Store, Dept) %>%
  tk_anomaly_diagnostics(Date, Weekly_Sales)

Anomaly Detection
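Group-wise detection applies both steps independently to each series. The base R sketch below uses split() and lapply() in place of group_by(); detect_group() is a hypothetical helper, the two synthetic store_dept series stand in for walmart_sales_weekly, and every group is assumed to be a regular weekly series spanning more than two 52-week periods, which stl() requires.

```r
# Minimal sketch of group-wise detection in base R (not timetk's implementation).
# detect_group() is a hypothetical helper: STL remainder + IQR limits per series.
detect_group <- function(values, frequency = 52, alpha = 0.05) {
  # Step 1: remove season and trend (stl() needs more than two full periods)
  fit <- stats::stl(stats::ts(values, frequency = frequency),
                    s.window = "periodic", robust = TRUE)
  remainder <- as.numeric(fit$time.series[, "remainder"])
  # Step 2: IQR limits with factor 0.15 / alpha
  q   <- stats::quantile(remainder, probs = c(0.25, 0.75), names = FALSE)
  pad <- (0.15 / alpha) * (q[2] - q[1])
  remainder < q[1] - pad | remainder > q[2] + pad
}

# Synthetic stand-in for walmart_sales_weekly: two store/dept series, 156 weeks each
set.seed(42)
weeks <- seq_len(156)
sales <- data.frame(
  store_dept = rep(c("1_1", "1_2"), each = 156),
  week       = rep(weeks, times = 2),
  value      = rep(20000 + 3000 * sin(2 * pi * weeks / 52), times = 2) +
               rnorm(312, sd = 500)
)
sales$value[100] <- sales$value[100] + 15000   # injected spike in group "1_1"

# split() + lapply() plays the role of group_by(); unsplit() restores row order
flags <- lapply(split(sales$value, sales$store_dept), detect_group)
sales$anomaly <- unsplit(flags, sales$store_dept)

subset(sales, anomaly)
```

Because each group gets its own decomposition and its own IQR, a spike is judged against the spread of its own series rather than against every series pooled together.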

Anomaly Visualization

Using the plot_anomaly_diagnostics() function, we can interactively detect anomalies at scale.

plot_anomaly_diagnostics() is a visualization wrapper around the group-wise anomaly detection of tk_anomaly_diagnostics(), implementing the 2-step process from above.

walmart_sales_weekly %>%
  group_by(Store, Dept) %>%
  plot_anomaly_diagnostics(Date, Weekly_Sales, .facet_ncol = 2)

Anomaly Diagnostics


Have questions on using Timetk for time series?

Make a comment in the chat below. 👇

And, if you plan on using timetk for your business, it’s a no-brainer – Join my Time Series Course Waitlist (It’s coming, it’s really insane).
