
Time Series in 5-Minutes, Part 1: Data Wrangling and Rolling Calculations

[This article was first published on business-science.io, and kindly contributed to R-bloggers.]

Have 5-minutes? Then let’s learn time series. In this short article series, I highlight how you can get up to speed quickly on important aspects of time series analysis. Today we are focusing on preparing data for time series analysis and on rolling calculations.

Updates

This article has been updated. View the updated Time Series in 5-Minutes article at Business Science.

Time Series in 5-Minutes
Articles in this Series

Rolling Calculations – A fundamental tool in your arsenal

I just released timetk 2.0.0 (read the release announcement). A ton of new functionality has been added. We’ll discuss some of the key pieces in this article series:

Register for our blog to get new articles as we release them.

Have 5-Minutes?
Then let’s learn Rolling Calculations

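Time series data wrangling is an essential skill for any forecaster, and timetk, a collection of tools for working with time series in R, includes the essential data wrangling tools. This tutorial covers summarizing by time, filtering by time, padding data, and sliding (rolling) calculations.

Additional concepts covered: imputation with ts_impute_vec(), time arithmetic with the %+time% operator, and visualization with plot_time_series().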

Advanced Time Series Course
Become the time series domain expert in your organization.

Make sure you’re notified when my new Advanced Time Series Forecasting in R course comes out. You’ll learn timetk and modeltime plus the most powerful time series forecasting techniques available.

Get notified here: Advanced Time Series Course.


Let’s Get Started

library(tidyverse)
library(tidyquant) 
library(timetk)

Data

This tutorial will use the FANG dataset:

FANG

The adjusted column contains the adjusted closing prices for each day.

FANG %>%
  group_by(symbol) %>%
  plot_time_series(date, adjusted, .facet_ncol = 2, .interactive = FALSE)

The volume column contains the trade volume (number of shares traded) for the day.

FANG %>%
  group_by(symbol) %>%
  plot_time_series(date, volume, .facet_ncol = 2, .interactive = FALSE)

Summarize by Time

summarise_by_time() aggregates by a period. It’s great for period summarization and period smoothing:

Period Summarization

Objective: Get the total trade volume by quarter

FANG %>%
  group_by(symbol) %>%
  summarise_by_time(
    date, .by = "quarter",
    volume = SUM(volume)
  ) %>%
  plot_time_series(date, volume, .facet_ncol = 2, .interactive = FALSE, .y_intercept = 0)
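
One way to picture the quarterly aggregation is to floor each date to the start of its quarter and then sum within each bucket. The sketch below does this with dplyr and lubridate on a small made-up tibble (the `toy` data and its values are hypothetical). It illustrates the idea only and is not meant as a description of how timetk implements summarise_by_time() internally.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: four daily observations spanning two quarters
toy <- tibble(
  date   = as.Date(c("2013-01-02", "2013-02-15", "2013-04-01", "2013-06-28")),
  volume = c(100, 200, 300, 400)
)

# Floor each date to the first day of its quarter, then sum within each bucket
toy_quarterly <- toy %>%
  group_by(date = floor_date(date, unit = "quarter")) %>%
  summarise(volume = sum(volume), .groups = "drop")

toy_quarterly
# 2013-01-01 -> 300, 2013-04-01 -> 700
```

Each output row is stamped with the first day of its period, which is why the quarterly plot has one point per quarter.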

Period Smoothing

Objective: Get the first value in each month

FANG %>%
  group_by(symbol) %>%
  summarise_by_time(
    date, .by = "month",
    adjusted = FIRST(adjusted)
  ) %>%
  plot_time_series(date, adjusted, .facet_ncol = 2, .interactive = FALSE)
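
Any summary function can be used inside summarise_by_time(), and several can be computed at once. Here is a minimal sketch on made-up prices (the `toy` tibble and the column names `first_adj` and `mean_adj` are hypothetical) that compares taking the first value of each month with taking the monthly mean:

```r
library(dplyr)
library(timetk)

# Hypothetical toy data: two observations in each of two months, sorted by date
toy <- tibble(
  date     = as.Date(c("2013-01-02", "2013-01-03", "2013-02-01", "2013-02-04")),
  adjusted = c(10, 12, 20, 18)
)

# One row per month, with two different summaries of the same column
toy_monthly <- toy %>%
  summarise_by_time(
    date, .by = "month",
    first_adj = first(adjusted),
    mean_adj  = mean(adjusted)
  )

toy_monthly
# 2013-01-01 -> first 10, mean 11
# 2013-02-01 -> first 20, mean 19
```

Taking the first value keeps an actual observed price, while the mean smooths over every observation in the period.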

Filter By Time

filter_by_time() is used to quickly filter a continuous time range.

Time Range Filtering

Objective: Get the adjusted stock prices from September 2013 through the end of 2013.

FANG %>%
  group_by(symbol) %>%
  filter_by_time(date, "2013-09", "2013") %>%
  plot_time_series(date, adjusted, .facet_ncol = 2, .interactive = FALSE)
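
The partial date strings are expanded to cover the full period they name: "2013-09" as a start means the beginning of September 2013, and "2013" as an end means the end of 2013. A small sketch on made-up monthly data (the `toy` tibble is hypothetical) shows the effect, along with the strings that would select the third quarter (July through September) instead:

```r
library(dplyr)
library(timetk)

# Hypothetical toy data: one observation on the 15th of each month in 2013
toy <- tibble(
  date  = seq(as.Date("2013-01-15"), by = "month", length.out = 12),
  value = 1:12
)

# Start of September 2013 through the end of 2013
sep_to_dec <- toy %>% filter_by_time(date, "2013-09", "2013")

# Third quarter only: start of July through the end of September
q3 <- toy %>% filter_by_time(date, "2013-07", "2013-09")

sep_to_dec$value  # 9 10 11 12
q3$value          # 7 8 9
```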

Padding Data

pad_by_time() is used to fill in (pad) gaps and to go from low frequency to high frequency. This function uses the awesome padr library for filling and expanding timestamps.

Fill in Gaps

Objective: Make an irregular series regular.

FANG %>%
  group_by(symbol) %>%
  pad_by_time(date, .by = "auto") # Guesses .by = "day"
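
FANG is too large to eyeball the padded rows, so here is a minimal sketch on a made-up series with a two-day gap (the `toy` tibble is hypothetical). The missing timestamps are inserted and their values are left as NA:

```r
library(dplyr)
library(timetk)

# Hypothetical toy data: January 3rd and 4th are missing
toy <- tibble(
  date  = as.Date(c("2020-01-01", "2020-01-02", "2020-01-05")),
  value = c(10, 20, 50)
)

# Insert the missing days; new rows get NA in the value column
toy_padded <- toy %>% pad_by_time(date, .by = "day")

toy_padded
# 5 rows (2020-01-01 through 2020-01-05), with NA on the 3rd and 4th
```

For stock data like FANG, padding by day also adds rows for weekends and holidays, so the NA values usually need to be imputed or filtered afterwards.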

Low to High Frequency

Objective: Go from Daily to Hourly timestamp intervals for 1 month from the start date. Impute the missing values.

FANG %>%
  group_by(symbol) %>%
  pad_by_time(date, .by = "hour") %>%
  mutate_at(vars(open:adjusted), .funs = ts_impute_vec, period = 1) %>%
  filter_by_time(date, "start", FIRST(date) %+time% "1 month") %>%
  plot_time_series(date, adjusted, .facet_ncol = 2, .interactive = FALSE) 
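
The pad-then-impute pattern is easier to check on a toy series. This sketch uses three made-up daily points (the `toy` tibble is hypothetical) and assumes, per the timetk documentation, that ts_impute_vec() with period = 1 uses linear interpolation:

```r
library(dplyr)
library(timetk)

# Hypothetical toy data: three daily timestamps whose values rise by 24 per day
toy <- tibble(
  date  = as.POSIXct(c("2020-01-01", "2020-01-02", "2020-01-03"), tz = "UTC"),
  value = c(0, 24, 48)
)

toy_hourly <- toy %>%
  # Daily -> hourly: inserts the missing hours with NA values
  pad_by_time(date, .by = "hour") %>%
  # Fill the NA values; period = 1 means no seasonality is assumed
  mutate(value = ts_impute_vec(value, period = 1))

nrow(toy_hourly)        # 49 hourly timestamps
head(toy_hourly$value)  # 0 1 2 3 4 5
```

Because the toy values lie on a straight line, the imputed hourly values rise by exactly 1 per hour.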

Sliding (Rolling) Calculations

We have a new function, slidify(), that turns any function into a sliding (rolling) window function. It takes concepts from tibbletime::rollify() and improves on them with the R package slider.

Rolling Mean

Objective: Calculate a “centered” simple rolling average with partial window rolling at the start and end windows.

# Make the rolling function
roll_avg_30 <- slidify(.f = AVERAGE, .period = 30, .align = "center", .partial = TRUE)
# Apply the rolling function
FANG %>%
  select(symbol, date, adjusted) %>%
  group_by(symbol) %>%
  # Apply Sliding Function
  mutate(rolling_avg_30 = roll_avg_30(adjusted)) %>%
  pivot_longer(cols = c(adjusted, rolling_avg_30)) %>%
  plot_time_series(date, value, .color_var = name,
                   .facet_ncol = 2, .smooth = FALSE, 
                   .interactive = FALSE)
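
To see what .partial changes, it helps to run a short window over a tiny vector. The sketch below uses a made-up vector `x` and a hypothetical 3-period window, comparing partial windows with complete-only windows:

```r
library(timetk)

x <- c(1, 2, 3, 4, 5)

# Centered 3-period mean that also evaluates the incomplete edge windows
roll_partial  <- slidify(mean, .period = 3, .align = "center", .partial = TRUE)

# Same window, but incomplete edge windows return NA
roll_complete <- slidify(mean, .period = 3, .align = "center", .partial = FALSE)

partial_avg  <- roll_partial(x)
complete_avg <- roll_complete(x)

partial_avg   # 1.5 2.0 3.0 4.0 4.5
complete_avg  #  NA 2.0 3.0 4.0  NA
```

With .partial = TRUE the first and last values are averages over only two points, so the rolling line reaches the edges of the series at the cost of noisier edge estimates.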

For simple rolling calculations (such as a rolling average), we can accomplish this operation faster with slidify_vec(), a vectorized rolling function for simple summary rolls (e.g. mean(), sd(), sum(), etc.).

FANG %>%
  select(symbol, date, adjusted) %>%
  group_by(symbol) %>%
  # Apply the vectorized rolling function
  mutate(rolling_avg_30 = slidify_vec(adjusted,  ~ AVERAGE(.), 
                                      .period = 30, .partial = TRUE))

Rolling Regression

Objective: Calculate a rolling regression.

# Rolling regressions are easy to implement using `.unlist = FALSE`
lm_roll <- slidify(~ lm(..1 ~ ..2 + ..3), .period = 90, 
                   .unlist = FALSE, .align = "right")
FANG %>%
  select(symbol, date, adjusted, volume) %>%
  group_by(symbol) %>%
  mutate(numeric_date = as.numeric(date)) %>%
  # Apply rolling regression
  mutate(rolling_lm = lm_roll(adjusted, volume, numeric_date)) %>%
  filter(!is.na(rolling_lm))
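
The rolling_lm column holds one fitted model per complete window, so a typical next step is to pull a coefficient out of each model. Here is a minimal sketch on made-up data where y = 2x + 1 exactly (the `toy` tibble, the 3-period window, and the helper names are all hypothetical), using purrr to keep only the fitted models and extract each slope:

```r
library(dplyr)
library(purrr)
library(timetk)

# Hypothetical toy data on an exact line, so every window has slope 2
toy <- tibble(x = 1:6, y = 2 * x + 1)

# Right-aligned 3-period rolling regression of ..1 (y) on ..2 (x)
lm_roll_3 <- slidify(~ lm(..1 ~ ..2), .period = 3,
                     .unlist = FALSE, .align = "right")

rolled <- lm_roll_3(toy$y, toy$x)

# Keep only the complete windows, which contain fitted lm objects
fits <- keep(rolled, ~ inherits(.x, "lm"))

# The second coefficient of each fit is the slope
slopes <- map_dbl(fits, ~ unname(coef(.x)[2]))

length(fits)  # 4 complete windows out of 6 rows
slopes        # 2 2 2 2 (up to floating point error)
```

The same keep-then-map pattern should carry over to the FANG example, where each model has an intercept and two slopes.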


Have questions on using Timetk for time series?

Make a comment in the chat below.

And, if you plan on using timetk for your business, it’s a no-brainer – Join my Time Series Course Waitlist (It’s coming, it’s really insane).
