Tidy Time Series Analysis, Part 1

In the first part of a series on Tidy Time Series Analysis, we’ll use `tidyquant` to investigate CRAN downloads. You’re probably thinking, “Why tidyquant?” Most people think of `tidyquant` as purely a financial package, and rightfully so. However, because of its integration with `xts`, `zoo` and `TTR`, it’s naturally suited for “tidy” time series analysis. In this post, we’ll discuss the “period apply” functions from the `xts` package, which make it easy to apply functions over time intervals in a “tidy” way using `tq_transmute()`!

An example of the visualization we can create using the period apply functions with `tq_transmute()`:

Libraries Needed

We’ll primarily be using two libraries today: `tidyquant` and `cranlogs`.

As you can tell from my laptop stickers, I’m a bit of a `tidyverse` fan. 🙂 The packages are super useful, so it’s no wonder that several of them rank among the top downloads on RDocumentation.org’s Leaderboard by DataCamp.

We love the tidyverse!

A good way to inspect the trends in popularity with these packages is to examine the CRAN downloads. So how do we get download data? The `cranlogs` package has a convenient function, `cran_downloads()`, that allows us to retrieve daily downloads of various packages. Getting downloads is as easy as making a vector of the packages we want to analyze and using `cran_downloads()`. I’ve added a date range over the past six months since `tidyquant` has only been in existence since then.
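A minimal sketch of the download step, assuming the `cranlogs` and `dplyr` packages are installed. The helper name `get_tidy_downloads()` is hypothetical, and the package vector is only an illustrative selection. Only `cran_downloads()` and its `packages`, `from` and `to` arguments come from `cranlogs`; the helper converts the result to a tibble and groups it by package so that later steps operate on each package separately.

```r
library(dplyr)

# Hypothetical helper: daily CRAN downloads for `pkgs`, grouped by package.
# cranlogs is only used (and network access only needed) when it is called.
get_tidy_downloads <- function(pkgs, from, to) {
  cranlogs::cran_downloads(packages = pkgs, from = from, to = to) %>%
    as_tibble() %>%          # columns: date, count, package
    group_by(package)
}

# Illustrative package selection
pkgs <- c("tidyr", "lubridate", "dplyr", "broom", "tidyquant",
          "ggplot2", "purrr", "stringr", "knitr")
```

A call such as `get_tidy_downloads(pkgs, from = "2016-12-01", to = "2017-06-01")` (dates chosen only as an example of a roughly six-month window) should return one row per package per day.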

We can easily visualize the “tidyverse” downloads with `ggplot2`.
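Because the real download data needs network access, the sketch below uses simulated counts (an assumption, not the post’s data) with a weekend dip built in, just to show the plotting pattern: one point per day, one facet per package, and free y scales because packages differ greatly in volume. The package names `pkgA` and `pkgB` are placeholders.

```r
library(ggplot2)

# Simulated daily downloads (made-up data) with lower weekend traffic
set.seed(2017)
dates   <- seq(as.Date("2017-01-02"), by = "day", length.out = 56)
weekend <- as.POSIXlt(dates)$wday %in% c(0, 6)  # Sunday = 0, Saturday = 6

downloads <- data.frame(
  package = rep(c("pkgA", "pkgB"), each = 56),
  date    = rep(dates, times = 2),
  count   = rpois(112, lambda = rep(ifelse(weekend, 400, 1000), times = 2) *
                               rep(c(1, 3), each = 56))
)

# Daily points, faceted by package
p <- ggplot(downloads, aes(x = date, y = count, color = package)) +
  geom_point(alpha = 0.6) +
  facet_wrap(~ package, ncol = 3, scales = "free_y") +
  labs(title = "Daily downloads (simulated)", x = "", y = "Downloads") +
  theme_minimal() +
  theme(legend.position = "none")
```

Printing `p` draws the chart; the weekend dip shows up as a lower band of points, similar to the separation described next.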

From the downloads graph, it’s difficult to see what’s going on. There appears to be some separation in the data (this corresponds to weekends), but overall it’s hard to separate the trend from the noise. This is the nature of daily data: it tends to be very noisy, and the problem tends to get worse as the data set grows. Fortunately, there are a number of useful time series tools to help us extract trends and make visualization easier!

Time Series Functions

The `xts`, `zoo`, and `TTR` packages have some great functions for working with time series. Today, we’ll focus on the period apply functions from the `xts` package. The period apply functions are helper functions that apply other functions over common time intervals. What “other functions” can be supplied? Any function that returns a numeric vector, whether a scalar (`mean`, `median`, `sd`, `min`, `max`, etc.) or a longer vector (`quantile`, `summary`, and custom functions). The period apply functions follow the format `apply.[interval]`, where [interval] can be daily, weekly, monthly, quarterly, or yearly.
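To make the period apply idea concrete, here is a small sketch on a simulated daily `xts` series (the data are made up). It shows a scalar function (`mean`) giving one value per week, a function returning a length-two vector (`range`) giving one column per value, and `apply.monthly()` for a coarser interval. The series starts on a Monday so that it spans exactly four of the Monday-to-Sunday weeks that `xts` uses for weekly endpoints.

```r
library(xts)

# Simulated daily series (made-up data): 2017-01-02 (Monday) to 2017-01-29 (Sunday)
set.seed(42)
dates <- seq(as.Date("2017-01-02"), by = "day", length.out = 28)
daily <- xts(rpois(28, lambda = 500), order.by = dates)

# Scalar function: one value per week
weekly_mean <- apply.weekly(daily, FUN = mean)

# Vector-valued function: one column per returned value (here min and max)
weekly_range <- apply.weekly(daily, FUN = function(x) range(as.numeric(x)))

# Same idea at a coarser interval
monthly_max <- apply.monthly(daily, FUN = max)
```

Each result is itself an `xts` object indexed by the last date of its period, which is why these functions slot naturally into a time series workflow.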

Tidy Implementation of Time Series Functions

We’ll be using the `tq_transmute()` function to apply time series functions in a “tidy” way. The `tq_transmute()` function always returns a new data frame (rather than adding columns to the existing data frame). Hence it’s well suited for aggregation tasks that change the number of rows (or columns). It integrates functions from a number of financial and time series packages. We can see which apply functions will work by inspecting the list of available functions returned by `tq_transmute_fun_options()`.
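A quick way to inspect those options, assuming `tidyquant` is installed: `tq_transmute_fun_options()` returns a named list of function names grouped by source package, and filtering its `xts` entry for names that start with `apply.` isolates the period apply helpers.

```r
library(tidyquant)

# Named list of compatible function names, one element per source package
fun_options <- tq_transmute_fun_options()
names(fun_options)

# Keep only the xts period apply helpers (apply.daily, apply.weekly, ...)
period_apply_funs <- grep("^apply\\.", fun_options$xts, value = TRUE)
period_apply_funs
```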

Applying Functions By Period

As we saw in the tidyverse daily download graph above, it can be difficult to understand the trends in daily data just by visualizing the data. It’s often better to apply statistics to subsets of the time series, which can help to remove noise and make it easier to extract / visualize the underlying trends. The period apply functions from `xts` are the perfect answer in these cases.

Suppose we’d like to investigate whether the package downloads are growing. One way to do this is to aggregate over an interval. Instead of viewing each day, we can view the average daily downloads of each week, which reduces the impact of outliers and reduces the number of data points, making the trend easier to visualize.

To perform the weekly aggregation, we will use `tq_transmute()` which applies the non-tidy functions in a “tidy” way. The function we want to use is `apply.weekly()`, which takes the argument `FUN` (the function to be applied weekly) and `...` (additional args that get passed to the `FUN` function). We’ll set `FUN = mean` to apply `mean()` on a weekly interval. Last, we’ll pass the argument `na.rm = TRUE` to remove `NA` values during the calculation.
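Here is a sketch of the weekly-mean call described above. Since the real downloads require network access, it uses a small simulated, grouped tibble with the same column layout (`package`, `date`, `count`); the package names, counts and the `mean_count` column name are placeholders, not the post’s data.

```r
library(dplyr)
library(tidyquant)

# Simulated stand-in for the grouped downloads data (made-up counts)
set.seed(123)
dates <- seq(as.Date("2017-01-02"), by = "day", length.out = 28)  # 4 full weeks
downloads <- tibble(
  package = rep(c("pkgA", "pkgB"), each = 28),
  date    = rep(dates, times = 2),
  count   = rpois(56, lambda = rep(c(300, 900), each = 28))
) %>%
  group_by(package)

# Weekly mean of daily downloads, computed separately for each package
weekly_mean <- downloads %>%
  tq_transmute(
    select     = count,         # column handed to the xts function
    mutate_fun = apply.weekly,  # xts period apply function
    FUN        = mean,          # forwarded to apply.weekly()
    na.rm      = TRUE,          # forwarded on to mean()
    col_rename = "mean_count"
  )
```

The result should have one row per package per week, dated at the end of each week.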

There’s one problem, though: graphing the mean alone doesn’t tell the full story. Variability (or volatility) also shapes the trend, and the average in particular is highly susceptible to outliers. Next, we’ll see how to go beyond a single statistic.

Custom functions: Weekly aggregation beyond a single statistic

As statisticians, we typically care about more than simply getting the mean. We might be interested in standard deviation, quantiles, and other elements that help to characterize the underlying data. The good news is that we can implement custom functions that return numeric values that describe the data more fully. Let’s test it out by creating a function that returns the following:

• mean
• standard deviation
• min & max
• range for middle 95% (2.5% and 97.5%)
• range for middle 50% (25% and 75%, or Q1 and Q3)
• median

This is actually really easy to do. Our custom function, `custom_stat_fun()`, needs only three functions: `mean`, `sd` and `quantile` (the min, max and median come from `quantile()` at probabilities 0, 1 and 0.5). We’ll set up the function to take the arguments `x` (the numeric vector), `na.rm` (whether to remove `NA` values from the statistic calculations), and `...` to pass additional arguments to the `quantile()` function.
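One way to write such a function in base R, as a sketch rather than the post’s exact code. The `as.numeric()` coercion is an added precaution (our assumption) so that `sd()` and `quantile()` always operate on a plain vector, whatever class the period apply function hands in. The `probs` vector requests the 0%, 2.5%, 25%, 50%, 75%, 97.5% and 100% quantiles, which supply the min, max, median and the two middle ranges from the list above.

```r
# Sketch of a custom summary function: mean, standard deviation and quantiles
custom_stat_fun <- function(x, na.rm = TRUE, ...) {
  # x     = numeric vector (or a time series slice)
  # na.rm = whether to drop NA values before computing each statistic
  # ...   = further arguments forwarded to quantile(), e.g. probs
  x <- as.numeric(x)  # precaution: work on a plain numeric vector
  c(mean  = mean(x, na.rm = na.rm),
    stdev = sd(x, na.rm = na.rm),
    quantile(x, na.rm = na.rm, ...))
}

# 0% = min, 50% = median, 100% = max; 2.5%/97.5% and 25%/75% bound the
# middle 95% and the middle 50%
probs <- c(0, 0.025, 0.25, 0.5, 0.75, 0.975, 1)

demo <- custom_stat_fun(c(1:10, NA), na.rm = TRUE, probs = probs)
demo
```

On the toy input `c(1:10, NA)`, the `NA` is dropped and the result is a named vector of nine numbers: `mean`, `stdev`, and one entry per requested quantile (`0%` through `100%`).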

Let’s test out the custom stat function. Note that the return value is a named numeric vector. As long as the return is a numeric vector, we can use it in the “tidy” aggregation (shown next).

Now for the fun part: “tidy” aggregation. Let’s apply `custom_stat_fun()` to groups using `tq_transmute()` and the weekly aggregation function `apply.weekly()`. The process is almost identical to applying `mean()` on weekly intervals. The only difference is that we also supply the probabilities (`probs`), which are passed on to the `quantile()` call inside our custom stat function. The output is a tidy data frame with one column per statistic describing the spread of the data.
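Below is a sketch of that grouped aggregation, again on simulated data with placeholder package names (an assumption; the real data would come from `cran_downloads()`). The stat function is redefined here so the block runs on its own. Both `na.rm` and `probs` pass through `tq_transmute()` and `apply.weekly()` to the custom function, and `probs` continues on to `quantile()`.

```r
library(dplyr)
library(tidyquant)

# Simulated stand-in for the grouped downloads data (made-up counts)
set.seed(123)
dates <- seq(as.Date("2017-01-02"), by = "day", length.out = 28)  # 4 full weeks
downloads <- tibble(
  package = rep(c("pkgA", "pkgB"), each = 28),
  date    = rep(dates, times = 2),
  count   = rpois(56, lambda = rep(c(300, 900), each = 28))
) %>%
  group_by(package)

# Same shape as the custom stat function described above: mean, sd, quantiles
custom_stat_fun <- function(x, na.rm = TRUE, ...) {
  x <- as.numeric(x)  # plain numeric vector, whatever class apply.weekly() passes in
  c(mean  = mean(x, na.rm = na.rm),
    stdev = sd(x, na.rm = na.rm),
    quantile(x, na.rm = na.rm, ...))
}

probs <- c(0, 0.025, 0.25, 0.5, 0.75, 0.975, 1)

# One row per package per week, one column per statistic
weekly_stats <- downloads %>%
  tq_transmute(
    select     = count,
    mutate_fun = apply.weekly,
    FUN        = custom_stat_fun,
    na.rm      = TRUE,   # forwarded to custom_stat_fun()
    probs      = probs   # forwarded through custom_stat_fun() to quantile()
  )
```

No `col_rename` is given because the named vector returned by the stat function already supplies the column names.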

Like before, the data was sectioned by week, but now we have several additional features that can be used to visualize volatility in addition to trend. The trend is visualized by the median and the volatility by the first and third quartiles. We can also see the skew caused by weekends in the gap between the first quartile line and the median points on several of the facets. This is an indicator that the weekends may form a separate group to estimate.

We can also investigate how the mean and standard deviation relate to each other. In general, it appears that higher volatility in daily downloads tends to coincide with higher mean daily downloads.
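A sketch of how that relationship could be checked, assuming `ggplot2` is installed. To stay self-contained it simulates Poisson counts for three hypothetical packages and, as a swapped-in base R stand-in for `apply.weekly()`, bins days into Monday-start weeks with `cut()` and summarizes them with `aggregate()`. For Poisson counts the variance equals the mean, so the positive relationship here holds by construction; real download data need not behave the same way.

```r
library(ggplot2)

# Simulated daily counts for three hypothetical packages (made-up data)
set.seed(7)
dates <- seq(as.Date("2017-01-02"), by = "day", length.out = 84)  # 12 full weeks
daily <- data.frame(
  package = rep(c("pkgA", "pkgB", "pkgC"), each = 84),
  date    = rep(dates, times = 3),
  count   = rpois(252, lambda = rep(c(200, 800, 2000), each = 84))
)
daily$week <- cut(daily$date, breaks = "week")  # Monday-start weeks

# Weekly mean and standard deviation per package (base R stand-in)
weekly <- aggregate(count ~ package + week, data = daily,
                    FUN = function(x) c(mean = mean(x), stdev = sd(x)))
weekly <- do.call(data.frame, weekly)  # flatten to count.mean, count.stdev

# Scatter of weekly volatility against weekly mean
p <- ggplot(weekly, aes(x = count.stdev, y = count.mean, color = package)) +
  geom_point() +
  labs(title = "Weekly mean vs. standard deviation (simulated)",
       x = "Std. dev. of daily downloads", y = "Mean daily downloads")

rel <- cor(weekly$count.mean, weekly$count.stdev)
```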

Conclusions

The period apply functions from `xts` can be used to apply aggregations using common time series intervals such as weekly, monthly, quarterly, and yearly. The `tq_transmute()` function from `tidyquant` enables efficient and “tidy” application of the functions. We were able to use the period apply functions to visualize trends and volatility and to expose relationships between statistical measures.

Announcements

We have completed the new package, `sweep`, which “tidies” the `forecast` workflow by applying `broom` concepts to the various model functions (`auto.arima()`, `ets()`, etc.) and `forecast()` output. You can download it from GitHub: `devtools::install_github("business-science/sweep")`. We’ll be requesting addition to CRAN soon!
