[This article was first published on business-science.io - Articles, and kindly contributed to R-bloggers.]
We are very excited to announce the initial release of our newest R package,
tibbletime. As evident from the name, tibbletime is built on top of the
tibble package (and more generally on top of the tidyverse) with the main
purpose of being able to create time-aware tibbles through a one-time
specification of an “index” column (a column containing timestamp information). This unlocks a set of useful time-based functions such as time_filter(), time_summarise(), tmap(), as_period() and time_collapse(). We’ll walk through the basics in this post.
If you like what we do, please follow us on social media to stay up on the latest Business Science news, events and information! As always, we are interested in both expanding our network of data scientists and seeking new clients interested in applying data science to business and finance. If interested, contact us.
Why tibbletime?
The tidyverse has some great packages such as dplyr, tidyr, and purrr that make data science in R both easier and more fun. However, none of these packages specifically addresses time series. R users often find themselves switching back and forth between tibbles and xts to get time-based operations such as grouping, applying functions, and summarizing by time.
Enter tibbletime, a new class that is time-aware. Advantages of this new class are:
The ability to perform compact, time-based subsetting on tibbles.
Quickly summarise and aggregate results by time period (yearly, monthly, etc.).
Change the periodicity of a tibble, meaning from a daily dataset to a monthly or yearly dataset with one function call.
Call functions similar in spirit to the map family from purrr on time-based tibbles.
In addition to these new abilities, all functions were designed to support the pipe (%>%) and to work seamlessly with “tidyverse” packages like dplyr, tidyr and purrr. The end result is a new set of tools that make time series in the “tidyverse” much easier and more productive.
Libraries Needed
To get started, load tibbletime. You can install it from CRAN or from GitHub.
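A minimal setup sketch follows. The install lines are commented out so you can pick one, and the GitHub path assumes the package lives in the business-science organization. Loading dplyr as well is our assumption, made because the later examples use the pipe and group_by().

```r
# Install once, from either source (uncomment the one you want):
# install.packages("tibbletime")                           # CRAN release
# devtools::install_github("business-science/tibbletime")  # development version (assumed repo path)

# Load tibbletime, plus dplyr for the pipe (%>%) and grouping verbs
library(tibbletime)
library(dplyr)
```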
Creating time-based tibbles
To get started with tibbletime, you’ll use as_tbl_time() to transform
your tibble into a tbl_time object. Of course, you’ll need a column of
dates to use as your index.
Below, we’ll use the FB data set included with the package (Facebook stock
prices).
We start with a plain tibble. Using as_tbl_time(), we convert FB to class tbl_time, passing the date column of FB as the index argument. The object now has “date” stored as its index.
Inspecting the class, we see it’s now a tbl_time!
With the exception of the “Index: date” in the print statement, the returned object doesn’t look much different… The differences are all under the hood, but they come out when we use our tibbletime functions.
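The conversion described above can be sketched as follows. This assumes the FB data set and the as_tbl_time(index = ...) interface described in this post. The before/after variable names are ours.

```r
library(tibbletime)
library(dplyr)

# Facebook stock prices, shipped with the package
data(FB)

# Before conversion: an ordinary tibble
class_before <- class(FB)

# Convert to tbl_time, naming the date column as the index
FB <- as_tbl_time(FB, index = date)

# After conversion: "tbl_time" is now part of the class vector
class_after <- class(FB)
class_after

# Printing shows the usual tibble output plus the index information
FB
```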
tibbletime functions
There are a number of functions that were designed specifically for tbl_time
objects. Some of them are:
time_filter: Succinctly filter a tbl_time object by date.
time_summarise: Similar to dplyr::summarise() but with the added benefit of being able to summarise by a time period such as “yearly” or “monthly”.
tmap: The family of tmap_* functions transforms a tbl_time input by applying a function to each column at a specified time interval.
as_period: Convert a tbl_time object from daily to monthly, from minute data to hourly, and more. This allows the user to easily aggregate data to a less granular level.
time_collapse: When time_collapse() is used, the index of a tbl_time object is altered so that all dates falling inside the specified period share a common date. This is generally used internally by some of the other functions, but is useful on its own to perform analyses grouped by some time period.
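Since time_collapse() gets no walkthrough of its own, here is a hedged sketch of the idea. Passing the period as "yearly" is an assumption based on how the other functions in this post take a period. Check the package documentation for the exact signature in your installed version.

```r
library(tibbletime)
library(dplyr)

data(FB)
FB <- as_tbl_time(FB, index = date)

# Collapse the index so every row within a year shares one common date
# (the period argument here is assumed, see the note above)
collapsed <- time_collapse(FB, period = "yearly")

# The collapsed index can then act as an ordinary grouping column
collapsed %>%
  group_by(date) %>%
  summarise(adj_mean = mean(adjusted))
```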
A few examples
Let’s take tibbletime for a spin so we can try out some of the useful time-based functions.
time_filter()
Let’s look at time_filter(), a neat way to slice your dataset with a
compact time formula.
You specify a date range using a formula of the form from ~ to. A nice shorthand is
also available, so you can select, for example, the observations from March 2013 through the end of 2015.
A one-sided formula selects a single period, such as only the observations in the month of March 2013.
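The three forms of the time formula can be sketched as below. This follows the unquoted from ~ to syntax of the v0.0.1 release covered in this post, so later releases of the package may expect something different. The result names are ours.

```r
library(tibbletime)
library(dplyr)

data(FB)
FB <- as_tbl_time(FB, index = date)

# Fully specified range: March 1st 2013 through December 31st 2015
full_range <- time_filter(FB, 2013-03-01 ~ 2015-12-31)

# Shorthand for the same range: start of March 2013 through the end of 2015
short_range <- time_filter(FB, 2013-03 ~ 2015)

# One-sided formula: only the observations in March 2013
march_2013 <- time_filter(FB, ~2013-03)

march_2013
```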
time_summarise()
Have you ever wanted to calculate yearly averages for your data? Or quarterly
summary results for your boss? Now, it’s easy with time_summarise()!
Just specify a period, and then calculate your summary results just as you
would with dplyr::summarise(); tibbletime takes care of the rest.
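As a sketch of a yearly summary, the example below computes the mean and standard deviation of the adjusted price for each year. The adjusted column comes from the FB data set. The output column names adj_mean and adj_sd are our own choices.

```r
library(tibbletime)
library(dplyr)

data(FB)
FB <- as_tbl_time(FB, index = date)

# One row per year, summarised just like dplyr::summarise()
yearly_stats <- FB %>%
  time_summarise(period   = "yearly",
                 adj_mean = mean(adjusted),
                 adj_sd   = sd(adjusted))

yearly_stats
```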
It even works with groups! We’ll check out the FANG data set, which contains sample stock prices for FB, AMZN, NFLX and GOOG.
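A grouped version might look like the sketch below. It assumes FANG has a symbol column identifying each stock and an adjusted price column, like FB. The summary column names are illustrative.

```r
library(tibbletime)
library(dplyr)

data(FANG)

# Convert to tbl_time, then group by stock symbol
FANG <- FANG %>%
  as_tbl_time(index = date) %>%
  group_by(symbol)

# Yearly minimum and maximum adjusted price for each stock
yearly_by_symbol <- FANG %>%
  time_summarise(period  = "yearly",
                 adj_min = min(adjusted),
                 adj_max = max(adjusted))

yearly_by_symbol
```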
as_period()
In the xts world, there are nice functions to convert your xts object to a
different periodicity. Nothing like that existed natively
for tibbles outside of using one of our other packages, tidyquant, to call
xts functions indirectly.
With as_period(), that native support now exists and you can convert your
time-based tibble to any of the following periodicities.
"yearly"
"quarterly"
"monthly"
"weekly"
"daily"
"hour"
"minute"
"second"
By default, the first date in that period is used as the new date.
You can use the last date in that period with the side argument.
But remember, you cannot convert to a more granular periodicity: if
you have daily data, you cannot convert it to hourly data. We can’t make up new
data points for you!
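The two conversions described above, the default first date of each period and the last date via side, can be sketched like this. The value "end" for the side argument is an assumption. Consult ?as_period for the accepted values in your version.

```r
library(tibbletime)
library(dplyr)

data(FB)
FB <- as_tbl_time(FB, index = date)

# Daily to yearly, keeping the first available date in each year (default)
yearly_start <- as_period(FB, "yearly")

# Daily to yearly, keeping the last available date in each year
# ("end" is an assumed value for side)
yearly_end <- as_period(FB, "yearly", side = "end")

yearly_start
yearly_end
```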
tmap()
The family of tmap functions adds a time layer onto the existing
purrr::map family of functions. They map a function over every column in the
data set (besides groups and the index), but also allow you to map at specified
intervals (yearly, monthly, etc). This provides
flexibility at the cost of requiring the use of the slightly
more complex list-columns.
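A hedged sketch of tmap() on the grouped FANG data is shown below. The .f and period argument names follow purrr conventions and the period values used elsewhere in this post, so treat them as assumptions. The result holds one list per symbol and year in a list-column.

```r
library(tibbletime)
library(dplyr)

data(FANG)

FANG <- FANG %>%
  as_tbl_time(index = date) %>%
  group_by(symbol)

# Apply mean() to every non-group, non-index column, once per year.
# Each row of the result carries a list of column means.
yearly_means <- FANG %>%
  tmap(.f = ~mean(.x), period = "yearly")

yearly_means
```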
A more useful way to do this is to use the tmap_dfc() function, which
attempts to convert each of those lists inside the list-column into a tibble
by using dplyr::bind_cols(). Combined with unnest() to unroll each of
the tibbles, you can get back a very clean result.
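Combining tmap_dfc() with unnest() might look like the sketch below. As with tmap(), the argument names are assumptions, and unnest() is taken from tidyr. Newer tidyr versions may ask you to name the column to unnest explicitly.

```r
library(tibbletime)
library(dplyr)
library(tidyr)

data(FANG)

FANG <- FANG %>%
  as_tbl_time(index = date) %>%
  group_by(symbol)

# Column-bind the per-period results into tibbles, then unroll them
# into ordinary columns
yearly_means_flat <- FANG %>%
  tmap_dfc(~mean(.x), period = "yearly") %>%
  unnest()

yearly_means_flat
```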
Final thoughts
Mind you, this is only v0.0.1. We have a lot of work to do, but we couldn’t
wait any longer to share this. Feel free to kick the tires on tibbletime, and let us know your thoughts. Please submit any comments, issues or bug reports to us on GitHub here. Enjoy!
About Business Science
We have a full suite of data science services to supercharge your financial and business performance. How do we do it? Using our network of data science consultants, we pull together the right team to get custom projects done on time, within budget, and of the highest quality. Find out more about our data science services or contact us!
We are growing! Let us know if you are interested in joining our network of data scientist consultants. If you have expertise in Marketing Analytics, Data Science for Business, Financial Analytics, or Data Science in general, we’d love to talk. Contact us!