tidyquant, version 0.2.0, is now available on CRAN. If you’re not already familiar, tidyquant integrates the best quantitative resources for collecting and analyzing quantitative data (xts, zoo, quantmod, and TTR) with the tidy data infrastructure of the tidyverse, allowing for seamless interaction between each. I’ll briefly touch on some of the updates. The package is open source, and you can view the code on the tidyquant GitHub page.
Table of Contents
- Who Will Benefit?
- Example 1: Getting and Visualizing Key Ratios
- Example 2: Taking the New Zoo Integration for a Spin
- Further Reading
The tidyquant package was developed with two types of users in mind:
Financial Engineers: These individuals systematically analyze financial securities (typically stocks), implementing technical trading rules such as MACD, Bollinger Bands, Moving Averages, etc. to determine buy and sell signals in an automated way. Charting and implementing modelling algorithms are highly important.
Financial / Business Analysts: These individuals systematically analyze financial securities, financial statements, and key ratios such as valuation (e.g. price to earnings multiples), financial health (e.g. current ratio), and efficiency (e.g. inventory turnover). Getting financial and key ratio data is highly important, along with charting and, to a lesser degree, modelling.
For the financial engineer, the package is designed to integrate specialty financial functions within the tidyverse so the user doesn’t need to jump back and forth between tibbles (tidy data frames) / data frames and xts / zoo time-series objects. Methods like tq_mutate() exist to apply the various xts, zoo, quantmod, and TTR functions to data frames, so you never need to leave the tidyverse. Further, if the user needs to switch object classes, coercion functions exist to easily convert (see as_tibble() for converting xts to tibbles / data frames, and as_xts() for converting data frames to xts).
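To make the back-and-forth concrete, here is a minimal sketch of the manual round trip between a tibble and an xts object, using only the xts and tibble packages. The toy prices are made up for illustration, and this is not tidyquant's implementation; it only shows the kind of conversion that as_xts() and as_tibble() are described as handling for you.

```r
library(xts)     # also attaches zoo, which provides index() and coredata()
library(tibble)

# Toy data (made up for illustration): five days of closing prices
prices_tbl <- tibble(
  date  = as.Date("2017-01-02") + 0:4,
  close = c(100, 101, 99, 102, 103)
)

# tibble -> xts: the date column becomes the time index
prices_xts <- xts(as.matrix(prices_tbl["close"]), order.by = prices_tbl$date)

# xts -> tibble: pull the index back out into an ordinary column
back_tbl <- tibble(
  date  = index(prices_xts),
  close = as.numeric(coredata(prices_xts)[, "close"])
)
```

Doing this by hand for every calculation is the friction the paragraph above refers to.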
For the financial analyst, the package is designed to make retrieving key financial data fast and analyzing it easy and efficient. The core function, tq_get(), has the get argument that can be set to:
Stock Index: Retrieve a list of stock symbols for an entire index such as the S&P500 with tq_get("SP500", get = "stock.index"). 18 indexes are available.
Stock Prices: Retrieve the stock prices for an individual stock such as Apple with tq_get("AAPL", get = "stock.prices").
Financial Statements: Retrieve income statements, balance sheets, and cash flow statements for both annual and quarterly periods for an individual stock with tq_get("AAPL", get = "financials").
Key Ratios: Retrieve 10 years of historical key ratios (89 total available) for an individual stock with tq_get("AAPL", get = "key.ratios").
tidyquant is designed to work in the tidyverse. This means users can use tidyr verbs to slice and dice data and purrr to map functions at scale. This enables new capabilities for both financial engineers and analysts. Instead of analyzing one stock at a time, you can now analyze as many stocks as you want at the same time and systematically compare each. See the S&P500 and the more advanced Russell 2000 posts for tutorials on mapping functions to stock lists.
Ok, enough about the benefits. You can read more about them in the vignette. Let’s discuss the updates, and I’ll go through some examples of the new functionality.
The major updates are:
Key ratios from Morningstar: Users can now get 89 different key ratios that span 10 years historically. This is great for users that want to know how EPS, P/E, and even financials have changed over time. The source is Morningstar.
zoo integration: The rollapply functions from the zoo package are now fully integrated with tq_mutate. This means you can calculate rolling averages, maximums, medians, and whatever else your heart desires.
Making things more intuitive and hassle-free: These are small tweaks. The transform and mutate function arguments have changed slightly. The x_fun argument has been replaced with the more intuitive name ohlc_fun, so users know to enter an OHLC function such as Op to select the open price of stock prices. The .x and .y arguments are replaced with x and y, which make more sense and don’t interfere with mapping functions in purrr.
Now, let’s go through some examples.
The examples use tidyquant version 0.2.0. Note that you will need the development version 0.2.0.9000 for this post.
I also recommend the open-source RStudio IDE, which makes R programming easy and efficient.
You will need the development version for this example due to an issue with retrieving key ratios for stocks listed on the NYSE. In 0.2.0, key ratios are only available for stocks listed on the NASDAQ. To continue, upgrade to 0.2.0.9000 using devtools::install_github("mdancho84/tidyquant") to get the latest development version.
Let’s say we want to compare the valuation over time using the price to earnings (P/E) multiple. This is often done when comparing several companies in the same industry to determine those that may be below normal valuation (i.e. the price may be at a discount to historical trends and to peers).
Hypothetically, we’ll select some big banks to visualize the P/E valuation: JP Morgan (JPM), Goldman Sachs (GS), Bank of America (BAC), and Citigroup (C). Before we can visualize all stocks, let’s first get the key ratios for one stock. Use tq_get(), which gets data, and set the get argument to "key.ratios".
Let’s check out the key ratios by unnesting.
Yikes, there are 890 rows of data. We first filter to the section we want, “Valuation Ratios”, and then get the unique categories by selecting the “category” column and applying the unique function.
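As a rough sketch of this step, the code below builds a tiny stand-in for the nested key-ratios tibble, then unnests and filters it. The column names section and category come from the text above; the data list column, the date and value columns, and every number are assumptions made up for illustration, so the real result of tq_get(x, get = "key.ratios") will look different.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Stand-in for the nested key-ratios tibble (structure and values are assumed)
key_ratios <- tibble(
  section = c("Financials", "Valuation Ratios"),
  data = list(
    tibble(
      category = c("Revenue", "Revenue"),
      date     = as.Date(c("2015-12-31", "2016-12-31")),
      value    = c(10, 12)
    ),
    tibble(
      category = c("Price to Earnings", "Price to Earnings",
                   "Price to Sales", "Price to Sales"),
      date     = as.Date(c("2015-12-31", "2016-12-31",
                           "2015-12-31", "2016-12-31")),
      value    = c(11, 13, 3, 4)
    )
  )
)

# Unnest the list column into one long tibble
key_ratios_long <- key_ratios %>% unnest(data)

# Filter to the section of interest, then list its unique categories
valuation_categories <- key_ratios_long %>%
  filter(section == "Valuation Ratios") %>%
  select(category) %>%
  unique()
```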
We see that “Price to Earnings” is one of the valuation ratios we can get. Let’s filter and plot with ggplot2.
This is great, but we want to evaluate more than one stock. That’s easy to do with purrr. First, we’ll make a function to get the P/E ratios using the same procedure as for one stock. Then we’ll map it to scale to many stocks.
Now, let’s scale it to a tibble of stocks.
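The mapping step might look like the sketch below. To keep it runnable offline, get_pe_ratios() here is a hypothetical stand-in that returns made-up numbers; in the post, the function would instead wrap the tq_get(), filter, and unnest steps used for a single stock. The column names symbol and pe.ratio are also my own choices.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical stand-in for the real helper: returns placeholder values
# instead of downloading key ratios for the given symbol
get_pe_ratios <- function(symbol) {
  tibble(
    date  = as.Date(c("2015-12-31", "2016-12-31")),
    value = c(10, 11)  # placeholder numbers, not real P/E ratios
  )
}

# Map the helper over a tibble of symbols; each result is stored
# as one element of a list column, giving a nested tibble
stocks <- tibble(symbol = c("JPM", "GS", "BAC", "C")) %>%
  mutate(pe.ratio = map(symbol, get_pe_ratios))
```

Each row of stocks now carries its own small tibble of ratios, which is the shape the next step unnests.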
Now that we have a nested tibble of P/E ratios, we can use the same technique to visualize four stocks as with one stock. We’ll unnest the list to get a single-level tibble, then plot using ggplot2, tacking on a facet wrap to split the plots by stock.
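A sketch of the unnest-and-facet step, again on made-up numbers rather than real P/E data. The column names (symbol, pe.ratio, date, value) are assumptions used only for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)

# Toy nested tibble: one row per stock, one small tibble per row
stocks <- tibble(
  symbol = c("JPM", "GS"),
  pe.ratio = list(
    tibble(date = as.Date(c("2015-12-31", "2016-12-31")), value = c(10, 11)),
    tibble(date = as.Date(c("2015-12-31", "2016-12-31")), value = c(14, 15))
  )
)

# Flatten to a single-level tibble with one row per stock and date
stocks_long <- stocks %>% unnest(pe.ratio)

# One panel per stock via facet_wrap(); call print(p) to render the plot
p <- ggplot(stocks_long, aes(x = date, y = value)) +
  geom_line() +
  facet_wrap(~ symbol) +
  labs(x = "Date", y = "P/E (placeholder values)")
```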
We now have the price to earnings ratio visualization for the four bank stocks. We can see how the valuation of each stock compares historically and against its peers. Just a few observations:
- GS has the highest current valuation at almost 15X earnings. JPM, C, and BAC are all priced closer to 10X earnings.
- BAC is missing some values, which were cut off by the y-limits. This happened after the financial crisis, which may be a red flag since earnings were impacted more than peers.
- C had negative P/E multiples in 2009 and 2010. This was the result of the financial crisis. Again, this may be a red flag.
The P/E multiple is just one of the 89 key ratios that can be used to evaluate stocks that are now available using tq_get(x, get = "key.ratios").
The rollapply functions from the zoo package are useful for calculating rolling averages, medians, maximums, etc., which are integral to separating the trend from the noise in time series. One common technique is to use simple moving averages to determine the crossover (which was discussed in my last post on tidyquant). A potential issue is that an average is more susceptible to outliers. Instead of using averages, let’s use the zoo functions to get the 15-day and 50-day rolling medians, which are more resistant to noise.
First, we get the past year of stock prices for AAPL using tq_get("AAPL", get = "stock.prices", from = today() - years(1)).
Next, we use tq_mutate() to add the 15-day and 50-day rolling medians. The first two arguments are ohlc_fun = Cl, which selects the closing price using quantmod OHLC notation, and mutate_fun = rollapply, which sends the closing price to the rollapply function. The next arguments, width and FUN, are passed to the rollapply function. width is the number of periods over which to take the median, and FUN is the function we intend to apply (i.e. median). The workflow is as follows:
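To show what the rolling calculation does without needing a data download, the sketch below calls zoo's rollapply() directly on a toy price series inside dplyr::mutate(). It is not the tq_mutate() call from the post: the prices are made up, the width is shortened to 3 to suit the short series, and the right-aligned window with NA padding is my assumption about the alignment.

```r
library(dplyr)
library(tibble)
library(zoo)

# Toy closing prices (made up); the third value is a deliberate outlier
prices <- tibble(
  date  = as.Date("2017-01-02") + 0:9,
  close = c(10, 11, 50, 12, 13, 12, 14, 13, 15, 14)
)

# width = number of periods in each window, FUN = function applied to it.
# align = "right" uses the current and previous observations;
# fill = NA pads the rows that do not yet have a full window.
rolled <- prices %>%
  mutate(
    median.3 = rollapply(close, width = 3, FUN = median, align = "right", fill = NA),
    mean.3   = rollapply(close, width = 3, FUN = mean,   align = "right", fill = NA)
  )
```

In the first full window (10, 11, 50), the spike pulls the mean up to about 23.7 while the median stays at 11, which is the outlier resistance the rolling median is chosen for.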
Two new columns, rollapply and rollapply.1, were added to the tibble. We rename these to be more descriptive. The next part is the same visualization code used in the last post. Essentially we gather the prices we wish to visualize so they are in one long tibble with two columns, “type” (close, median.15, and median.50) and “value”. We color each line by “type” using the ggplot aesthetics.
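The rename, gather, and plot steps can be sketched as follows. The tibble is a hand-made stand-in with the column names mentioned above (rollapply and rollapply.1); its date and close columns and all of its numbers are placeholders, not real AAPL data.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)

# Stand-in for the tq_mutate() output (all values are placeholders)
aapl <- tibble(
  date        = as.Date("2017-01-02") + 0:3,
  close       = c(116, 117, 115, 118),
  rollapply   = c(NA, 116.5, 116, 116.5),
  rollapply.1 = c(NA, NA, 116, 116.5)
)

aapl_long <- aapl %>%
  # Give the generated columns descriptive names
  rename(median.15 = rollapply, median.50 = rollapply.1) %>%
  # Stack the three price series into "type" and "value" columns
  gather(key = "type", value = "value", close, median.15, median.50)

# One line per series, colored by type; call print(p) to render the plot
p <- ggplot(aapl_long, aes(x = date, y = value, color = type)) +
  geom_line()
```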
And, we’re done. We now have an alternative to the SMA that is more resistant to changes caused by outliers.
The tidyquant package is a useful tool for both financial engineers and financial analysts, with tools to collect, analyze, visualize and model financial data.
You should now have a good understanding of the benefits and new features of the tidyquant package. We addressed some of the benefits that financial engineers and analysts can get from using the package. We discussed new features including key ratios and the zoo integration. Eighty-nine key ratios are now available using tq_get(x, get = "key.ratios"). The rollapply() function can be used with tq_mutate() and tq_transform(). This example just scratches the surface of the power of tidyquant. See the vignette for a more detailed discussion.
tidyquant Vignette: This tutorial just scratches the surface of tidyquant. The vignette explains much, much more!
R for Data Science: A free book that thoroughly covers the tidyverse.
TTR Vignette: Covers each of the TTR functions.