tidyquant, version 0.2.0, is now available on CRAN. If you’re not already familiar, tidyquant integrates the best resources for collecting and analyzing quantitative data (xts, zoo, quantmod, and TTR) with the tidy data infrastructure of the tidyverse, allowing for seamless interaction between each. I’ll briefly touch on some of the updates. The package is open source, and you can view the code on the tidyquant GitHub page.

# Who Will Benefit?

The tidyquant package was developed with two types of users in mind:

1. Financial Engineers: These individuals systematically analyze financial securities (typically stocks), implementing technical trading rules such as MACD, Bollinger Bands, moving averages, etc. to determine buy and sell signals in an automated way. Charting and implementing modelling algorithms are highly important.

2. Financial / Business Analysts: These individuals systematically analyze financial securities, financial statements, and key ratios such as valuation (e.g. price to earnings multiples), financial health (e.g. current ratio), and efficiency (e.g. inventory turnover). Getting financial and key ratio data is highly important, along with charting and, to a lesser degree, modelling.

For the financial engineer, the package is designed to integrate specialty financial functions within the tidyverse so the user doesn’t need to jump back and forth between tibbles (tidy data frames) / data frames and xts / zoo time-series objects. Functions like tq_transform() and tq_mutate() exist to apply the various xts, zoo, quantmod, and TTR functions to data frames, so you never need to leave the tidyverse. Further, if the user needs to switch object classes, coercion functions make the conversion easy (see as_tibble() for converting xts to tibbles / data frames, and as_xts() for converting data frames to xts).
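To make the back-and-forth concrete, here is a minimal sketch of the manual round trip that the coercion helpers are meant to spare you. It deliberately uses only xts, zoo, and tibble directly rather than tidyquant's own as_tibble() and as_xts(); the dates, prices, and object names are made up for illustration.

```r
library(xts)     # also attaches zoo, which provides index() and coredata()
library(tibble)

# Made-up closing prices for three consecutive days (illustrative only)
dates  <- as.Date("2017-01-03") + 0:2
prices <- xts(c(116.2, 116.0, 116.6), order.by = dates)

# xts -> tibble: pull the time index and the values into ordinary columns
prices_tbl <- tibble(
  date  = index(prices),
  close = as.numeric(coredata(prices))
)

# tibble -> xts: rebuild the time series from the date column
prices_xts <- xts(prices_tbl$close, order.by = prices_tbl$date)
```

Every switch between the two worlds costs a step like this, which is the friction the package aims to remove.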

For the financial analyst, the package is designed to make retrieving key financial data fast and analyzing it easy and efficient. The core function, tq_get(), has a get argument that can be set to:

• Stock Index: Retrieve a list of stock symbols for an entire index such as the S&P500 with tq_get("SP500", get = "stock.index"). 18 indexes are available.

• Stock Prices: Retrieve the stock prices for an individual stock such as Apple with tq_get("AAPL", get = "stock.prices").

• Financial Statements: Retrieve income statements, balance sheets, and cash flow statements for both annual and quarterly periods for an individual stock, tq_get("AAPL", get = "financials").

• Key Ratios: Retrieve 10 years of historical key ratios (89 total available) for an individual stock, tq_get("AAPL", get = "key.ratios").

Most importantly, tidyquant is designed to work in the tidyverse. This means users can use dplyr and tidyr verbs to slice and dice data and purrr to map functions at scale. This enables new capabilities for both financial engineers and analysts. Instead of analyzing one stock at a time, you can now analyze as many stocks as you want at the same time and systematically compare each. See the S&P500 and the more advanced Russell 2000 posts for tutorials on mapping functions to stock lists.

Ok, enough about the benefits. You can read more about them in the vignette. Let’s discuss the updates, and I’ll go through some examples of the new functionality.

1. Key ratios from Morningstar: Users can now get 89 different key ratios that span 10 years historically. This is great for users that want to know how EPS, P/E, and even financials have changed over time. The source is Morningstar.

2. zoo integration: The rollapply functions from the zoo package are now fully integrated with tq_transform and tq_mutate. This means you can calculate rolling averages, maximums, medians, and whatever else your heart desires.

3. Making things more intuitive and hassle-free: These are small tweaks. The transform and mutate function arguments have changed slightly. The x_fun argument has been replaced with the more intuitive name ohlc_fun, so users know to enter an OHLC function such as Op to select the open price of stock prices. The .x and .y arguments are replaced with x and y, which make more sense and don’t interfere with mapping functions in purrr.

Now, let’s go through some examples.

# Prerequisites

First, update to tidyquant version 0.2.0. Note that you will need the development version 0.2.0.9000 to follow Example 1 in this post.

Next, load tidyquant.

I also recommend the open-source RStudio IDE, which makes R programming easy and efficient.

# Example 1: Getting and Visualizing Key Ratios

You will need to download the development version for this example due to an issue with retrieving key ratios for stocks listed on the NYSE. In 0.2.0, key ratios are only available for stocks listed on the NASDAQ. To continue, upgrade to 0.2.0.9000 using devtools::install_github("mdancho84/tidyquant") to get the latest development version.

Let’s say we want to compare the valuation over time using the price to earnings (P/E) multiple. This is often done when comparing several companies in the same industry to determine those that may be below normal valuation (i.e. the price may be at a discount to historical trends and to peers).

As an example, we’ll select some big banks to visualize the P/E valuation: JP Morgan (JPM), Goldman Sachs (GS), Bank of America (BAC), and Citigroup (C). Before we can visualize all stocks, let’s first get the key ratios for one stock. Use tq_get(), which gets data, and set the get argument to "key.ratios".

Let’s check out the key ratios by unnesting.

Yikes, there are 890 rows of data. To narrow this down, we first filter to the section we want, “Valuation Ratios”. We can then get the unique categories in that section by selecting the “category” column and using the unique function.
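The filter-then-unique step can be sketched offline on a toy table. The tibble below is a made-up stand-in for the unnested key ratios: only the “category” column and the “Valuation Ratios” section name come from the text above, while the “section” and “value” column names, the other category labels, and all numbers are assumptions for illustration.

```r
library(dplyr)
library(tibble)

# Toy stand-in for unnested key ratios; layout and numbers are assumptions
key_ratios <- tribble(
  ~section,           ~category,           ~value,
  "Valuation Ratios", "Price to Earnings", 12.5,
  "Valuation Ratios", "Price to Earnings", 14.1,
  "Valuation Ratios", "Price to Sales",     3.2,
  "Financials",       "Revenue USD Mil",   1000
)

valuation_categories <- key_ratios %>%
  filter(section == "Valuation Ratios") %>%  # keep one section
  select(category) %>%                       # keep only the category column
  unique()                                   # drop repeated categories
```

On the real data the same pipeline would list every ratio available in the chosen section, which is how we find the exact label to filter on next.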

We see that “Price to Earnings” is one of the valuation ratios we can get. Let’s filter and plot with ggplot2.

This is great, but we want to evaluate more than one stock. That’s easy to do with dplyr and purrr. First, we’ll make a function to get the P/E ratios using the same procedure as for one stock. Then we’ll map that function over many stocks.

Now, let’s scale it to a tibble of stocks.
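The scaling pattern can be sketched as follows. Note that get_pe_stub() is a hypothetical stand-in, not a tidyquant function: a real version would wrap tq_get(symbol, get = "key.ratios") plus the filtering described above, whereas this stub returns made-up values so the sketch runs without a network connection.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical stand-in for a function that fetches P/E ratios for one stock.
# It ignores the symbol and returns made-up values so the sketch runs offline.
get_pe_stub <- function(symbol) {
  tibble(
    date  = as.Date(c("2014-12-31", "2015-12-31")),
    value = c(10, 11)
  )
}

stocks <- tibble(symbol = c("JPM", "GS", "BAC", "C"))

# One row per stock; the new list-column holds one tibble per stock
stocks_pe <- stocks %>%
  mutate(pe = map(symbol, get_pe_stub))
```

Because map() returns a list, the result is a nested tibble: adding a fifth bank only means adding a fifth symbol.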

Now that we have a nested tibble of P/E ratios, we can use the same technique to visualize four stocks as with one stock. We’ll unnest the list to get a single level tibble, then plot using ggplot2 tacking on a facet wrap to split the plots by stock.
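A minimal sketch of the unnest-and-facet step, using a small made-up nested tibble rather than real Morningstar data. The column names symbol, pe, date, and value are assumptions chosen for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)

# Made-up nested data: one small tibble of P/E values per symbol
nested_pe <- tibble(
  symbol = c("JPM", "GS"),
  pe = list(
    tibble(date = as.Date(c("2014-12-31", "2015-12-31")), value = c(10, 11)),
    tibble(date = as.Date(c("2014-12-31", "2015-12-31")), value = c(12, 14))
  )
)

# Flatten to one row per symbol and date
flat_pe <- nested_pe %>% unnest(pe)

# One panel per stock
pe_plot <- ggplot(flat_pe, aes(x = date, y = value)) +
  geom_line() +
  facet_wrap(~ symbol) +
  labs(title = "P/E ratio by stock (illustrative data)", x = NULL, y = "P/E")
```

Printing pe_plot draws the panels side by side on a shared y-axis, which is what makes the peer comparison easy to read.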

We now have the price to earnings ratio visualization for the four bank stocks. We can see how the valuation of each stock compares historically and against its peers. Just a few observations:

• GS has the highest current valuation at almost 15X earnings. JPM, C, and BAC are all priced closer to 10X earnings.
• BAC is missing some values, which were cut off by the y-limits. This happened after the financial crisis, which may be a red flag since its earnings were impacted more than those of its peers.
• C had negative P/E multiples in 2009 and 2010. This was the result of the financial crisis. Again, this may be a red flag.

The P/E multiple is just one of the 89 key ratios now available for evaluating stocks using tq_get(x, get = "key.ratios").

# Example 2: Taking the New Zoo Integration for a Spin

The rollapply functions from the zoo package are useful in calculating rolling averages, medians, maximums, etc., which are integral in separating the trend from the noise in a time series. One common technique is to use simple moving averages to determine the crossover (which was discussed in my last post on tidyquant). A potential issue is that an average is more susceptible to outliers. Instead of using averages, let’s use the zoo functions to get the 15-day and 50-day rolling medians, which are more resistant to noise.

First, we get the past year of stock prices for AAPL using tq_get(get = "stock.prices", from = today() - years(1)).
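The start date comes from lubridate date arithmetic, sketched below. The variable names are illustrative, and the %m-% variant is my addition rather than something the post uses: plain subtraction of years(1) returns NA when the resulting date does not exist (for example, one year before February 29), while %m-% rolls back to the last valid day of the month.

```r
library(lubridate)

# Start date one year before today, as passed to the `from` argument
from_date <- today() - years(1)

# Month-aware variant that never produces NA
from_date_safe <- today() %m-% years(1)

# The difference only shows up on leap days
leap_day   <- as.Date("2016-02-29")
plain_back <- leap_day - years(1)     # NA: 2015-02-29 does not exist
safe_back  <- leap_day %m-% years(1)  # 2015-02-28
```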

Next, we use tq_mutate() to add the 15-day and 50-day rolling medians. The first two arguments are ohlc_fun = Cl, which selects the closing price using quantmod OHLC notation, and mutate_fun = rollapply, which sends the closing price to the rollapply function. The next arguments, width and FUN, are passed through to the rollapply function: width is the number of periods over which to take the median, and FUN is the function we intend to apply (i.e. median). The workflow is as follows:
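To see what width and FUN do once they reach zoo, here is a small sketch that calls rollapply() directly on a made-up price vector instead of going through tq_mutate(). The 3-period window, the align and fill settings, and the data are all choices made for illustration, not values from this post.

```r
library(zoo)

# Made-up closing prices with one outlier in position 4
close <- c(10, 11, 12, 100, 13, 14, 15)

# Right-aligned 3-period rolling statistics; fill = NA pads the first two slots
roll_median <- rollapply(close, width = 3, FUN = median, align = "right", fill = NA)
roll_mean   <- rollapply(close, width = 3, FUN = mean,   align = "right", fill = NA)

# roll_median: NA NA 11 12 13 14 14
# roll_mean jumps to 41 as soon as the outlier enters the window
```

The outlier drags the rolling mean far from the surrounding prices for three periods, while the rolling median barely moves, which is the motivation for swapping the SMA for a rolling median.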

Two new columns, rollapply and rollapply.1, were added to the tibble. We rename these to be more descriptive. The next part is the same visualization code used in the last post. Essentially we gather the prices we wish to visualize so they are in one long tibble with two columns, “type” (close, median.15, and median.50) and “value”. We color each line by “type” using the ggplot aesthetics.
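The rename-and-gather step can be sketched on a tiny made-up tibble shaped like the output described above. The numbers are invented, and the date and close column names are assumptions; rollapply, rollapply.1, median.15, median.50, type, and value follow the text.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up result shaped like the tq_mutate() output described above
prices <- tibble(
  date        = as.Date("2017-01-03") + 0:1,
  close       = c(116.2, 116.0),
  rollapply   = c(115.8, 115.9),
  rollapply.1 = c(112.4, 112.5)
)

long_prices <- prices %>%
  # Give the generated columns descriptive names
  rename(median.15 = rollapply, median.50 = rollapply.1) %>%
  # Stack the three series into "type" / "value" columns
  gather(key = "type", value = "value", close, median.15, median.50)
```

With the data in this long shape, mapping color to type in the ggplot aesthetics draws one line per series.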

And, we’re done. We now have an alternative to the SMA that is more resistant to changes caused by outliers.

# Conclusion

The tidyquant package is a useful tool for both financial engineers and financial analysts, with tools to collect, analyze, visualize and model financial data.

# Recap

You should now have a good understanding of the benefits and new features of the tidyquant package. We addressed some of the benefits that financial engineers and analysts can get from using the package. We discussed new features including key ratios and the zoo integration. Eighty-nine key ratios are now available using tq_get(). The zoo rollapply() function can be used with tq_mutate() and tq_transform(). These examples just scratch the surface of the power of tidyquant. See the vignette for a detailed discussion of each of the tidyquant features.

1. tidyquant Vignette: This tutorial just scratches the surface of tidyquant. The vignette explains much, much more!

2. R for Data Science: A free book that thoroughly covers the tidyverse packages.

3. Quantmod Website: Covers many of the quantmod functions. Also, see the quantmod vignette.

4. Extensible Time-Series Website: Covers many of the xts functions. Also, see the xts vignette.

5. TTR Vignette: Covers each of the TTR functions.