Downloading Option Chain Data from Google Finance in R: An Update
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I recently read an article which showed how to download Option Chain data from Google Finance using R. Interestingly, that article appears to be a close adaption of another article which does the same thing using Python.
While playing around with the code from these articles I noticed a couple of things that might benefit from minor tweaks. Before I look at those though, it’s worthwhile pointing out that there already is a function in quantmod for retrieving Option Chain data from Yahoo! Finance. What I am doing here is thus more for my own personal edification (but hopefully you will find it interesting too!).
Background
An Option Chain is just a list of all available options for a particular security spanning a range of expiration dates.
The Code
First we need to load a few packages which facilitate the downloading, parsing and manipulation of the data.
> library(RCurl) > library(jsonlite) > library(plyr)
We’ll be retrieving the data in JSON format. Somewhat disturbingly the JSON data from Google Finance does not appear to be fully compliant with the JSON standards because the keys are not quoted. We’ll use a helper function which will run through the data and insert quotes around each of the keys. The original code for this function looped through a list of key names. This is a little inefficient and would also be problematic if additional keys were introduced. We’ll get around that by using a different approach which avoids stipulating key names.
> fixJSON <- function(json){
+   gsub('([^,{:]+):', '"\1":', json)
+ }
To make the download function more concise we'll also define two URL templates.
> URL1 = 'http://www.google.com/finance/option_chain?q=%s&output=json' > URL2 = 'http://www.google.com/finance/option_chain?q=%s&output=json&expy=%d&expm=%d&expd=%d'
And finally the download function itself, which proceeds through the following steps for a specified ticker symbol:
- downloads summary data;
- extracts expiration dates from the summary data and downloads the options data for each of those dates;
- concatenates these data into a single structure, neatens up the column names and selects a subset.
> getOptionQuotes <- function(symbol){
+   url = sprintf(URL1, symbol)
+   #
+   chain = fromJSON(fixJSON(getURL(url)))
+   #
+   options = mlply(chain$expirations, function(y, m, d) {
+     url = sprintf(URL2, symbol, y, m, d)
+     expiry = fromJSON(fixJSON(getURL(url)))
+     #
+     expiry$calls$type = "Call"
+     expiry$puts$type  = "Put"
+     #
+     prices = rbind(expiry$calls, expiry$puts)
+     #
+     prices$expiry = sprintf("%4d-%02d-%02d", y, m, d)
+     prices$underlying.price = expiry$underlying_price
+     #
+     prices
+   })
+   #
+   options = cbind(data.frame(symbol), rbind.fill(options))
+   #
+   names(options)1 = c("price", "bid", "ask", "open.interest")
+   #
+   for (col in c("strike", "price", "bid", "ask")) options[, col] = as.numeric(options[, col])
+   options[, "open.interest"] = suppressWarnings(as.integer(options[, "open.interest"]))
+   #
+   options[, c(1, 16, 15, 6, 10, 11, 17, 14, 12)]
+ }
Results
Let's give it a whirl. (The data below were retrived on Saturday 10 January 2015).
> AAPL = getOptionQuotes("AAPL")
> nrow(AAPL)
[1] 1442
This is what the resulting data look like, with all available expiration dates consolidated into a single table:
> head(AAPL)
  symbol type     expiry price   bid   ask underlying.price strike open.interest
1   AAPL Call 2015-01-17 82.74 84.00 84.35           112.01  27.86           505
2   AAPL Call 2015-01-17 83.75 83.20 83.70           112.01  28.57          1059
3   AAPL Call 2015-01-17 84.75 82.60 82.90           112.01  29.29            13
4   AAPL Call 2015-01-17 81.20 81.80 82.40           112.01  30.00            29
5   AAPL Call 2015-01-17 83.20 81.10 81.65           112.01  30.71           150
6   AAPL Call 2015-01-17 79.75 80.35 80.75           112.01  31.43           396
> tail(AAPL)
     symbol type     expiry price   bid   ask underlying.price strike open.interest
1437   AAPL  Put 2017-01-20 47.57 46.45 48.20           112.01    150           108
1438   AAPL  Put 2017-01-20 51.00 50.45 52.30           112.01    155            72
1439   AAPL  Put 2017-01-20 52.45 54.55 56.50           112.01    160           203
1440   AAPL  Put 2017-01-20 58.55 58.65 60.75           112.01    165            76
1441   AAPL  Put 2017-01-20 67.90 63.15 64.50           112.01    170           167
1442   AAPL  Put 2017-01-20 68.00 67.35 68.25           112.01    175           239
There is a load of data there. To get an idea of what it looks like we can generate a couple of plots. Below is the Open Interest as a function of Strike Price across all expiration dates. The underlying price is indicated by the vertical dashed line. As one might expect, the majority of interest is associated with the next expiration date on 17 January 2015.
It's pretty clear that this is not the optimal way to look at these data and I would be extremely interested to hear from anybody with a suggestion for a better visualisation. Trying to look at all of the expiration dates together is probably the largest problem, so let's focus our attention on those options which expire on 17 January 2015. Again the underlying price is indicated by a vertical dashed line.
This is the first time that I have seriously had a look at options data, but I will now readily confess to being intrigued. Having the data readily available, there is no reason not to explore further. Details to follow.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
 

