Last Post For A While, And Two Premium (Cheap) Databases

[This article was first published on QuantStrat TradeR » R, and kindly contributed to R-bloggers.]

This will be my last post on this blog for an indefinite length of time. I will also include a script to query Quandl’s SCF database, an update on my earlier attempt to use free futures data from Quandl’s CHRIS database, which suffered from data integrity issues even after attempts to clean it. Also provided is a small tutorial on using Quandl’s EOD database, for those who take issue with Yahoo’s daily data.

So, first off, the news…as some of you may have already heard from meeting me in person at R/Finance 2015 (which was terrific…interesting presentations, good people, good food, good drinks), effective June 8th, I will be starting employment in the New York area as a quantitative research analyst. Part of the agreement is that this blog becomes an archive of my work, so that I may focus my full energies on my full-time position. That is, it’s not going anywhere, so recent followers now have a great deal of time to catch up on the entire archive, which, including this post, will be 62 posts. Topics covered include:

Quantstrat: its basics, and certain strategies coded using it, namely those based on John Ehlers’s digital signal processing algorithms, along with custom order-sizing functions. A small aside on pairs trading is included as well.
Asset rotation: flexible asset allocation, elastic asset allocation, and most recently, classic asset allocation (aka the constrained critical line algorithm).
Seeking Alpha ideas: both Logical Invest and Harry Long’s work, along with Cliff Smith’s Quarterly Tactical Strategy (QTS). The Logical Invest algorithms do what they set out to do, but in some cases depend on dividends to drive returns. By contrast, Harry Long’s ideas are more thought processes and proofs of concept than complete strategies, often depending on ETFs with inception dates after the financial crisis (which I backtested to pre-crisis timelines with some creativity). I’ve also collaborated with Harry Long privately, and what I’ve seen privately has better risk/reward than anything he has revealed in public, which I find impressive given the low turnover rate of such portfolios.
Volatility trading: XIV and VXX, namely, and a strategy around these two instruments that has done well out of sample.
Other statistical ideas, such as robustness heuristics and change point detection.

Topics I would have liked to cover but didn’t get around to:

Most Japanese trading methods: Ichimoku and Heiken Ashi, among other things. Both are in my IKTrading package; I just never got around to showing them off. I did cover a hammer trading system, which did not perform as I would have liked it to.
Larry Connors’s mean reversion strategies: he has a book on trading ETFs, and another one he wrote before that. The material provided on this blog is sufficient for anyone to code those strategies.
The PortfolioAnalytics package: what quantstrat is to signal-based trading strategies on individual instruments, PortfolioAnalytics is (and a lot more) to portfolio management strategies. Although strategies such as equal-weight ranking perform well by some standards, they are only the tip of the iceberg. PortfolioAnalytics is, to my awareness, cutting-edge portfolio management technology that runs the gamut from quick classic quadratic optimization to random-search global optimization portfolios (albeit those take more time to compute).

Now, onto the second part of this post, which is a pair of premium databases. They’re available from Quandl, and cost $50/month. As far as I’ve been able to tell, the data quality of the futures database (SCF) is vastly better than that of the CHRIS database, which can miss (or corrupt) chunks of data. The good news, however, is that free users can actually query these databases (or maybe all databases in total, I’m not sure) 150 times in a 60-day period. The futures script sends out 40 of these 150 queries, which may be all that is necessary if one intends to use it for some form of monthly-turnover trading strategy.

Here’s the script for the SCF (futures) database. There are two caveats here:

1) The prices are quoted on a per-contract basis. Notional values in futures trading, to my understanding, are vastly larger than the quoted price of one contract, to the point that assuming one can trade in integer quantities is no small assumption.

2) According to Alexios Ghalanos (AND THIS IS REALLY IMPORTANT), R’s GARCH guru and one of the most prominent quants in the R/Finance community, for some providers, the Open, High, and Low values in futures data may not be based on traditional U.S. pit trading hours in the same way that OHLC in equities/ETFs is based on the 9:30 AM – 4:00 PM session, but rather on extended trading hours. This means that there’s very low liquidity around the open in futures, and that the high and low may be set during these low-liquidity times as well. I am unsure whether Quandl’s SCF database uses extended-hours open-high-low data (low liquidity) or traditional pit hours (high liquidity), and I am hoping a representative from Quandl will clarify this in the comments section for this post. Either way, I wanted to make sure that readers are aware of this issue.
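To make the first caveat concrete, here is a minimal sketch of how notional value interacts with integer contract counts. The function name `contractsFor`, the account size, the price, and the contract multiplier are all illustrative assumptions rather than values taken from the SCF data; check the exchange's contract specifications before relying on any multiplier.

```r
# Illustrative numbers only: the multiplier, price, and account size are assumptions.
contractsFor <- function(capital, weight, price, multiplier) {
  notionalPerContract <- price * multiplier
  # Whole contracts only, so round the target position down
  floor(capital * weight / notionalPerContract)
}

capital    <- 100000
price      <- 60     # hypothetical crude oil price, dollars per barrel
multiplier <- 1000   # barrels per contract (assumed contract size)

price * multiplier                                   # notional value of one contract
contractsFor(capital, 0.25, price, multiplier)       # a 25% allocation rounds down to zero
contractsFor(capital * 10, 0.25, price, multiplier)  # a tenfold larger account can size it
```

With a small account, a target weight can round down to no position at all, which is why backtests that assume fractional or arbitrarily sized positions may overstate what is tradable.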

With that said, here’s the data fetch for the Stevens Continuous Futures (SCF) database from Quandl. All of these are front-month contracts with unadjusted prices, rolled on the open interest cross. Note that in order for this script to work, you must supply Quandl with your authorization token, which takes the form of something like this:

Quandl.auth("yourTokenHere")


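require(Quandl)
require(xts) # xts() and the date-range subsetting below come from xts

Quandl.auth("yourTokenHere")
authCode <- "yourTokenHere"

# Fetch one SCF series as an xts object, optionally trimmed to [from, to].
# from and to are date strings such as "2010-01-01"; NA leaves that side unbounded.
quandlSCF <- function(code, authCode, from = NA, to = NA) {
  dataCode <- paste0("SCF/", code)
  out <- Quandl(dataCode, authCode = authCode)
  # Drop the Date column and use it as the time index instead
  out <- xts(out[, -1], order.by = out$Date)
  # Rename the 4th and 6th columns to names that are easier to work with
  colnames(out)[4] <- "Close"
  colnames(out)[6] <- "PrevDayOpInt"
  if(!is.na(from)) {
    out <- out[paste0(from, "::")]
  }
  if(!is.na(to)) {
    out <- out[paste0("::", to)]
  }
  return(out)
}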

#Front open-interest cross
from <- NA
to <- NA

#Energies
CME_CL1 <- quandlSCF("CME_CL1_ON", authCode = authCode, from = from, to = to) #crude
CME_NG1 <- quandlSCF("CME_NG1_ON", authCode = authCode, from = from, to = to) #natgas
CME_HO1 <- quandlSCF("CME_HO1_ON", authCode = authCode, from = from, to = to) #heatOil
CME_RB1 <- quandlSCF("CME_RB1_ON", authCode = authCode, from = from, to = to) #RBob
ICE_B1 <- quandlSCF("ICE_B1_ON", authCode = authCode, from = from, to = to) #Brent
ICE_G1 <- quandlSCF("ICE_G1_ON", authCode = authCode, from = from, to = to) #GasOil

#Grains
CME_C1 <- quandlSCF("CME_C1_ON", authCode = authCode, from = from, to = to) #Chicago Corn
CME_S1 <- quandlSCF("CME_S1_ON", authCode = authCode, from = from, to = to) #Chicago Soybeans
CME_W1 <- quandlSCF("CME_W1_ON", authCode = authCode, from = from, to = to) #Chicago Wheat
CME_SM1 <- quandlSCF("CME_SM1_ON", authCode = authCode, from = from, to = to) #Chicago Soybean Meal
CME_KW1 <- quandlSCF("CME_KW1_ON", authCode = authCode, from = from, to = to) #Kansas City Wheat
CME_BO1 <- quandlSCF("CME_BO1_ON", authCode = authCode, from = from, to = to) #Chicago Soybean Oil

#Softs
ICE_SB1 <- quandlSCF("ICE_SB1_ON", authCode = authCode, from = from, to = to) #Sugar No. 11
ICE_KC1 <- quandlSCF("ICE_KC1_ON", authCode = authCode, from = from, to = to) #Coffee
ICE_CC1 <- quandlSCF("ICE_CC1_ON", authCode = authCode, from = from, to = to) #Cocoa
ICE_CT1 <- quandlSCF("ICE_CT1_ON", authCode = authCode, from = from, to = to) #Cotton

#Other Ags
CME_LC1 <- quandlSCF("CME_LC1_ON", authCode = authCode, from = from, to = to) #Live Cattle
CME_LN1 <- quandlSCF("CME_LN1_ON", authCode = authCode, from = from, to = to) #Lean Hogs

#Precious Metals
CME_GC1 <- quandlSCF("CME_GC1_ON", authCode = authCode, from = from, to = to) #Gold
CME_SI1 <- quandlSCF("CME_SI1_ON", authCode = authCode, from = from, to = to) #Silver
CME_PL1 <- quandlSCF("CME_PL1_ON", authCode = authCode, from = from, to = to) #Platinum
CME_PA1 <- quandlSCF("CME_PA1_ON", authCode = authCode, from = from, to = to) #Palladium

#Base
CME_HG1 <- quandlSCF("CME_HG1_ON", authCode = authCode, from = from, to = to) #Copper

#Currencies
CME_AD1 <- quandlSCF("CME_AD1_ON", authCode = authCode, from = from, to = to) #Australian Dollar
CME_CD1 <- quandlSCF("CME_CD1_ON", authCode = authCode, from = from, to = to) #Canadian Dollar
CME_SF1 <- quandlSCF("CME_SF1_ON", authCode = authCode, from = from, to = to) #Swiss Franc
CME_EC1 <- quandlSCF("CME_EC1_ON", authCode = authCode, from = from, to = to) #Euro
CME_BP1 <- quandlSCF("CME_BP1_ON", authCode = authCode, from = from, to = to) #Pound
CME_JY1 <- quandlSCF("CME_JY1_ON", authCode = authCode, from = from, to = to) #Yen
ICE_DX1 <- quandlSCF("ICE_DX1_ON", authCode = authCode, from = from, to = to) #Dollar Index

#Equities
CME_ES1 <- quandlSCF("CME_ES1_ON", authCode = authCode, from = from, to = to) #Emini
CME_MD1 <- quandlSCF("CME_MD1_ON", authCode = authCode, from = from, to = to) #Midcap 400
CME_NQ1 <- quandlSCF("CME_NQ1_ON", authCode = authCode, from = from, to = to) #Nasdaq 100
ICE_RF1 <- quandlSCF("ICE_RF1_ON", authCode = authCode, from = from, to = to) #Russell Smallcap
CME_NK1 <- quandlSCF("CME_NK1_ON", authCode = authCode, from = from, to = to) #Nikkei

#Bonds/rates
CME_FF1 <- quandlSCF("CME_FF1_ON", authCode = authCode, from = from, to = to) #30-day fed funds
CME_ED1 <- quandlSCF("CME_ED1_ON", authCode = authCode, from = from, to = to) #3 Mo. Eurodollar/TED Spread
CME_FV1 <- quandlSCF("CME_FV1_ON", authCode = authCode, from = from, to = to) #Five Year TNote
CME_TY1 <- quandlSCF("CME_TY1_ON", authCode = authCode, from = from, to = to) #Ten Year Note
CME_US1 <- quandlSCF("CME_US1_ON", authCode = authCode, from = from, to = to) #30 year bond

Naturally, I can’t give away my own token, so you’ll have to replace the placeholder with yours (every account has one). Once again, individuals not subscribed to these databases need to pay $50/month.
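Since the script issues one query per contract, the repeated calls could also be driven from a vector of codes. This makes it easy to count queries against the quota and to fetch only a subset. The sketch below is not part of the original script: `fetchAll` and `stubFetcher` are hypothetical helpers, and the stub stands in for the real download so that the example runs without a token or network access.

```r
# Download a set of SCF codes with one call per code and return a named list.
# `fetcher` is any function that takes a code and returns the data for it.
fetchAll <- function(codes, fetcher) {
  out <- lapply(codes, fetcher)
  names(out) <- sub("_ON$", "", codes)  # e.g. "CME_CL1_ON" -> "CME_CL1"
  out
}

energies <- c("CME_CL1_ON", "CME_NG1_ON", "CME_HO1_ON",
              "CME_RB1_ON", "ICE_B1_ON", "ICE_G1_ON")

# Stand-in fetcher (hypothetical) so the sketch runs offline
stubFetcher <- function(code) paste("data for", code)

futures <- fetchAll(energies, stubFetcher)
names(futures)
length(energies)  # number of API calls the real fetch would use

# With a real token, swap in the function from the script above:
# futures <- fetchAll(energies, function(code)
#   quandlSCF(code, authCode = authCode, from = from, to = to))
```

Keeping the results in a named list rather than 40 separate variables is a design choice; individual series remain available as, for example, `futures$CME_CL1`.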

Lastly, I’d like to show the Quandl EOD database. This is identical in functionality to Yahoo’s daily data, but may be (hopefully!) more accurate. I have never used this database on this blog because the number one rule has always been that readers must be able to replicate all analysis for free, but those who doubt the quality of Yahoo’s data may wish to look at Quandl’s EOD database.

This is how it works, with an example for SPY.

out <- Quandl("EOD/SPY", start_date="1999-12-31", end_date="2005-12-31", type = "xts")

And here’s some output.

> head(out)
             Open   High    Low  Close   Volume Dividend Split Adj_Open Adj_High  Adj_Low Adj_Close Adj_Volume
1999-12-31 146.80 147.50 146.30 146.90  3172700        0     1 110.8666 111.3952 110.4890  110.9421    3172700
2000-01-03 148.25 148.25 143.88 145.44  8164300        0     1 111.9701 111.9701 108.6695  109.8477    8164300
2000-01-04 143.50 144.10 139.60 139.80  8089800        0     1 108.3828 108.8359 105.4372  105.5882    8089800
2000-01-05 139.90 141.20 137.30 140.80 12177900        0     1 105.6631 106.6449 103.6993  106.3428   12177900
2000-01-06 139.60 141.50 137.80 137.80  6227200        0     1 105.4327 106.8677 104.0733  104.0733    6227200
2000-01-07 140.30 145.80 140.10 145.80  8066500        0     1 105.9690 110.1231 105.8179  110.1231    8066500

To note, this data is not automatically assigned to “SPY”, as quantmod’s “getSymbols” function would do when fetching from Yahoo. Also, note that when querying Quandl’s EOD database, you automatically obtain both adjusted and unadjusted prices. One thing I am not sure is as easily done through Quandl’s API is adjusting prices for splits but not dividends. But, for what it’s worth, there it is. So for those who take issue with the quality of Yahoo data, you may wish to look at Quandl’s EOD database for $50/month.
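One possible way to get split-only adjusted prices is to build the adjustment from the Split column by hand. The sketch below uses toy data rather than a real query, and it rests on an assumption I have not verified against Quandl’s documentation: that Split holds the split ratio on the day the split takes effect and 1 on every other day. The helper `splitAdjust` is hypothetical. The last line shows how to bind a result to a ticker-named variable, as getSymbols does.

```r
# Toy data in the shape of the EOD output: unadjusted closes and a Split column.
# Assumption: Split is the ratio on the day the split takes effect, and 1 otherwise.
eod <- data.frame(
  Close = c(100, 102, 51, 52),
  Split = c(1, 1, 2, 1)   # hypothetical 2-for-1 split effective on the third day
)

splitAdjust <- function(price, split) {
  laterSplits <- c(split[-1], 1)                 # ratios taking effect after each row
  adjFactor <- rev(cumprod(rev(laterSplits)))    # product of all later split ratios
  price / adjFactor
}

eod$SplitAdjClose <- splitAdjust(eod$Close, eod$Split)
eod

# Mimic getSymbols by binding the result to a variable named after the ticker
assign("SPY", eod)
```

Prices before the split are divided by the product of all later split ratios, while dividends are left untouched. The same factor could be applied to the Open, High, and Low columns.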

So…that’s it. From this point on, this blog is an archive of my work that will stay up; it’s not going anywhere. However, I won’t be updating it or answering questions on it. For those who have any questions about functionality, I highly recommend posting them to the R-SIG-Finance mailing list. It’s been a pleasure sharing my thoughts and work with you, and I’m glad I’ve garnered the attention of many intelligent individuals, from those who have provided me with data, to those who have built upon my work, to those who have hired me for consulting (and now full-time) opportunities. I also hope that some of my work displayed here has made it to other trading and/or asset management firms. I am very grateful for all of the feedback, comments, data, and opportunities I’ve received along the way.

Once again, thank you so much for reading. It’s been a pleasure.

