[This article was first published on R – QuantStrat TradeR, and kindly contributed to R-bloggers].

So, first off: I just finished a Thinkful data science in Python bootcamp that was supposed to take six months in about four. I applied all of my capstone projects to volatility trading; long story short, none of the ML techniques worked, and the more complex the technique I tried, the worse it performed. Is there a place for data science in Python in the world? Of course. Some firms swear by it. However, R currently has many more libraries developed specifically for quantitative finance, such as PerformanceAnalytics, quantstrat, PortfolioAnalytics, and so on. Even for more basic portfolio management tasks, I use R functions such as Return.Portfolio and charts.PerformanceSummary, equivalents of which I have not seen in Python. While some websites, such as QuantConnect and Quantopian, have their own dialects built on top of Python, I think those are more their own special brand of syntax than a way to create freeform portfolio backtesting strategies from pandas.

In any case, here’s my Python portfolio from the bootcamp I completed. Because Yahoo’s data ingestion broke on the SHORTVOL index, the supervised and unsupervised notebooks need their data input replaced by the one in the final capstone project. You can look at the notebooks to see exactly what I tried, but to cut to the chase, none of the techniques worked. Random forests, SVMs, XGBoost, UMAP…they don’t really apply to predicting returns. The features I used were some of those I use in my own trading strategy, so it wasn’t a case of “garbage in, garbage out”. And the more advanced the technique, the worse the results. In the words of one senior quant trading partner: “Auto-ML = auto-bankrupt”. So when people say “we use AI and machine learning to generate superior returns”, they’ve either found something absolutely spectacular (highly unlikely), or are just using the latest hype terms. After all, even linear regression can be thought of as a learning model.

Even taking PCAs of various term structure features did a worse job than my base volatility trading strategy. Of course, it’s gotten better since then as I added more risk management to the strategy, and caught a nice chunk of the coronavirus long vol move in March. You can subscribe to it here.

So yes, I code in Python now, if the previous post wasn’t any indication. Those who need some Python development for quant work that uses the usual numpy/scipy/pandas stack should feel free to reach out to me.

Anyway, this post is about adding some Corey Hoffstein style analysis to asset allocation strategies, this time in R, because this is a technique I used on a very recent project for an asset allocation firm that I freelance for (off and on). I call it Corey Hoffstein style because, on Twitter, he’s always talking about analyzing the impact of timing luck. His blog at Newfound Research is terrific for thinking about elements one doesn’t see in many other places, such as analyzing trend-following strategies in the context of option payoffs, the impact of timing luck and various parameters of lookback windows, and so on.

The quick idea is this: when you rebalance a portfolio every month, you want to know how changing the trading day on which you rebalance affects your results. This is what Walter does over at AllocateSmartly.

But a more interesting question is what happens when a portfolio is rebalanced on longer timeframes–that is, what happens when you rebalance a portfolio only once a quarter, once every six months, or once a year? What if instead of rebalancing quarterly in January, April, and so on, you rebalance in February, May, etc.?

This is a piece of code (in R, so far) that does exactly this:

offset_monthly_endpoints <- function(returns, k, offset) {
  # endpoints() comes from xts (loaded along with quantmod)
  # because the first endpoint is indexed to 0 and is the first index, add 1 to offset
  mod_offset <- (offset + 1) %% k # make sure we don't have 7 month offset on 6 month rebalance--that's just 1.
  eps <- endpoints(returns, on = 'months') # get monthly endpoints
  indices <- seq_along(eps) # create indices from 1 to number of endpoints
  selected_eps <- eps[indices %% k == mod_offset] # only select endpoints that have proper offset when modded by k
  selected_eps <- unique(c(0, selected_eps, nrow(returns))) # append start and end of data
  return(selected_eps)
}



Essentially, the idea behind this function is fairly straightforward. We want to subset monthly endpoints at some interval (that is, k = 3 for quarterly, k = 6 for every six months, k = 12 for annual endpoints), and we want to be able to offset those endpoints by some number of months. A modulo operator handles the wraparound: “hey, if you want to offset by 4 but rebalance every 3 months, that’s just the same thing as offsetting by 1 month”. One other thing to note is that since R is a language that starts at index 1 (rather than 0), a 1 is added to the offset, so that offsetting by 0 selects the first element that endpoints returns (the 0 that marks the start of the data) and every k-th endpoint after it. Beyond that, it’s simply creating an index going from 1 to the length of the endpoints (that is, if you have around 10 years of data, you have ~120 monthly endpoints), then seeing which endpoints fit the criterion of being every first, second, or third month in three.
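To make the modulo logic concrete, here is a minimal sketch in base R on one year of daily dates. The helper names month_end_rows and select_offset_rows are hypothetical, not part of any package: the first is only a rough stand-in for what xts's endpoints returns for months (a leading 0, then the last row of each month), and the second repeats the selection step of the function above on a plain vector.

```r
# Hypothetical base-R stand-in for xts::endpoints(x, on = "months"):
# returns 0, the row number of the last day in each month, and the last row.
month_end_rows <- function(dates) {
  month_id <- format(dates, "%Y-%m")
  n <- length(dates)
  changes <- which(month_id[-1] != month_id[-n]) # rows where the month changes next
  unique(c(0, changes, n))
}

# Same selection logic as offset_monthly_endpoints, applied to a plain vector
select_offset_rows <- function(eps, k, offset) {
  mod_offset <- (offset + 1) %% k
  indices <- seq_along(eps)
  unique(c(0, eps[indices %% k == mod_offset], max(eps)))
}

dates <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
eps <- month_end_rows(dates)

for (offset in c(0, 1, 2, 4)) {
  rows <- select_offset_rows(eps, k = 3, offset = offset)
  rows <- rows[rows > 0] # drop the leading 0; the last row is always appended
  cat("offset", offset, ":", format(dates[rows], "%m-%d"), "\n")
}
```

With this toy calendar, an offset of 0 lands on the ends of March, June, September, and December, an offset of 1 on the ends of January, April, July, and October (plus the appended final row), and an offset of 4 selects exactly the same rows as an offset of 1.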

So here’s how it works, with some sample data:

require(quantmod)
require(PerformanceAnalytics)

getSymbols('SPY', from = '1990-01-01')

           SPY.Open SPY.High  SPY.Low SPY.Close SPY.Volume SPY.Adjusted
1993-01-29 43.96875 43.96875 43.75000  43.93750    1003200     26.29929
1993-04-30 44.12500 44.28125 44.03125  44.03125      88500     26.47986
1993-07-30 45.09375 45.09375 44.78125  44.84375      75300     27.15962
1993-10-29 46.81250 46.87500 46.78125  46.84375      80700     28.54770
1994-01-31 48.06250 48.31250 48.00000  48.21875     313800     29.58682
1994-04-29 44.87500 45.15625 44.81250  45.09375     481900     27.82893

           SPY.Open SPY.High  SPY.Low SPY.Close SPY.Volume SPY.Adjusted
1993-02-26 44.43750 44.43750 44.18750  44.40625      66200     26.57987
1993-05-28 45.40625 45.40625 45.00000  45.21875      79100     27.19401
1993-08-31 46.40625 46.56250 46.34375  46.56250      66500     28.20059
1993-11-30 46.28125 46.56250 46.25000  46.34375     230000     28.24299
1994-02-28 46.93750 47.06250 46.81250  46.81250     333000     28.72394
1994-05-31 45.73438 45.90625 45.65625  45.81250     160000     28.27249

           SPY.Open SPY.High  SPY.Low SPY.Close SPY.Volume SPY.Adjusted
1993-03-31 45.34375 45.46875 45.18750  45.18750     111600     27.17521
1993-06-30 45.12500 45.21875 45.00000  45.06250     437600     27.29210
1993-09-30 46.03125 46.12500 45.84375  45.93750      99300     27.99539
1993-12-31 46.93750 47.00000 46.56250  46.59375     312900     28.58971
1994-03-31 44.46875 44.68750 43.53125  44.59375     788800     27.52037
1994-06-30 44.82812 44.84375 44.31250  44.46875     271900     27.62466


Notice how we get different quarterly rebalancing end dates. This also works with semi-annual, annual, and so on. The one caveat to this method, however, is that when doing tactical asset allocation analysis in R, I subset by endpoints. And since I usually use monthly endpoints in intervals of one (that is, every monthly endpoint), it’s fairly simple for me to incorporate measures of momentum over any monthly lookback period. That is, 1 month, 3 month, etc. are all fairly simple when rebalancing every month. However, if one were to rebalance every quarter and take only quarterly endpoints, then getting a one-month momentum measure every quarter would take a bit more work. And if one wanted to do quarterly rebalancing, tranche it every month, and also rebalance multiple times *throughout* the month rather than simply at the end of the month, that would require even more meticulousness.
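One way to sidestep part of that extra work, sketched here with made-up month-end prices rather than real data, is to compute the momentum measure on every monthly endpoint first and only then pick out the rebalance months. The variable names and the plain idx %% k == offset %% k convention (which indexes months directly, without the leading 0 that endpoints returns) are assumptions for illustration, not part of the function above.

```r
# Made-up month-end prices for a single asset (illustration only)
month_end_prices <- c(100, 102, 101, 105, 107, 104, 108, 110, 109, 113, 115, 112)

# One-month momentum at every monthly endpoint (NA for the first month)
n <- length(month_end_prices)
mom_1m <- c(NA, month_end_prices[-1] / month_end_prices[-n] - 1)

# Quarterly rebalance months for a given offset, counted over months directly
k <- 3
offset <- 1
idx <- seq_along(month_end_prices)
rebalance_months <- idx[idx %% k == offset %% k]

# The one-month signal as it would be seen at each quarterly rebalance
signal_at_rebalance <- mom_1m[rebalance_months]
print(data.frame(month = rebalance_months, mom_1m = signal_at_rebalance))
```

The lookback stays monthly while the rebalance schedule is quarterly, which is the combination the paragraph above describes as taking a bit more work.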

However, a second, “kludge-y” method to go about this would be to run the backtest to find all the weights, and then apply a similar coding methodology to the *weights*. For instance, if you have a time series of monthly weights, just create an index ranging from 1 to the length of the weights, then, depending on how often you want to rebalance, subset for every index whose value mod 3 is 0, 1, or 2. More generally, if you rebalance once every k months, you create an index ranging from 1 to n (the number of weight observations) if the language is base 1 (R), or from 0 to n-1 if it is base 0 (Python). Then, you simply see which indices give a given remainder from 0 to k-1 when taken modulo k, and that’s it. This will allow you to get k different rebalancing tranches by taking the indices of those endpoints. And you can still offset those endpoints daily as well. The caveat here, of course, is that you need to run the backtest for all of the individual months, and if you have a complex optimization routine, this can take an unnecessarily long time. So which method you use depends on the task at hand. This second method, however, is what I would use as a wrapper to a monthly rebalancing algorithm that already exists, such as my KDA asset allocation algorithm.
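A minimal sketch of this second method in base R follows. The weights are invented numbers standing in for the output of a monthly backtest, and tranche_weights is a hypothetical helper name. Holding each selected weight until the next rebalance is my assumption about how a tranche would then be used, not something the steps above spell out.

```r
# Made-up monthly weights from an already-run monthly backtest (12 months, 2 assets)
monthly_weights <- cbind(
  stocks = 0.30 + 0.05 * (0:11),
  bonds  = 0.70 - 0.05 * (0:11)
)

# Keep only the weights on months whose index gives `remainder` modulo k,
# and hold them until the next selected month (NA before the first rebalance).
tranche_weights <- function(weights, k, remainder) {
  idx <- seq_len(nrow(weights))
  rebalance_rows <- idx[idx %% k == remainder]
  # for each month, position of the most recent rebalance row (0 if none yet)
  pos <- findInterval(idx, rebalance_rows)
  weights[c(NA, rebalance_rows)[pos + 1], , drop = FALSE]
}

k <- 3
tranches <- lapply(0:(k - 1), function(r) tranche_weights(monthly_weights, k, r))

# Tranche with remainder 0 rebalances in months 3, 6, 9, and 12
print(tranches[[1]])
```

Each element of tranches is a full-length weight matrix for one of the k rebalancing schedules, so the k results can be compared directly to gauge how much the choice of rebalance month matters.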

That’s it for this post. In terms of things I want to build going forward: I’d like to port over some basic R functionality to Python, such as Return.Portfolio and charts.PerformanceSummary, and once I can get that working, demonstrate how to do a lot of the same asset allocation work I’ve done in R…in Python, as well.