[This article was first published on QuantStrat TradeR » R, and kindly contributed to R-bloggers].

Several readers, upon seeing the risk and return ratios along with the other statistics in the previous post, stated that the results may have come from data mining/over-optimization/curve-fitting/overfitting, or the otherwise bad practice of creating an amazing equity curve whose performance will decay out of sample.

Fortunately, there's a way to test that assertion. In their book "Trading Systems: A New Approach to System Development and Portfolio Optimization", Urban Jaekle and Emilio Tomasini use the concept of the "stable region" to visualize whether or not a parameter specification is overfit. The question a stable region answers is this: going forward, how robust is a parameter specification to slight changes? If the system just happened to find one good point in a sea of losers, the strategy is likely to fail going forward. However, if small changes in the parameter specification still result in profitable configurations, then the chosen parameter set is a valid configuration rather than a lucky outlier.
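One simple way to put a number on "stability" (my own illustration, not the exact procedure from Jaekle and Tomasini) is to replace each cell of a parameter-grid metric with the average over that cell and its immediate neighbors. An isolated spike gets dragged down toward its surroundings, while a broad plateau keeps its value. Here's a minimal base-R sketch; the helper neighborhoodMean and the toy grid are hypothetical.

```r
# Hypothetical helper: for each cell of a metric matrix (rows and columns are
# parameter values), average the metric over the cell and its immediate
# neighbors (a 3x3 window, clipped at the edges of the grid).
neighborhoodMean <- function(m) {
  out <- matrix(NA_real_, nrow(m), ncol(m), dimnames = dimnames(m))
  for (r in seq_len(nrow(m))) {
    for (k in seq_len(ncol(m))) {
      rows <- max(1, r - 1):min(nrow(m), r + 1)
      cols <- max(1, k - 1):min(ncol(m), k + 1)
      out[r, k] <- mean(m[rows, cols])
    }
  }
  out
}

# Toy grid (made-up numbers): an isolated spike at [2, 2] and a plateau of 2s
# in columns 4 and 5.
toy <- matrix(c(0, 0, 0, 2, 2,
                0, 3, 0, 2, 2,
                0, 0, 0, 2, 2), nrow = 3, byrow = TRUE)
smoothed <- neighborhoodMean(toy)

smoothed[2, 2]  # the spike: 3 on its own, about 0.33 after smoothing
smoothed[2, 5]  # the plateau: stays at 2
```

The raw grid's best cell is the spike, but ranking by the smoothed value favors the plateau, which is the intuition behind preferring a stable region over a single best point.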

As Frank's trading strategy only has two parameters (the standard deviation computation period, aka the n argument of the runSD function in R, and the SMA period), rather than make line graphs, I decided to do a brute-force grid search over both to see the other configurations, and plotted the results as heatmaps.
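To make concrete where those two parameters enter, here's a base-R sketch of what one cell of the grid computes, run on simulated data rather than the real series. Everything here (spxRet, vxmt, xivRet, vxxRet, rollWindow, oneCell) is a hypothetical stand-in: rollWindow with sd and mean takes the place of TTR's runSD and SMA, and the real script below works on xts objects instead of plain vectors.

```r
# Toy inputs (simulated, NOT the post's data): one value per trading day.
set.seed(42)
nDays  <- 60
spxRet <- rnorm(nDays, mean = 0, sd = 0.01)              # daily index log returns
vxmt   <- 15 + cumsum(rnorm(nDays, mean = 0, sd = 0.3))  # implied vol, VIX points
xivRet <- rnorm(nDays, mean = 0.002, sd = 0.04)          # short-vol ETN returns
vxxRet <- rnorm(nDays, mean = -0.002, sd = 0.04)         # long-vol ETN returns

# Trailing-window statistic; the first (n - 1) entries, and any window that
# contains an NA, come out as NA (mirroring the warm-up period of runSD/SMA).
rollWindow <- function(x, n, f) {
  out <- rep(NA_real_, length(x))
  for (t in n:length(x)) out[t] <- f(x[(t - n + 1):t])
  out
}

# One grid cell: sdPeriod plays the role of runSD's n, smaPeriod of SMA's n.
oneCell <- function(sdPeriod, smaPeriod) {
  realized <- rollWindow(spxRet, sdPeriod, sd) * 100 * sqrt(252)  # annualized, VIX points
  smaDiff  <- rollWindow(vxmt - realized, smaPeriod, mean)        # smoothed premium
  signal   <- as.numeric(smaDiff > 0)                             # 1 = hold XIV, 0 = hold VXX
  signal   <- c(NA, head(signal, -1))                             # act on the next day
  signal * xivRet + (1 - signal) * vxxRet
}

strat <- oneCell(sdPeriod = 2, smaPeriod = 5)

# The brute-force search evaluates every (sdPeriod, smaPeriod) pair on a 20 x 20 grid.
grid <- expand.grid(sdPeriod = 2:21, smaPeriod = 2:21)
```

With sdPeriod = 2 and smaPeriod = 5, the first six days have no position (warm-up plus the one-day lag); from day seven on, each day's return is exactly either the XIV or the VXX return.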

Here's the modified script for the computations (no parallel syntax is used, for the sake of simplicity):

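library(downloader)
library(quantmod)
library(TTR)
library(PerformanceAnalytics)
library(reshape2)
library(ggplot2)

download("https://dl.dropboxusercontent.com/s/jk6der1s5lxtcfy/XIVlong.TXT",
         destfile="longXIV.txt")

getSymbols("^VIX", from="2004-03-29")

# xiv, vxx and vxmt are the XIV, VXX and VXMT price series (xts) from the
# previous post; the code that reads them in isn't repeated here, so run it first
vixvxmt <- merge(Cl(VIX), Cl(vxmt))
vixvxmt[is.na(vixvxmt[,2]),2] <- vixvxmt[is.na(vixvxmt[,2]),1]  # no VXMT data: use VIX

xivRets <- Return.calculate(Cl(xiv))
vxxRets <- Return.calculate(Cl(vxx))

getSymbols("^GSPC", from="1990-01-01")
spyRets <- diff(log(Cl(GSPC)))  # daily S&P 500 log returns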

t1 <- Sys.time()
MARmatrix <- list()
SharpeMatrix <- list()
for(i in 2:21) {  # i: lookback for the realized (S&P 500) standard deviation

  # realized volatility doesn't depend on the SMA period, so compute it once per i
  spyVol <- runSD(spyRets, n=i)
  annSpyVol <- spyVol*100*sqrt(252)  # annualize and express in VIX points
  vols <- merge(vixvxmt[,2], annSpyVol, join='inner')

  smaMAR <- list()
  smaSharpe <- list()
  for(j in 2:21){  # j: SMA period smoothing the implied-minus-realized spread

    vols$smaDiff <- SMA(vols[,1] - vols[,2], n=j)
    vols$signal <- vols$smaDiff > 0              # positive premium: hold XIV
    vols$signal <- lag(vols$signal, k = 1)       # trade on the next day's return
    stratRets <- vols$signal*xivRets + (1-vols$signal)*vxxRets  # otherwise hold VXX
    #charts.PerformanceSummary(stratRets)
    #stratRets[is.na(stratRets)] <- 0
    #plot(log(cumprod(1+stratRets)))

    stats <- data.frame(cbind(Return.annualized(stratRets)*100,
                              maxDrawdown(stratRets)*100,
                              SharpeRatio.annualized(stratRets)))
    colnames(stats) <- c("Annualized Return", "Max Drawdown", "Annualized Sharpe")
    MAR <- stats[1, 1]/stats[1, 2]  # MAR ratio: annualized return / max drawdown
    smaMAR[[j-1]] <- MAR
    smaSharpe[[j-1]] <- stats[1, 3]
  }
  rm(vols)
  smaMAR <- do.call(c, smaMAR)
  smaSharpe <- do.call(c, smaSharpe)
  MARmatrix[[i-1]] <- smaMAR
  SharpeMatrix[[i-1]] <- smaSharpe
}
t2 <- Sys.time()
print(t2-t1)

Essentially, I just wrapped the previous script in a nested for loop over the two parameters. I chose ggplot2 to plot the heatmaps for more control over coloring. Here's the heatmap for the MAR ratio (that is, annualized return over maximum drawdown):

MARmatrix <- do.call(cbind, MARmatrix)  # rows: SMA period, columns: runSD period
rownames(MARmatrix) <- paste0("SMA", c(2:21))
colnames(MARmatrix) <- paste0("runSD", c(2:21))
MARlong <- melt(MARmatrix)  # wide matrix to long (row, column, value) format
colnames(MARlong) <- c("SMA", "runSD", "MAR")
MARlong$SMA <- as.numeric(gsub("SMA", "", MARlong$SMA))
MARlong$runSD <- as.numeric(gsub("runSD", "", MARlong$runSD))
MARlong$scaleMAR <- as.numeric(scale(MARlong$MAR))  # standardize: colors diverge at the mean
ggplot(MARlong, aes(x=SMA, y=runSD, fill=scaleMAR))+geom_tile()+
  scale_fill_gradient2(high="skyblue", mid="blue", low="red")

Here's the result:

Immediately, we start to see some answers to the questions regarding overfitting. First off, is the parameter set published by TradingTheOdds optimized? Yes. In fact, not only is it optimized, it's far and away the best value on the heatmap.
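Since the MAR ratio is just annualized return divided by maximum drawdown, a single cell can be sanity-checked by hand. Below is a minimal base-R sketch; the helper names are hypothetical, and it assumes geometric compounding over 252 trading days per year, which is my understanding of the defaults in PerformanceAnalytics' Return.annualized and maxDrawdown, so treat small discrepancies as expected. The script multiplies both statistics by 100 before dividing, which leaves the ratio unchanged.

```r
# Geometric annualized return: compound growth rescaled to `scale` periods.
annualizedReturn <- function(r, scale = 252) {
  r <- r[!is.na(r)]
  prod(1 + r)^(scale / length(r)) - 1
}

# Maximum drawdown: the largest fractional fall of wealth from its running peak.
maxDD <- function(r) {
  r <- r[!is.na(r)]
  wealth <- cumprod(1 + r)
  peak   <- cummax(c(1, wealth))[-1]  # starting wealth of 1 counts as a peak
  max(1 - wealth / peak)
}

# MAR ratio: annualized return over maximum drawdown.
marRatio <- function(r) annualizedReturn(r) / maxDD(r)

# Usage on made-up daily returns:
exampleRets <- c(0.02, -0.01, 0.015, -0.03, 0.01)
marRatio(exampleRets)
```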
However, when discussing overfitting, curve-fitting, and the like, the question to ask isn't "is this the best parameter set available?" but rather "is the parameter set in a stable region?" In my opinion, the answer is yes, as shown by the consistently strong results across the different SMA periods for the 2-day sample standard deviation. Note that with only two observations, the sample standard deviation (whose n - 1 denominator is just 1) is the square root of the sum of the two squared deviations from their mean, which works out to the absolute difference of the two daily returns divided by the square root of 2. Here are the MAR values for those configurations:

> MARmatrix[1:10,1]
SMA2     SMA3     SMA4     SMA5     SMA6     SMA7     SMA8     SMA9    SMA10    SMA11
2.471094 2.418934 2.067463 3.027450 2.596087 2.209904 2.466055 1.394324 1.860967 1.650588

In this case, not only is the region stable, but the MAR values are all above 2 until SMA9. Furthermore, aside from the stable region around the 2-day sample standard deviation, there is also a stable region using a standard deviation of about ten days with less smoothing from the SMA (less smoothing is needed because the sample standard deviation already contains an average). Let's examine those values.

> MARmatrix[2:5, 9:16]
runSD10  runSD11  runSD12  runSD13  runSD14  runSD15  runSD16   runSD17
SMA3 1.997457 2.035746 1.807391 1.713263 1.803983 1.994437 1.695406 1.0685859
SMA4 2.167992 2.034468 1.692622 1.778265 1.828703 1.752648 1.558279 1.1782665
SMA5 1.504217 1.757291 1.742978 1.963649 1.923729 1.662687 1.248936 1.0837615
SMA6 1.695616 1.978413 2.004710 1.891676 1.497672 1.471754 1.194853 0.9326545

Apparently, a standard deviation lookback of two to three weeks with minimal SMA smoothing also produced some results comparable to the 2-day variant. Off to the northeast of the plot, using longer periods for the parameters simply causes risk-to-reward performance to drop steeply, which essentially illustrates the cost of lag. Finally, there's a small rough patch between the two aforementioned stable regions. Here's the data for that.
> MARmatrix[1:5, 4:8]
runSD5    runSD6    runSD7   runSD8   runSD9
SMA2 1.928716 1.5825265 1.6624751 1.033216 1.245461
SMA3 1.528882 1.5257165 1.2348663 1.364103 1.510653
SMA4 1.419722 0.9497827 0.8491229 1.227064 1.396193
SMA5 1.023895 1.0630939 1.3632697 1.547222 1.465033
SMA6 1.128575 1.3793244 1.4085513 1.440324 1.964293

As you can see, there are some patches where the MAR is below 1, and many where it's below 1.5, all well short of the values in the stable regions. Let's repeat this process with the Sharpe ratio heatmap.

SharpeMatrix <- do.call(cbind, SharpeMatrix)  # rows: SMA period, columns: runSD period
rownames(SharpeMatrix) <- paste0("SMA", c(2:21))
colnames(SharpeMatrix) <- paste0("runSD", c(2:21))
sharpeLong <- melt(SharpeMatrix)
colnames(sharpeLong) <- c("SMA", "runSD", "Sharpe")
sharpeLong$SMA <- as.numeric(gsub("SMA", "", sharpeLong$SMA))
sharpeLong$runSD <- as.numeric(gsub("runSD", "", sharpeLong$runSD))
# raw (unscaled) Sharpe values; colors diverge at a Sharpe of 1.5
ggplot(sharpeLong, aes(x=SMA, y=runSD, fill=Sharpe))+geom_tile()+
  scale_fill_gradient2(high="skyblue", mid="blue", low="darkred", midpoint=1.5)


And the result:

Again, the TradingTheOdds parameter configuration lights up, but this time it sits among a region of strong configurations. Compared to the rest of the heatmap, the northern stable region now seems clustered around a 10-day (or 11-day) standard deviation with SMAs of 2, 3, and 4. The drop-off to the northeast is also more subdued by comparison, with the Sharpe ratio bottoming out around 1.
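The two heatmaps don't have to agree: MAR only penalizes the single worst drawdown, while the Sharpe ratio penalizes day-to-day volatility across the whole sample. For reference, here's a minimal base-R sketch of an annualized Sharpe ratio with a zero risk-free rate, assuming geometric annualization of returns over 252 days divided by the daily standard deviation scaled by sqrt(252). I believe this mirrors the defaults of PerformanceAnalytics' SharpeRatio.annualized, but the helper itself is hypothetical.

```r
# Hypothetical helper: annualized Sharpe ratio, risk-free rate of zero.
sharpeAnnualized <- function(r, scale = 252) {
  r <- r[!is.na(r)]
  annRet <- prod(1 + r)^(scale / length(r)) - 1  # geometric annualized return
  annVol <- sd(r) * sqrt(scale)                  # annualized volatility
  annRet / annVol
}

# Usage on made-up daily returns:
sharpeAnnualized(c(0.01, -0.005, 0.002, 0.004, -0.003))
```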

Let’s look at the numerical values again for the same regions.

Two-day standard deviation region:

> SharpeMatrix[1:10,1]
SMA2     SMA3     SMA4     SMA5     SMA6     SMA7     SMA8     SMA9    SMA10    SMA11
1.972256 2.210515 2.243040 2.496178 1.975748 1.965730 1.967022 1.510652 1.963970 1.778401


Again, these are numbers the likes of which I myself haven't been able to achieve with more conventional strategies, and that I haven't really seen anywhere for anything on daily data. So either the strategy is fantastic, or something is terribly wrong outside the scope of the parameter optimization.
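Since the strongest region uses a 2-day lookback, it's worth seeing how little averaging that "volatility" estimate actually does. With two returns r1 and r2 and their mean m, the sample standard deviation is sqrt((r1 - m)^2 + (r2 - m)^2), because the n - 1 denominator equals 1, and that simplifies to |r1 - r2| / sqrt(2). A quick base-R check with two made-up returns:

```r
# Two made-up daily returns.
r1 <- 0.012
r2 <- -0.008
m  <- mean(c(r1, r2))

sampleSD     <- sd(c(r1, r2))                   # R's sd() uses the n - 1 denominator
residualForm <- sqrt((r1 - m)^2 + (r2 - m)^2)   # square root of the summed squared deviations
closedForm   <- abs(r1 - r2) / sqrt(2)          # the simplified form

c(sampleSD, residualForm, closedForm)           # all three agree
```

In other words, with n = 2 the "realized volatility" compared against VXMT is a scaled absolute difference between two consecutive daily returns, annualized by the same 100 * sqrt(252) factor used in the script.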

Two-week standard deviation region:

> SharpeMatrix[1:5, 9:16]
runSD10  runSD11  runSD12  runSD13  runSD14  runSD15  runSD16  runSD17
SMA2 1.902430 1.934403 1.687430 1.725751 1.524354 1.683608 1.719378 1.506361
SMA3 1.749710 1.758602 1.560260 1.580278 1.609211 1.722226 1.535830 1.271252
SMA4 1.915628 1.757037 1.560983 1.585787 1.630961 1.512211 1.433255 1.331697
SMA5 1.684540 1.620641 1.607461 1.752090 1.660533 1.500787 1.359043 1.276761
SMA6 1.735760 1.765137 1.788670 1.687369 1.507831 1.481652 1.318751 1.197707


Again, pretty outstanding numbers.

The rough patch:

> SharpeMatrix[1:5, 4:8]
runSD5   runSD6   runSD7   runSD8   runSD9
SMA2 1.905192 1.650921 1.667556 1.388061 1.454764
SMA3 1.495310 1.399240 1.378993 1.527004 1.661142
SMA4 1.591010 1.109749 1.041914 1.411985 1.538603
SMA5 1.288419 1.277330 1.555817 1.753903 1.685827
SMA6 1.278301 1.390989 1.569666 1.650900 1.777006


All Sharpe ratios are higher than 1, though some are below 1.5.
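To put a number on "some below 1.5", here are the rough-patch Sharpe values from the table above typed into base R. The values are copied from the output; only the variable name roughSharpe is mine.

```r
# Sharpe ratios for the rough patch, copied from SharpeMatrix[1:5, 4:8].
roughSharpe <- matrix(c(
  1.905192, 1.650921, 1.667556, 1.388061, 1.454764,
  1.495310, 1.399240, 1.378993, 1.527004, 1.661142,
  1.591010, 1.109749, 1.041914, 1.411985, 1.538603,
  1.288419, 1.277330, 1.555817, 1.753903, 1.685827,
  1.278301, 1.390989, 1.569666, 1.650900, 1.777006),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("SMA", 2:6), paste0("runSD", 5:9)))

min(roughSharpe)        # 1.041914 (SMA4, runSD7)
sum(roughSharpe < 1.5)  # 12 of the 25 cells
```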

So, to conclude this post:

Did the replication use optimized parameters? Yes. However, those optimized parameters were found within a stable (and even strong) region. Furthermore, it isn't as though the strategy exhibits poor risk-to-return metrics beyond those regions, either. Aside from raising the lookback periods of both the moving average and the standard deviation to levels that no longer resemble the original replication, performance was solid to stellar.

Does this necessarily mean that there is nothing wrong with the strategy? No. It could be that the performance is an artifact of optimistic "observe the close, enter at the close" execution assumptions. For instance, quantstrat (the go-to backtest engine in R for more trading-oriented statistics) uses a next-bar execution method that defaults to the *next* day's close (so if you look back over my quantstrat posts, I use prefer="open" to get the open of the next bar instead of its close). It could also be that VXMT itself isn't a very well-known instrument in the public sphere, seeing as Yahoo Finance barely has any data on it. Lastly, it could simply be that although the reward-to-risk ratios seem amazing, many investors/mutual fund managers/etc. probably don't want to think "I'm down 40-60% from my peak". That said, it's arguably easier to take a strategy with a good reward-to-risk ratio but excess risk and dilute it with cash (to use a cooking analogy, think about your favorite spice: good in small quantities) than it is to go and find leverage for a good reward-to-risk strategy with very small returns (not to mention incurring all the other risks that come with leverage to begin with, such as a 50% drawdown wiping out an account leveraged two to one).
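The cash-versus-leverage argument comes down to simple arithmetic for a single adverse move: with exposure w (above 1 means leverage, below 1 means part of the account sits in cash), a move of r leaves 1 + w * r of the starting equity. A hypothetical one-line helper makes the two cases explicit:

```r
# Hypothetical helper: fraction of starting equity left after one adverse move.
# exposure > 1 means leverage; exposure < 1 means part of the account is in cash.
equityAfterMove <- function(exposure, move) 1 + exposure * move

equityAfterMove(exposure = 2,   move = -0.5)  # 2:1 leverage, 50% drop: 0, account wiped out
equityAfterMove(exposure = 0.5, move = -0.6)  # half in cash, 60% drop: 0.7, a 30% loss
```

This is a single-move simplification: over a multi-day drawdown with daily rebalancing, losses compound and don't scale exactly linearly with exposure.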

However, as far as the question of overfitting goes, these are the results I found using a modified version of the stable-region technique from Jaekle and Tomasini (2009).