When is a Backtest Too Good to be True?

[This article was first published on Quintuitive » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

One statistic which I find useful to form a first impression of a backtest is the success/winning percentage. Since it can mean different things, let’s be more precise: for a strategy over daily data, the winning percentage is the percentage of the days on which the strategy had positive returns (in other words, the strategy guessed the sign of the return correctly on these days). Now the question – if I see 60% winning percentage for a S&P 500 strategy, does/should my bullshit-alarm go off?

That actually happened not too long ago while reading a paper. One of the best strategy in the paper was reporting 60% winning percentage out-of-sample. My gut feeling was that it’s a unrealistic, cherry-picked sample. I was pretty sure I am correct, but it raises other interesting questions, so I decided to investigate a bit more.

Let’s perform an experiment. Let’s take 20 years of daily returns on the S&P 500 and draw a number of samples. All samples have the same size – 60% of the data. For each sample, we assume that we guessed the sign on these days correctly, and we were wrong on the rest. One way to implement this in R is:

return.mc = function(rets, samples=1000, size=252) {
   # The annualized return for each sample
   result = rep(NA, samples)
   for(ii in 1:samples) {
      # Sample the indexes
      aa = sample(1:NROW(rets), size=size)
      # All days we guessed wrong
      bb = -abs(rets)
      # On the days in the sample we guessed correctly
      bb[aa] = abs(bb[aa])
      # Compute the annualized return for this sample.
      # Note, we convert bb to a vector, otherwise, the computation takes forever.
      result[ii] = Return.annualized(as.numeric(bb),scale=252)

Now, let’s see what we get for a couple success rates:

gspc = getSymbols("^GSPC", from="1900-01-01", auto.assign=F)
rets = ROC(Cl(gspc),type="discrete",na.pad=F)["1994/2013"]

aa = return.mc(rets, size=as.integer(0.6*NROW(rets)))
##   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.3353  0.4518  0.4849  0.4860  0.5183  0.6666

aa = return.mc(rets, size=as.integer(0.55*NROW(rets)))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.08744 0.17870 0.20410 0.20410 0.23130 0.33670

For the 60% success rate, we have an average (over all samples) of 48% annually! It seems it was completely justified to make me very suspicious about serious cherry-picking. It’s either that, or the authors have discovered the ultimate trading strategy. Needless to say it turned out to be the former …

The second result, the 55% success is about where I draw the line of something plausible, although I’d be surprised if something of that quality is shared in public.

One last observation, which is important mostly for long-only strategies. We have 53.81% positive returns overall, yet, at the 50% success rate we still get negative mean return! That’s a manifestation of the fact that the positive and negative market profiles are totally different. In any case, what it means is that for long-only strategies I might push the plausible percentage up a bit.

The post When is a Backtest Too Good to be True? appeared first on Quintuitive.

To leave a comment for the author, please follow the link and comment on their blog: Quintuitive » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)