[This article was first published on R – The R Trader, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

When testing trading strategies a common approach is to divide the initial data set into in sample data: the part of the data designed to calibrate the model and out of sample data: the part of the data used to validate the calibration and ensure that the performance created in sample will be reflected in the real world. As a rule of thumb around 70% of the initial data can be used for calibration (i.e. in sample) and 30% for validation (i.e. out of sample). Then a comparison of the in and out of sample data help to decide whether the model is robust enough. This post aims at going a step further and provides a statistical method to decide whether the out of sample data is in line with what was created in sample.

In the chart below the blue area represents the out of sample performance for one of my strategies.

A simple visual inspection reveals a good fit between the in and out of sample performance but what degree of confidence do I have in this? At this stage not much and this is the issue. What is truly needed is a measure of similarity between the in and out of sample data sets. In statistical terms this could be translated as the likelihood that the in and out of sample performance figures coming from the same distribution. There is a non-parametric statistical test that does exactly this: the Kruskall-Wallis Test. A good definition of this test could be found on R-Tutor “A collection of data samples are independent if they come from unrelated populations and the samples do not affect each other. Using the Kruskal-Wallis Test, we can decide whether the population distributions are identical without assuming them to follow the normal distribution.” The added benefit of this test is not assuming a normal distribution.

It exists other tests of the same nature that could fit into that framework. The Mann-Whitney-Wilcoxon test or the Kolmogorov-Smirnov tests would perfectly suits the framework describes here however this is beyond the scope of this article to discuss the pros and cons of each of these tests. A good description along with R examples can be found here.

Here’s the code used to generate the chart above and the analysis:

################################################
## Making the most of the OOS data
##
## [email protected] - Aug. 2016
################################################
library(xts)
library(PerformanceAnalytics)

thePath <- "myPath" #change this
theFile <- "data.csv"
data <- read.csv(paste0(thePath,theFile),header=TRUE,sep=",")
data <- xts(data[,2],order.by=as.Date(as.character(data[,1]),format = "%d/%m/%Y"))

##----- Strategy's Chart
par(mex=0.8,cex=1)
thePeriod <- c("2012-02/2016-05")
chart.TimeSeries(cumsum(data),
main = "System 1",
ylab="",
period.areas = thePeriod,
grid.color = "lightgray",
period.color = "slategray1")

##----- Kruskal tests
pValue <- NULL
i <- 1
while (i < 1000){
isSample <- sample(isData,length(osData))
pValue <- rbind(pValue,kruskal.test(list(osData, isSample))\$p.value)
i <- i + 1
}

##----- Mean of p-values
mean(pValue)


In the example above the in sample period is longer than the out of sample period therefore I randomly created 1000 subsets of the in sample data each of them having the same length as the out of sample data. Then I tested each in sample subset against the out of sample data and I recorded the p-values. This process creates not a single p-value for the Kruskall-Wallis test but a distribution making the analysis more robust. In this example the mean of the p-values is well above zero (0.478) indicating that the null hypothesis should be accepted: there are strong evidences that the in and out of sample data is coming from the same distribution.

As usual what is presented in this post is a toy example that only scratches the surface of the problem and should be tailored to individual needs. However I think it proposes an interesting and rational statistical framework to evaluate out of sample results.

To leave a comment for the author, please follow the link and comment on their blog: R – The R Trader.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

# Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts.(You will not see this message again.)

Click here to close (This popup will not appear again)