Often I have some idea for a trading system that is of the form “does some particular aspect of the last n periods of data have any predictive use for subsequent periods.”
I generally like to work with nice units of time, such as 4 weeks or 6 months, rather than 30 or 126 days. It probably doesn’t make a meaningful difference in most cases, but it’s nice to have the option.
At first this seemed like something rollapply() from the zoo package could help with, but there are a number of preconditions that need to be met and frankly I find them to be a bit of a pain.
In a nutshell I have not been able to find a nice way for it to apply a function to a rolling subset of time series data nicely aligned to weekly or monthly boundaries.
All is not lost, there is a neat function in xts called endpoints(), which takes a time series and a period (e.g. “weeks”, “months”) and returns the indexes into that time series for the corresponding periods.
Using this information it becomes easy to subset the time set data using normal row subset operations.
The xts package also has period.apply but it runs on non-overlapping intervals, which is close but still not quite what I want.
In the script for this post there are 4 or so functions of note.
The main one is roll_model, which takes a set of data to be subsetted and passed to the model, the size of per model training and test sets and the period size to split things up, which is anything valid for use with endpoints().
A utility function is train_test_split which also uses endpoints() to split a subset of data into 2 sets, one for training the model, one for testing. In practice it needs to be the same period type as you expect to use with roll_model.
The function that actually builds the model and returns some results is run_model(), which calls train_test_split to get the training and test set, builds a model using ksvm in this example, and sees how it goes based on the test set.
Another utility function is called before that, data_prep which builds the main data object to be passed to roll_model. In this example it takes a set of log closes to close returns, sets Y to be the return at time t, X1 the log return at t-1, X2 at t-2 and so on.
The example model is not a particularly useful way of looking at things, which is not surprising given close to close returns are effectively random noise. But perhaps the script itself is useful for other ideas, and if anyone knows better/easier/alternate ways of doing the same thing I would love to hear about them.
The script is available here.