# Programming a custom Backtest Profile in R

January 6, 2010

One of the many issues with systems trading is trying to make sense of the vast amounts of data you accumulate when you backtest a system. Historical backtesting is the first step in testing your trading idea. If it is a trading idea that ought to work across many different markets, then you need to test it on many different markets to see how it performs. Yes, you are looking to see how... robust (I said it)... your trading idea is in the crucible of historical data. It's easy to get lost in the data, and that's why I'm embarking on creating a custom Backtest Profile Report, dubbed version 1.0. I've chosen to create this profile using the R statistical package, which is offered for free to those who elect to use it.

I'm just getting started and I can already see how this code will reach thousands of lines. The good news is that once the basic logic is set up, getting to the final product is mostly a matter of filling in repetitive code. Let's start at the beginning. This is the code I currently have saved for the beginnings of the Backtest Profile Report.

################### Call packages

library(zoo)
library(xts)

At this point, the file I created and massaged a little is read into R. The next command prints the first 6 records of the file and shows what the header looks like.
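For readers following along, here is a minimal sketch of what reading and previewing the file might look like. The temporary file and its three sample rows are assumptions made so the example runs on its own; only the object name BT and the column names Market, EntryDate, Exit.Date, Exit.Year and Trade.P.L come from the post.

```r
# Minimal sketch: read a trade list from CSV and preview it.
# The file is written to a temp path here; it is a hypothetical
# stand-in for the real backtest export.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Market,EntryDate,Exit.Date,Exit.Year,Trade.P.L",
  "US,3/19/1990,3/28/1990,1990,-687.50",
  "LX,3/19/1990,3/29/1990,1990,-270.00",
  "TY,3/20/1990,3/29/1990,1990,-593.75"
), csv_path)

# stringsAsFactors = FALSE keeps text columns as character, not factor
BT <- read.csv(csv_path, stringsAsFactors = FALSE)

head(BT)   # prints up to the first 6 rows (only 3 exist in this sample)
str(BT)    # shows the column names and types as R sees them
```

Checking `str(BT)` right after the read is a cheap way to confirm that the dates came in as text and the P&L column came in as numbers.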

1     SELL      Short     US     US_REV.CSV
3     SELL      Short     LX     LX_REV.CSV
4     SELL      Short     TY     TY_REV.CSV
6     SELL      Short     ZB     ZB_REV.CSV

EntryDate    EntryPrice    Exit.Date    Exit.Year    Exit.Name
3/19/1990      21.71875    3/28/1990         1990        Cover
3/19/1990     130.30000    3/29/1990         1990         Sell
3/19/1990    4327.00000    3/29/1990         1990        Cover
3/20/1990      29.60938    3/29/1990         1990        Cover
3/20/1990      78.22000    5/15/1990         1990         Sell
3/21/1990      21.03000     4/3/1990         1990        Cover

1     22.40625     -687.50      -687.50
2    129.85000     -225.00      -912.50
3   4354.00000     -270.00     -1182.50
4     30.20312     -593.75     -1776.25
5     78.79000      570.00     -1206.25
6     24.63000    -1512.00     -2718.25

That is what R sees once the .csv file has been read in with the read.csv function.

R looks at dates in a specific way, so we need to convert our date format to one that R likes. This is done with simple code calling the as.Date function.

############### Convert date character to R Date

ENTER <- as.Date(BT$EntryDate, "%m/%d/%Y")
EXIT  <- as.Date(BT$Exit.Date, "%m/%d/%Y")

A quick note. To get the EntryDate column (or vector, as R likes to call it), we first identify the file (or data.frame, as R likes to call it) and then use the '$' symbol to identify the vector. So, BT$Exit.Date is the Exit.Date column of the BT file. We'll use these dates later, but best practice is to get them fixed early on.
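To see the conversion in isolation, here is a small sketch on a toy data frame. The two rows are illustrative, and `days_held` is a hypothetical name that does not appear in the original script; it only shows one thing proper Date objects make possible.

```r
# Toy data frame using the column names from the post; rows are illustrative
BT <- data.frame(
  EntryDate = c("3/19/1990", "3/20/1990"),
  Exit.Date = c("3/28/1990", "5/15/1990"),
  stringsAsFactors = FALSE
)

# "%m/%d/%Y" means month/day/four-digit year, matching text like "3/19/1990"
ENTER <- as.Date(BT$EntryDate, format = "%m/%d/%Y")
EXIT  <- as.Date(BT$Exit.Date, format = "%m/%d/%Y")

class(ENTER)   # "Date"

# Once both columns are Dates, subtraction gives the holding period in days
days_held <- as.numeric(EXIT - ENTER)   # hypothetical helper vector
days_held      # 9 56
```

If the format string does not match the text, as.Date returns NA rather than raising an error, so a quick `anyNA(ENTER)` check after converting is worthwhile.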

Next, we're going to clean up some of these bizarre headers by redefining them.

################## Define Variables from existing vectors

Market <- BT$Market
PnL    <- BT$Trade.P.L
Year   <- BT$Exit.Year

So instead of always referring to a column (or vector) with some unintuitive nomenclature, we're just going to assign a simple name to some important ones.

Now we're ready to start drilling down into the data and bring the important numbers to the fore.

##################### Define Variables as new vectors

################ Format = Statistic.Market.Year

############ the PUZZLE - subset a subset
##################### PUZZLE solved with & symbol to add conditions

PnL.AN <- subset(PnL, Market=="AN")
PnL.AN.1990 <- subset(PnL, Market=="AN" & Year=="1990")
PnL.AN.1991 <- subset(PnL, Market=="AN" & Year=="1991")
PnL.AN.1992 <- subset(PnL, Market=="AN" & Year=="1992")
PnL.AN.1993 <- subset(PnL, Market=="AN" & Year=="1993")
PnL.AN.1994 <- subset(PnL, Market=="AN" & Year=="1994")
PnL.AN.1995 <- subset(PnL, Market=="AN" & Year=="1995")
PnL.AN.1996 <- subset(PnL, Market=="AN" & Year=="1996")
PnL.AN.1997 <- subset(PnL, Market=="AN" & Year=="1997")
PnL.AN.1998 <- subset(PnL, Market=="AN" & Year=="1998")
PnL.AN.1999 <- subset(PnL, Market=="AN" & Year=="1999")

I've included the same comments I put into my code because sometimes I forget how I got to where I am, and comments help. (R ignores everything after the # sign on a line.) I had some trouble figuring out how to subset a vector to include only the values I'm interested in (such as a specific market), but got it figured out. There is more than one way to do this, but this works for now. There are also some issues with trying to subset an already subsetted object, so it's best to use the '&' symbol to define the subset fully right from the get-go.

Alright, let's take a quick break and see what sort of object we have created. Take PnL.AN.1999 for example. It looks at the PnL vector (the original big Kahuna) and keeps only the trades in the AN market with a Year value of 1999. Essentially, it's a little nugget that shows how trades in the Australian Dollar fared in 1999.
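A likely reason subsetting a subset causes trouble is length: once PnL has been cut down to one market, the Year vector still has one entry per original trade, so the two no longer line up. The sketch below uses made-up values to show the mismatch and two ways around it. `Year.AN` and `PnL.AN.1990.alt` are hypothetical names, not part of the original script.

```r
# Illustrative vectors; names follow the post, values are made up
Market <- c("AN", "AN", "BP", "AN", "BP")
Year   <- c(1990, 1991, 1990, 1990, 1991)
PnL    <- c(2740, -300, 150, 1500, -80)

PnL.AN <- subset(PnL, Market == "AN")

# PnL.AN is shorter than Year, so a condition on Year cannot be
# applied to PnL.AN position by position.
length(PnL.AN)   # 3
length(Year)     # 5

# Option 1 (the approach used in the post): state both conditions
# against the full-length vectors in one go.
PnL.AN.1990 <- subset(PnL, Market == "AN" & Year == 1990)

# Option 2: cut Year down with the same condition so it stays aligned.
Year.AN <- Year[Market == "AN"]
PnL.AN.1990.alt <- PnL.AN[Year.AN == 1990]

identical(PnL.AN.1990, PnL.AN.1990.alt)   # TRUE
```

Either way, the rule of thumb is that the condition and the data being filtered must have the same length and refer to the same trades.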

So far so good. Now let's take only those trades that were profitable and then we'll take a break. Thanks for hanging in this long.

############ the following breakdown does not account for zero trades
############ the order is critical
############ create a subset of a subset and THEN take positive values
############ the other way doesn't work for some reason

WinPnL <- PnL [PnL>0]

WinPnL.AN <- PnL.AN [PnL.AN>0]
WinPnL.AN.1990 <- PnL.AN.1990 [PnL.AN.1990>0]
WinPnL.AN.1991 <- PnL.AN.1991 [PnL.AN.1991>0]
WinPnL.AN.1992 <- PnL.AN.1992 [PnL.AN.1992>0]
WinPnL.AN.1993 <- PnL.AN.1993 [PnL.AN.1993>0]
WinPnL.AN.1994 <- PnL.AN.1994 [PnL.AN.1994>0]
WinPnL.AN.1995 <- PnL.AN.1995 [PnL.AN.1995>0]
WinPnL.AN.1996 <- PnL.AN.1996 [PnL.AN.1996>0]
WinPnL.AN.1997 <- PnL.AN.1997 [PnL.AN.1997>0]
WinPnL.AN.1998 <- PnL.AN.1998 [PnL.AN.1998>0]
WinPnL.AN.1999 <- PnL.AN.1999 [PnL.AN.1999>0]

Here I have defined another object, WinPnL, which holds only those trades that showed a profit. It's one level deeper than the PnL code just above it. It takes the PnL vector and extracts only the positive values with the expression WinPnL <- PnL[PnL>0]. Fairly simple code. Now to test it, we type the name of an object and R returns its value. Thusly,

WinPnL.AN.1990
[1] 2740 1500

There, it works. I've checked it against the original .csv file and there were indeed two profitable trades in 1990 with the values listed. R lets you test stuff very quickly and efficiently by simply running the script. It's best to write the script in an editor and run it from there instead of typing it directly into the terminal.

There are still quite a few more statistics to code for our final product. As you can see, it's easy to define variables and manipulate them in R, so it won't be too hard to get to the statistics we want to see. For version 1.0, I'm focusing on overall profitability by market, overall positive expectancy by market, average winning percentage, percentage of yearly profitability by market, and percentage of yearly positive expectancy by market. I'm using 47 markets, so I've got some copying and pasting in my future, along with some editing. Because of the potentially explosive number of lines to deal with, I'll have to figure out a good way to do this. I'm thinking I need to learn VIM so I can do it efficiently, because I'm sure not going to move around with the stupid arrow keys, and delete and type in all those markets.
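The bracket filter works because `PnL > 0` produces a TRUE/FALSE value for every trade, and the brackets keep only the TRUE positions. Here is a small sketch with made-up numbers, including a zero trade to show that it lands in neither the winners nor the losers. `is_win`, `LossPnL.AN.1990` and `n_zero` are hypothetical names.

```r
# Made-up trades for one market and year, including one break-even trade
PnL.AN.1990 <- c(2740, -450, 0, 1500, -120)

is_win <- PnL.AN.1990 > 0          # TRUE FALSE FALSE TRUE FALSE
WinPnL.AN.1990 <- PnL.AN.1990[is_win]
WinPnL.AN.1990                     # 2740 1500

# The same pattern with < 0 picks out the losers
LossPnL.AN.1990 <- PnL.AN.1990[PnL.AN.1990 < 0]

# Zero trades fall through both filters, so count them separately
n_zero <- sum(PnL.AN.1990 == 0)
```

Counting the zero trades separately keeps the win and loss totals honest when they are later turned into percentages.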
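One possible alternative to copying and pasting a block per market is to let R do the grouping with tapply, which applies a function to each group of a vector. This is only a sketch on made-up data, not the approach the script above takes. It assumes "expectancy" means the average P&L per trade, and all the result names are hypothetical.

```r
# Made-up trades; names follow the post
Market <- c("AN", "AN", "AN", "BP", "BP", "BP")
Year   <- c(1990, 1990, 1991, 1990, 1991, 1991)
PnL    <- c(2740, -300, 1500, 150, -80, -400)

# Overall profitability by market: sum of P&L within each market
total_by_market <- tapply(PnL, Market, sum)

# Expectancy by market, taken here as the mean P&L per trade (an assumption)
expectancy_by_market <- tapply(PnL, Market, mean)

# Winning percentage by market: share of trades with P&L above zero
win_pct_by_market <- tapply(PnL > 0, Market, mean) * 100

# Market-by-year table of total P&L (rows = markets, columns = years)
total_by_market_year <- tapply(PnL, list(Market, Year), sum)

# Percentage of profitable years per market; na.rm skips years with no trades
pct_profitable_years <- rowMeans(total_by_market_year > 0, na.rm = TRUE) * 100

total_by_market        # AN 3940, BP -330
total_by_market_year
```

Because the grouping comes from the Market and Year vectors themselves, the same few lines cover 2 markets or 47 without any editing.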

The end-game is to create intuitive histograms that give us a good feel for the system we're testing. Don't complain about how hard it is, etc. I'm no programmer and I'm getting it to work, so you can too. Now get coding.
