March 1, 2012
By

This is the first post of a series that describes how to download and parse specific data sets into R. These kinds of scripts can be functionalized further, but I doubt that these will ever find their way into a formal package. They are intended to be helpful to those facing similar tasks, but as demonstration scripts they will not be supported.

This is a script for downloading and parsing a monthly total return series of the FTSE NAREIT U.S. Real Estate Index. The spreadsheet can be downloaded from their website manually, but as you will see later, we don’t have to. Download the spreadsheet if you want to look at how it is structured.

This script can be found in the /inst/parser directory of FinancialInstrument on R-Forge.

After I parse the data, I want to store it and make it available for other R sessions to use. To do that, I’m going to use FinancialInstrument to register the location of the data so that I can retrieve it with getSymbols, much like I might do to retrieve data from Yahoo.

Let’s take a look at the script…

First, we load the required packages:

require(xts)
require(gdata)
require(FinancialInstrument)
require(quantmod)


Warning: this requires the quantmod version to be a development build after revision 560 or so.
Next, I’ll set the storage directory, where I’ll store the data and access it through getSymbols.
filesroot = "~/Data/NAREIT"


Create and set the storage directory if it doesn’t exist.
if (!file.exists(filesroot))
dir.create(filesroot, mode="0777")
if (!file.exists(paste(filesroot, "/NAREIT.IDX", sep="")))
dir.create(paste(filesroot, "/NAREIT.IDX", sep=""), mode="0777")


The read.xls function from the package gdata will download the xls workbook directly from the web site.
x = read.xls("http://returns.reit.com/returns/MonthlyHistoricalReturns.xls", pattern="Date", sheet="Data")


Let’s take a look at what we retrieved:
> head(x)
Date Return  Index Return.1 Index.1 Return.2 Yield  X Return.3 Index.2 Return.4
1 Dec-71     NA 100.00       NA  100.00       NA    NA NA       NA  100.00       NA
2 Jan-72   1.22 101.22     0.33  100.33     0.89  6.51 NA     1.22  101.22     0.33
3 Feb-72   0.95 102.18     0.92  101.25     0.03  6.39 NA     0.95  102.18     0.92
4 Mar-72   0.25 102.44    -0.44  100.81     0.69  6.32 NA     0.25  102.44    -0.44
5 Apr-72   0.25 102.70    -0.39  100.41     0.65  6.52 NA     0.25  102.70    -0.39
6 May-72  -1.51 101.15    -1.78   98.63     0.27  7.10 NA    -1.51  101.15    -1.78
Index.3 Return.5 Yield.1 X.1 Return.6 Index.4 Return.7 Index.5 Return.8 Yield.2 X.2
1  100.00       NA      NA  NA       NA      NA       NA      NA       NA      NA  NA
2  100.33     0.89    6.51  NA       NA      NA       NA      NA       NA      NA  NA
3  101.25     0.03    6.39  NA       NA      NA       NA      NA       NA      NA  NA
4  100.81     0.69    6.32  NA       NA      NA       NA      NA       NA      NA  NA
5  100.41     0.65    6.52  NA       NA      NA       NA      NA       NA      NA  NA
6   98.63     0.27    7.10  NA       NA      NA       NA      NA       NA      NA  NA
Return.9 Index.6 Return.10 Index.7 Return.11 Yield.3 X.3 Return.12 Index.8 Return.13
1       NA  100.00        NA  100.00        NA      NA  NA        NA  100.00        NA
2     0.00  100.00     -1.35   98.65      1.35    6.13  NA      0.00  100.00     -1.35
3     1.74  101.74      1.74  100.37      0.00    5.74  NA      1.74  101.74      1.74
4    -0.32  101.42     -0.74   99.63      0.42    5.92  NA     -0.32  101.42     -0.74
5     3.84  105.31      3.24  102.85      0.60    5.72  NA      3.84  105.31      3.24
6    -9.37   95.44     -9.75   92.83      0.38    6.53  NA     -9.37   95.44     -9.75
Index.9 Return.14 Yield.4 X.4 Return.15 Index.10 Return.16 Index.11 Return.17 Yield.5
1  100.00        NA      NA  NA        NA   100.00        NA   100.00        NA      NA
2   98.65      1.35    6.13  NA      1.24   101.24      0.87   100.87      0.38    6.10
3  100.37      0.00    5.74  NA      0.52   101.76      0.52   101.39      0.00    6.07
4   99.63      0.42    5.92  NA     -1.59   100.15     -2.74    98.61      1.16    6.02
5  102.85      0.60    5.72  NA      0.37   100.52     -0.10    98.51      0.47    6.45
6   92.83      0.38    6.53  NA      1.36   101.88      1.22    99.71      0.14    7.49
X.5
1  NA
2  NA
3  NA
4  NA
5  NA
6  NA


I’ll focus on the first three columns: the date, the total return, and the total return index. The next columns divide it into price, income and dividend yield components, and after that are the individual sectors. If you are interested, it would be trivial to loop through those additional columns and create symbols for the rest of the columns in the spreadsheet. But for now, I just want to parse the total return index.

First, I’m going to parse the dates. In the spreadsheet, these are formatted as month and year (strptime format is “%b-%y”), so they are not associated with a specific day. I want to transform these to reflect the last day of the month so that they are easier to align with other data I might use in a subsequent analysis.

To do that, I first use as.yearmon to parse the dates from “%b-%y”. Then I can add the last day of the month by transforming that using as.Date with frac=1. It’s worth knowing that a value of zero would select the first day of the month – that comes in handy every once in a while.

x.dates = as.Date(as.yearmon(x[,1], format="%b-%y"), frac=1)


I then transform the returns from whole numbers to percentages.
x.returns = xts(x[,2]/100, order.by = x.dates)


I want to capture the price index as well, but I need to get rid of the commas in the string so that they are recognized as numbers.
x.price = as.numeric((sub(",","", x[,3], fixed=TRUE)))
x.price = xts(x.price, order.by = x.dates)


Now I combine the data into a single object…
x.xts = cbind(x.price, x.returns)
colnames(x.xts) = c("Close", "Returns")


… and save it into an rda file on the filesystem.
save(x.xts, file=paste(filesroot,"NAREIT.IDX/NAREIT.IDX.rda", sep="/"))


Next, I want to add some metadata to the data, so that I remember its attributes, such as the currency it is denominated in (actually, that’s the only required one).

Create the currency the index is demoninated in, if it hasn’t been done already (FinancialInstrument does create some major currencies when loaded, but it doesn’t hurt to do it again).

currency("USD")


Now I can describe the metadata for the index, using instrument.
instrument("NAREIT.IDX", currency="USD", multiplier=1, tick_size=.01, start_date="1971-12-31", description="FTSE NAREIT U.S. Real Estate Index", data="CR", source="reit.com", assign_i=TRUE)


Now, whenever you start a new R session you need to register the instruments. This might be a line you put into .Rprofile ( along with the packages and the filesroot definition) so that it happens automatically:
setSymbolLookup.FI(base_dir="~/Data/NAREIT", split_method='common')


Once you’ve done that you should be able to read that data from your local storage using quantmod‘s getSymbols:
getSymbols("NAREIT.IDX")
chartSeries(Cl(NAREIT.IDX), theme="white")


That gives output that looks something like this:
            Close Returns
1971-12-31 100.00      NA
1972-01-31 101.22  0.0122
1972-02-29 102.18  0.0095
1972-03-31 102.44  0.0025
1972-04-30 102.70  0.0025
1972-05-31 101.15 -0.0151


NAREIT Price Series

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

Tags: ,