FOMC Dates – Scraping Data From Web Pages

November 30, 2014
By

(This article was first published on Return and Risk, and kindly contributed to R-bloggers)

Before we can do some quant analysis, we need to get some relevant data – and the web is a good place to start. Sometimes the data can be downloaded in a standard format like .csv files or available via an API e.g. http://www.quandl.com but often you’ll need to scrape data directly from web pages.

In this post I’ll show how to obtain the US Federal Reserve FOMC Announcement dates (i.e. those when a statement is published after the meeting) from their web page http://www.federalreserve.gov/monetarypolicy/fomccalendars.htm. At the time of writing, this web page had dates from 2009 onward.

First, install and load the httr and XML R packages.

install.packages(c("httr", "XML"), repos = "http://cran.us.r-project.org")
library(httr)
library(XML)

Next, run the following R code.

# get and parse web page content
webpage <- content(GET(
"http://www.federalreserve.gov/monetarypolicy/fomccalendars.htm"),
as = "text")
xhtmldoc <- htmlParse(webpage)
# get statement urls and sort them
statements <- xpathSApply(xhtmldoc, "//td[@class='statement2']/a", xmlGetAttr,
"href")
statements <- sort(statements)
# get dates from statement urls
fomcdates <- sapply(statements, function(x) substr(x, 28, 35))
fomcdates <- as.Date(fomcdates, format = "%Y%m%d")
# save results in working directory
save(list = c("statements", "fomcdates"), file = "fomcdates.RData")

Finally, check the results by looking at their structures and first few values.

# check data
str(statements)
head(statements)
str(fomcdates)
head(fomcdates)

And you should see output similar to this below.

##  chr [1:49] "/newsevents/press/monetary/20090128a.htm" ...
## [1] "/newsevents/press/monetary/20090128a.htm"
## [2] "/newsevents/press/monetary/20090318a.htm"
## [3] "/newsevents/press/monetary/20090429a.htm"
## [4] "/newsevents/press/monetary/20090624a.htm"
## [5] "/newsevents/press/monetary/20090812a.htm"
## [6] "/newsevents/press/monetary/20090923a.htm"
## Date[1:49], format: "2009-01-28" "2009-03-18" "2009-04-29" "2009-06-24" ...
## [1] "2009-01-28" "2009-03-18" "2009-04-29" "2009-06-24" "2009-08-12"
## [6] "2009-09-23"

So what can we do with this data? Here are a few ideas:

  • Go deeper and download the actual statements and use a machine learning algorithm (Natural Language Processing (NLP)) to analyze the statement e.g. positive or negative sentiment. Actually, this is quite a complex task but is something on my list of research topics in 2015…
  • Collect price data e.g. Treasury yields or S&P500 and do some visual / initial exploratory analysis around the FOMC announcement dates
  • Conduct an event study like the academics do to identify whether or not there are any statistically significant patterns around these dates
  • Incorporate the dates into a trading or investment program and backtest to see whether there are economically significant patterns i.e. tradeable alpha opportunities

Click here for the R code on GitHub.

To leave a comment for the author, please follow the link and comment on their blog: Return and Risk.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)