FOMC Dates – Scraping Data From Web Pages

[This article was first published on Return and Risk, and kindly contributed to R-bloggers.]

Before we can do any quant analysis, we need to get some relevant data, and the web is a good place to start. Sometimes the data can be downloaded in a standard format such as a .csv file, or is available via an API, but often you'll need to scrape it directly from web pages.

In this post I'll show how to obtain the US Federal Reserve FOMC announcement dates (i.e. the dates on which a statement is published after each meeting) from the Fed's website. At the time of writing, the page listed dates from 2009 onward.

First, install and load the httr and XML R packages.

install.packages(c("httr", "XML"))
library(httr)
library(XML)

Next, run the following R code.

# get and parse web page content (the Fed's FOMC calendars page)
webpage <- content(GET("http://www.federalreserve.gov/monetarypolicy/fomccalendars.htm"),
                   as = "text")
xhtmldoc <- htmlParse(webpage)
# get statement urls and sort them
statements <- xpathSApply(xhtmldoc, "//td[@class='statement2']/a", xmlGetAttr, "href")
statements <- sort(statements)
# get dates from statement urls
fomcdates <- sapply(statements, function(x) substr(x, 28, 35))
fomcdates <- as.Date(fomcdates, format = "%Y%m%d")
# save results in working directory
save(list = c("statements", "fomcdates"), file = "fomcdates.RData")
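A note on the `substr(x, 28, 35)` call above: it relies on every statement URL sharing the fixed 27-character prefix `/newsevents/press/monetary/`, so characters 28–35 are exactly the YYYYMMDD date. A quick sanity check on one of the URLs from the output below:

```r
# the statement URLs share a fixed 27-character prefix,
# so characters 28-35 hold the YYYYMMDD date
url <- "/newsevents/press/monetary/20090128a.htm"
nchar("/newsevents/press/monetary/")             # 27
substr(url, 28, 35)                              # "20090128"
as.Date(substr(url, 28, 35), format = "%Y%m%d")  # "2009-01-28"
```

If the Fed ever changed its URL scheme, this fixed-position extraction would silently break, so it's worth re-checking the prefix length before reusing the code.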

Finally, check the results by looking at their structures and first few values.

# check data
str(statements)
head(statements)
str(fomcdates)
head(fomcdates)
You should see output similar to this:

##  chr [1:49] "/newsevents/press/monetary/20090128a.htm" ...
## [1] "/newsevents/press/monetary/20090128a.htm"
## [2] "/newsevents/press/monetary/20090318a.htm"
## [3] "/newsevents/press/monetary/20090429a.htm"
## [4] "/newsevents/press/monetary/20090624a.htm"
## [5] "/newsevents/press/monetary/20090812a.htm"
## [6] "/newsevents/press/monetary/20090923a.htm"
##  Date[1:49], format: "2009-01-28" "2009-03-18" "2009-04-29" "2009-06-24" ...
## [1] "2009-01-28" "2009-03-18" "2009-04-29" "2009-06-24" "2009-08-12"
## [6] "2009-09-23"

So what can we do with this data? Here are a few ideas:

  • Go deeper: download the actual statements and apply Natural Language Processing (NLP) to analyze each statement, e.g. for positive or negative sentiment. This is quite a complex task, but it's on my list of research topics for 2015…
  • Collect price data, e.g. Treasury yields or the S&P 500, and do some initial visual and exploratory analysis around the FOMC announcement dates
  • Conduct an event study, as academics do, to identify whether there are any statistically significant patterns around these dates
  • Incorporate the dates into a trading or investment program and backtest to see whether there are economically significant patterns, i.e. tradeable alpha opportunities
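As a minimal sketch of the last two ideas, a helper (hypothetical, not from the original post) can flag whether given dates fall within ±k days of an announcement, which is the first step in building an event window. It assumes `fomcdates` has been loaded from fomcdates.RData; here it's illustrated with the first few dates from the output above:

```r
# hypothetical helper: flag dates within +/- k days of an FOMC announcement
near_fomc <- function(dates, events, k = 1) {
    sapply(dates, function(d) any(abs(as.numeric(d - events)) <= k))
}
# illustrate with the first few announcement dates from the output above
fomcdates <- as.Date(c("2009-01-28", "2009-03-18", "2009-04-29"))
near_fomc(as.Date(c("2009-01-27", "2009-02-10")), fomcdates)  # TRUE FALSE
```

From there, you could subset a price series to the flagged dates and compare returns inside and outside the event window.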

Click here for the R code on GitHub.

