Quandl and Forecasting

[This article was first published on R Views, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Welcome to another installment of Reproducible Finance with R. Today we are going to shift focus in recognition of the fact that there’s more to Finance than stock prices, and there’s more to data download than quantmod/getSymbols. In this post, we will explore commodity prices using data from Quandl, a repository for both free and paid data sources. We will also get into the forecasting game a bit and think about how best to use dygraphs when visualizing predicted time series as an extension of historical data. We are not going to do anything too complex, but we will expand our toolkit by getting familiar with Quandl, commodity prices, the forecast() function, and some advanced dygraph work.

Before we dive in, a few thoughts to frame the notebook underlying this post.

  • We are using oil data from Quandl, but the original data is from FRED. There’s nothing wrong with grabbing the data directly from FRED, of course, and I browse FRED frequently to check out economic data, but I tend to download the data into my RStudio environment using Quandl. I wanted to introduce Quandl today because it’s a nice resource that will be involved in the next few posts in this series. Plus, it’s gaining in popularity, and if you work in the financial industry, you might start to encounter it in your work.

  • This post marks our first foray into the world of predictive modeling, albeit in a very simple way. But the complexity and accuracy of the forecasting methodology we use here is almost irrelevant since I expect that most R coders, whether in industry or otherwise, will have their own proprietary models. Rather, what I want to accomplish here is a framework where models can be inserted, visualized, and scrutinized in the future. I harp on reproducible workflows a lot, and that’s not going to change today because one goal of this Notebook is to house a forecast that can be reproduced in the future (at which point, we will know if the forecast was accurate or not), and then tweaked/criticized/updated/heralded.

  • This post walks through a detailed example of importing, forecasting, and visualizing oil prices. In the near future, I will repeat those steps for gold and copper, and we will examine the relationship between the copper/gold price ratio and interest rates. We are starting simple, but stay tuned.

Now, let’s get to the data download! In the chunk below, as we import WTI oil prices, notice that Quanld makes it easy to choose types of objects (raw/dataframe, xts, or zoo), periods (daily, weekly, or monthly) and start/end dates.

library(Quandl)
library(dplyr)
library(xts)
library(lubridate)
library(forecast)
library(dygraphs)

# Start with daily data. Note that "type = raw" will download a data frame.
oil_daily <- Quandl("FRED/DCOILWTICO", type = "raw", collapse = "daily",  
                    start_date="2006-01-01", end_date="2017-02-28")
# Now weekely and let's use xts as the type.
oil_weekly <- Quandl("FRED/DCOILWTICO", type = "xts", collapse = "weekly",  
                     start_date="2006-01-01", end_date="2017-02-28")
# And monthly using xts as the type.
oil_monthly <- Quandl("FRED/DCOILWTICO", type = "xts", collapse = "monthly",  
                      start_date="2006-01-01", end_date="2017-02-28")

# Have a quick look at our three  objects. 
str(oil_daily)
## 'data.frame':    2809 obs. of  2 variables:
##  $ DATE : Date, format: "2017-02-28" "2017-02-27" ...
##  $ VALUE: num  54 54 54 54.5 53.6 ...
##  - attr(*, "freq")= chr "daily"
str(oil_weekly)
## An 'xts' object on 2006-01-08/2017-03-05 containing:
##   Data: num [1:583, 1] 64.2 63.9 68.2 67.8 65.4 ...
##   Indexed by objects of class: [Date] TZ: UTC
##   xts Attributes:  
##  NULL
str(oil_monthly)
## An 'xts' object on Jan 2006/Feb 2017 containing:
##   Data: num [1:134, 1] 67.9 61.4 66.2 71.8 71.4 ...
##   Indexed by objects of class: [yearmon] TZ: 
##   xts Attributes:  
##  NULL

Also note that I specified the end date of February 2017. This indicates that the Notebook houses a model that was built and run using data as of February 2017. Without the end date, this Notebook would house a model that was built and run using data as of time t. Which you choose depends how you want the Notebook to function for your team.

Looking back at those oil price objects, each would work well for the rest of this project, but let’s stick with the monthly data. We will be dealing with the date index quite a bit below, so let’s use the seq() function and mdy() from the lubridate package to put the date into a nicer format.

index(oil_monthly) <- seq(mdy('01/01/2006'), mdy('02/01/2017'), by = 'months')
head(index(oil_monthly))
## [1] "2006-01-01" "2006-02-01" "2006-03-01" "2006-04-01" "2006-05-01"
## [6] "2006-06-01"

Now we have a cleaner date format. Our base data object is in good shape. As always, we like to have a look at the data in graphical format, so let’s fire up dygraphs. Since we imported an xts object directly from Quandl, we can just plug it straight into the dygraph() function.

dygraph(oil_monthly, main = "Monthly oil Prices")

Alright, nothing too shocking here. We see a peak in mid-2008, followed by a precipitous decline through the beginning of 2009.

Now we’ll make things a bit more interesting and try to extract some meaning from that data. Let’s use the forecast() function to predict what oil prices will look like over the next six months. This is the part of the code where you might want to insert whatever model your team has built or wish to test. We can think of it as a placeholder for any proprietary model or models that could be dropped into this Notebook. For our purposes, we will simply pass in the monthly oil prices object and supply a lookahead parameter of 6. The forecast() function will then supply some interesting numbers about the next six months of oil prices.

oil_6month <- forecast(oil_monthly, h = 6)

# Let's have a quick look at the 6-month forecast and the 80%/95% confidence levels. 
oil_6month
##     Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 135       53.99987 47.53908 60.46067 44.11894 63.88081
## 136       53.99987 44.84351 63.15624 39.99642 68.00333
## 137       53.99987 42.76141 65.23834 36.81212 71.18763
## 138       53.99987 40.99460 67.00515 34.11002 73.88973
## 139       53.99987 39.42782 68.57193 31.71385 76.28590
## 140       53.99987 38.00211 69.99764 29.53340 78.46635

The mean forecast is right around $54. It looks like the 95% confidence level has a high of $78 in August and a low of $29 in March. We won’t dwell on these numbers because I imagine you will want to use your own model here - this Notebook is more of a skeleton where those models can be inserted and then tested or evaluated at a later date.

Let’s move on to visualizing the results of the forecast along with the historical data. The base plot() function does a decent job here.

plot(oil_6month, main = "Oil Forecast")

That plot looks OK, but it’s not great. We can see that the mean forecast is to stay around $50, with the 95% bands stretching all the way to around $80 and $30, but honestly, I have to squint to really see those 95% intervals. We don’t like squinting, so let’s put in some extra work to make use of dygraphs, which will have the benefit of allowing a reader to zoom on the predicted portion of the graph.

This is where things require a bit more thought. We want one xts object to hold both the historical data and the forecasted prices.

We already have our monthly prices in the xts object we imported from Quandl, but the forecasted prices are currently in a list with a different date convention than we would like.

First, let’s move the mean forecast and 95% confidence bands to a dataframe, along with a date column. We predicted oil out six months, so we will need a date column for the six months after February.

oil_forecast_data <- data.frame(date = seq(mdy('03/01/2017'), 
                                           by = 'months', length.out = 6),
                                Forecast = oil_6month$mean,
                                Hi_95 = oil_6month$upper[,2],
                                Lo_95 = oil_6month$lower[,2])

head(oil_forecast_data)
##         date Forecast    Hi_95    Lo_95
## 1 2017-03-01 53.99987 63.88081 44.11894
## 2 2017-04-01 53.99987 68.00333 39.99642
## 3 2017-05-01 53.99987 71.18763 36.81212
## 4 2017-06-01 53.99987 73.88973 34.11002
## 5 2017-07-01 53.99987 76.28590 31.71385
## 6 2017-08-01 53.99987 78.46635 29.53340

The data we want is now housed in its own dataframe. Let’s convert that to an xts object.

oil_forecast_xts <- xts(oil_forecast_data[,-1], order.by = oil_forecast_data[,1])

Now we can combine the historical xts object with the forecasted xts object using cbind().

# Combine the xts objects with cbind.

oil_combined_xts <- cbind(oil_monthly, oil_forecast_xts)

# Add a nicer name for the first column.

colnames(oil_combined_xts)[1] <- "Actual"

# Have a look at both the head and the tail of our new xts object. Make sure the
# NAs are correct.
head(oil_combined_xts)
##            Actual Forecast Hi_95 Lo_95
## 2006-01-01  67.86       NA    NA    NA
## 2006-02-01  61.37       NA    NA    NA
## 2006-03-01  66.25       NA    NA    NA
## 2006-04-01  71.80       NA    NA    NA
## 2006-05-01  71.42       NA    NA    NA
## 2006-06-01  73.94       NA    NA    NA
tail(oil_combined_xts)
##            Actual Forecast    Hi_95    Lo_95
## 2017-03-01     NA 53.99987 63.88081 44.11894
## 2017-04-01     NA 53.99987 68.00333 39.99642
## 2017-05-01     NA 53.99987 71.18763 36.81212
## 2017-06-01     NA 53.99987 73.88973 34.11002
## 2017-07-01     NA 53.99987 76.28590 31.71385
## 2017-08-01     NA 53.99987 78.46635 29.53340

It looks as it should. From January 2006 to February 2017 we have our actual data and NAs for the forecasted data. From March 2017 to August 2017, we have our mean forecast and 95% confidence levels and NAs for actual data. Said another way, we have four time series with observations at different dates, some of which are in the future. Most fortunately, dygraph provides a nice way to plot our actual time series versus our three forecasted time series because it simply does not plot the NAs.

dygraph(oil_combined_xts, main = "Oil Prices: Historical and Forecast") %>%
  # Add the actual series
  dySeries("Actual", label = "Actual") %>%
  # Add the three forecasted series
  dySeries(c("Lo_95", "Forecast", "Hi_95"))

Take a quick look back at our previous graph using the plot() function. At first glance, the dygraph might not seem so different. But, we can now make use of hovering/tooltips and, more importantly, we can zoom in on the forecasted numbers see them much more clearly. Plus, the whole world of dygraph functionality is now available to us!

That’s all for today. We have gotten some familiarity with Quandl, used forecast() to predict the next six months of oil prices, and done some data wrangling so we can use our old friend dygraphs. Next time, we will wrap this into a Shiny app so that users can choose their own parameters, and maybe even choose different commodities. See you then!

To leave a comment for the author, please follow the link and comment on their blog: R Views.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)