Storing Forecasts in a Database

[This article was first published on Quintuitive » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In my last post I mentioned that I started using RSQLite to store computed results. No rocket science here, but my feeling is that this might be useful to others, hence, this post. This can be done using any database, but I will use (R)SQLite as an illustration.

Let’s assume we are running a long ARMA/GARCH simulation (see here for an example) on daily data on the S&P 500. For each day, we fit numerous models using our package of choice. We need to store all useful information in a database and use it later for various analysis. A reasonable approach is to create two tables. One to store the models, and one to store the forecasts. Here is the relevant RSQLite code:

require(RSQLite)

driver = dbDriver("SQLite")
connection = dbConnect(driver, dbname=db.path)

query = paste(
           " create table if not exists models ( ",
           "    history integer not null, ",
           "    p integer not null, ",
           "    q integer not null, ",
           "    r integer not null, ",
           "    s integer not null, ",
           "    dist varchar(32) not null)",
           sep="")
dbGetQuery(connection, query)

query = paste(
           " create unique index if not exists models_unique ",
           "    on models(history,p,q,r,s,dist) ",
           sep="")
dbGetQuery(connection, query)
      
query = paste(
           " create table if not exists forecasts ( ",
           "    model integer not null, ",
           "    date datetime not null, ",
           "    mu real, ",
           "    sigma real, ",
           "    ic real) ",
           sep="")
dbGetQuery(connection, query)
      
query = paste(
           " create unique index if not exists forecasts_unique ",
           "    on forecasts(model,date) ",
           sep="")
dbGetQuery(connection, query)

dbDisconnect(connection)

The models table has one row for each model, which is a unique combination of (history,p,q,r,s,distribution). Each forecast refers to its model (this way we avoid repetitions – a standard database normalization) and specifies its date. The other columns of the forecasts can be pretty much anything that’s needed. In the example, we save the forecasts for the mean and variance and the information criteria (AIC for instance). The update code is a bit more complicated (or at least in my implementation):

# The assumption is that we are in a context (function) containing the
# parameters specifying the model, i.e. they are available via the
# history,p,q,r,s and dist variables. We also assume that the forecast(s)
# are contained in fore.df.

driver = DBI::dbDriver("SQLite")
connection = DBI::dbConnect(driver, dbname=db.path)

# First add the model if it doesn't exist already
model.df = data.frame(history, p, q, r, s, dist)
colnames(model.df) = c("history", "p", "q", "r", "s", "dist")
query = paste(
           " insert or ignore into models (history, p, q, r, s, dist) ",
           "    values (@history, @p, @q, @r, @s, @dist) ",
           sep="")
RSQLite::dbGetPreparedQuery(connection, query, bind.data=model.df)

# Get the model id from the rowid column (added automatically by SQLite)
query = paste("select rowid from models where history=? and p=? and q=? and r=? and s=? and dist=?", sep="")
rowid = as.numeric(RSQLite::dbGetPreparedQuery(connection, query, bind.data=model.df))

# Set the model id in the forecast data frame
fore.df[,1] = rowid
colnames(fore.df) = c("model", "date", "mu", "sigma", "ic")

DBI::dbBegin(connection)
# Insert the forecasts
query = paste(
           " insert or replace into forecasts ( ",
           "    model, date, mu, sigma, ic) ",
           " values (@model, @date, @mu, @sigma, @ic) ",
           sep="")
RSQLite::dbGetPreparedQuery(connection, query, bind.data=fore.df)
DBI::dbCommit(connection)

DBI::dbDisconnect(connection)

In summary, insert the model if not already available, get the unique id for the model (using SQLite’s rowid unique column added automatically to each table), add the model id to the forecasts data frame and finally insert the forecasts data frame into the forecasts table.

I have been using this approach for a while and has been quite happy with its usefulness.

Last but not least, if you are performing the simulation in parallel, the update must be wrapped in a critical section (for more details, check my previous post and the flock package).

To leave a comment for the author, please follow the link and comment on their blog: Quintuitive » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)