R caching with financial data

December 22, 2017

(This article was first published on R – Artificial thoughts, and kindly contributed to R-bloggers)

In the previous post we looked at a simple data caching example, which we used to explore the workings of the R package DataCache.

In this post we continue that exploration. Instead of using the system time as the data feed, we now use a more realistic example: financial data. This is again in preparation for running a custom Shiny server.

So let’s start.

We start by defining the function get_timeseries, which retrieves stock data and fills in missing values.

We wrap this function in another function, datafeed_timeseries, which returns a named list that DataCache can understand.

The line

names(out) = paste0('stockdata.', stock_id)

implies that the actual data is saved in the environment under the name paste0('stockdata.', stock_id), i.e. if stock_id = 'ALV.DE', then the data is saved under stockdata.ALV.DE. For another example, see the timing test further below.
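
To make the naming step concrete, here is a minimal sketch in base R. The numeric vector is a made-up stand-in for the downloaded series, and the list2env call only mimics the idea of turning list names into variable names; it is an assumption for illustration, not DataCache's internal code.

```r
# Hypothetical stand-in for the downloaded time series (no network access needed).
stock_id <- "ALV.DE"
timeseries <- c(101.2, 102.5, 100.9)

# Same naming step as in datafeed_timeseries.
out <- list(timeseries)
names(out) <- paste0("stockdata.", stock_id)

# Copy the list elements into an environment as named variables.
# This mimics the idea only; it is not taken from the DataCache source.
env <- new.env()
list2env(out, envir = env)

ls(env)
get("stockdata.ALV.DE", envir = env)
```

The element name of the list becomes the variable name, which is why the data later shows up as stockdata.<stock_id>.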

So here’s the code:

library(quantmod)              # provides getSymbols(); attaches xts and zoo (na.fill)
library(PerformanceAnalytics)
library(DataCache)

get_timeseries = function(stock_id) {
  
  AdjustedPrice = 6  # column index of the adjusted price in the getSymbols() result
  .stockdata = getSymbols(stock_id, warnings = FALSE, auto.assign = FALSE)
  stockdata = na.fill(.stockdata, fill = "extend")[, AdjustedPrice, drop=FALSE]
  
  return (stockdata)
}

datafeed_timeseries = function(stock_id) {
    
  timeseries = get_timeseries(stock_id)
  out = list(timeseries)
  names(out) = paste0('stockdata.', stock_id)
  
  return(out)
  
}
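
The fill step in get_timeseries can be tried without downloading anything. The sketch below uses a made-up zoo series with gaps (the values and dates are assumptions for illustration) to show what na.fill with fill = "extend" does: leading and trailing NAs take the nearest non-NA value, and interior NAs are linearly interpolated.

```r
library(zoo)

# Hypothetical price series with gaps at the start, in the middle and at the end.
prices <- zoo(c(NA, 10, NA, 12, NA),
              order.by = as.Date("2017-12-18") + 0:4)

# "extend" repeats the outermost non-NA values and interpolates interior gaps.
filled <- na.fill(prices, fill = "extend")
filled
```

After this step the series contains no missing values, so downstream code does not have to handle NAs.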

Now we'll run several tests to check that the cache actually speeds up loading the data. In this setup, reading from the cache is in fact roughly 100 times faster.

# do timing tests
  varName1 = 'BAS.DE'

# delete the cache (just in case there are any leftovers) 
  junk <- dir(path="~/cache/", pattern=varName1, full.names = TRUE) # ?dir
  file.remove(junk) # ?file.remove
  
# first time
  start_time <- Sys.time()
  cache.stockdata = data.cache(function() datafeed_timeseries(varName1) , cache.name = varName1, frequency = daily, wait = FALSE)
  end_time <- Sys.time()

  timeTaken1 = end_time - start_time
  # Time difference of 0.3825941 secs

# second time
  start_time <- Sys.time()
  cache.stockdata = data.cache(function() datafeed_timeseries(varName1) , cache.name = varName1, frequency = daily, wait = FALSE)
  end_time <- Sys.time()
  
  timeTaken2 = end_time - start_time
  # Time difference of 0.003516197 secs

# this is the actual data
  tail(stockdata.BAS.DE)
  # BAS.DE.Adjusted
  # 2017-12-14           93.75
  # 2017-12-15           93.67
  # 2017-12-18           95.46
  # 2017-12-19           94.28
  # 2017-12-20           93.13
  # 2017-12-21           93.69

# delete the cache 
  junk <- dir(path="~/cache/", pattern=varName1, full.names = TRUE) # ?dir
  file.remove(junk) # ?file.remove

# third time
  start_time <- Sys.time()
  cache.stockdata = data.cache(function() datafeed_timeseries(varName1) , cache.name = varName1, frequency = daily, wait = FALSE)
  end_time <- Sys.time()
  
  timeTaken3 = end_time - start_time
  # Time difference of 0.4717042 secs

# fourth time
  start_time <- Sys.time()
  cache.stockdata = data.cache(function() datafeed_timeseries(varName1) , cache.name = varName1, frequency = daily, wait = FALSE)
  end_time <- Sys.time()
  
  timeTaken4 = end_time - start_time
  # Time difference of 0.003220558 secs

  # so retrieving the cache is significantly faster ...
  # we assert that it is more than 50 times faster
  assertthat::assert_that(timeTaken1>timeTaken2 * 50)
  assertthat::assert_that(timeTaken3>timeTaken4 * 50)
  
  # # in fact, in this setup the ratio is around 100
  # as.numeric(timeTaken1)/as.numeric(timeTaken2)
  # # [1] 146.3878
  # as.numeric(timeTaken3)/as.numeric(timeTaken4)
  # # [1] 94.27676
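
The reason for the speed-up can be sketched with a tiny file cache in base R. Everything here is hypothetical (slow_feed, cached_feed and the sample values are invented for illustration), and it leaves out DataCache features such as the refresh frequency; it only shows that the second call reads a local file instead of calling the slow data source.

```r
# Hypothetical slow data source: the sleep stands in for a network download.
slow_feed <- function() {
  Sys.sleep(0.2)
  list(stockdata.DEMO = c(93.75, 93.67, 95.46))
}

# Minimal file cache: read the file if it exists, otherwise call the feed and save.
cached_feed <- function(feed, cache_file) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  result <- feed()
  saveRDS(result, cache_file)
  result
}

cache_file <- tempfile(fileext = ".rds")

# First call hits the slow feed, second call reads the cached file.
t_first  <- system.time(first  <- cached_feed(slow_feed, cache_file))[["elapsed"]]
t_second <- system.time(second <- cached_feed(slow_feed, cache_file))[["elapsed"]]

identical(first, second)  # compare the cached copy with the fresh result
t_first > t_second        # compare the two timings

unlink(cache_file)        # clean up the temporary cache file
```

The exact ratio depends on how slow the data source is compared with reading a small local file, which is why the measured factor varies between runs.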

Hurray, it works!
