How much faster is calibration with parallel processing and/or R byte-code compilation?
The Svensson model is a six-parameter yield curve model derived from the Nelson-Siegel model in: Svensson, L. E. (1995). Estimating forward interest rates with the extended Nelson & Siegel method. Sveriges Riksbank Quarterly Review, 3(1):13-26.
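For reference, the zero rate at maturity m in the Svensson model is usually written as follows (the notation here is mine; in the code below the parameter vector betaV is ordered (β1, β2, β3, β4, τ1, τ2), and this is exactly what the NSS2 function computes):

$$
y(m) \;=\; \beta_1
\;+\; \beta_2\,\frac{1 - e^{-m/\tau_1}}{m/\tau_1}
\;+\; \beta_3\left(\frac{1 - e^{-m/\tau_1}}{m/\tau_1} - e^{-m/\tau_1}\right)
\;+\; \beta_4\left(\frac{1 - e^{-m/\tau_2}}{m/\tau_2} - e^{-m/\tau_2}\right)
$$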
In this post, I compare the performance of the R package mcGlobaloptim when calibrating this model to observed (here, fictitious) rates.
This is done in pure R, with sequential or parallel processing, and with or without byte-code compilation (in short, the source code is 'translated' into a compact intermediate numeric code sitting between source code and machine code; please refer to this article or this one).
Here are the R packages required for this experiment:
# Support for simple parallel computing in R
require(snow)
# Global optimization using Monte Carlo and
# quasi-Monte Carlo simulation
require(mcGlobaloptim)
# The R byte-code compiler
require(compiler)
# Sub-microsecond accurate timing functions
require(microbenchmark)
# Benchmarking routine for R
require(rbenchmark)
The R code for the objective function (the function to be minimized) used here appears in: Gilli, Manfred and Schumann, Enrico, A Note on 'Good Starting Values' in Numerical Optimisation (June 3, 2010). Available at SSRN: http://ssrn.com/abstract=1620083 or http://dx.doi.org/10.2139/ssrn.1620083
### Calibrating the Nelson-Siegel-Svensson model
### NSS2: Svensson model
NSS2 <- function(betaV, mats) {
    gam1 <- mats/betaV[5]
    gam2 <- mats/betaV[6]
    aux1 <- 1 - exp(-gam1)
    aux2 <- 1 - exp(-gam2)
    betaV[1] + betaV[2] * (aux1/gam1) +
        betaV[3] * (aux1/gam1 + aux1 - 1) +
        betaV[4] * (aux2/gam2 + aux2 - 1)
}
# True parameters
betaTRUE <- c(5, -2, 5, -5, 1, 3)
# Input maturities
mats <- c(1, 3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60,
          72, 84, 96, 108, 120)/12
# Corresponding yield to maturities
yM <- NSS2(betaTRUE, mats)
dataList <- list(yM = yM, mats = mats, model = NSS2)
# Bounds for parameters' search
settings <- list(min = c(0, -15, -30, -30, 0, 3),
max = c(15, 30, 30, 30, 3, 6), d = 6)
# Objective function
OF <- function(betaV, dataList) {
mats <- dataList$mats
yM <- dataList$yM
model <- dataList$model
y <- model(betaV, mats)
aux <- y - yM
crossprod(aux)
}
#
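As a quick sanity check (not part of the original listing), note that crossprod(aux) is simply the sum of squared residuals, so evaluating the objective at the true parameters should return (numerically) zero:

# Sanity check: since yM was generated by NSS2 at betaTRUE, the residuals are
# all zero and the objective value is a 1x1 matrix equal to 0.
OF(betaTRUE, dataList)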
A compiled version of the objective function is then defined, using the compiler package by L. Tierney (now a part of base R):
# define compiled objective function
OFcmp <- cmpfun(OF)
OFcmp
## function(betaV, dataList) {
## mats <- dataList$mats
## yM <- dataList$yM
## model <- dataList$model
## y <- model(betaV, mats)
## aux <- y - yM
## crossprod(aux)
## }
## bytecode: 0x000000000dfa98b8
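Before running the full calibration, one can already get a rough idea of what byte-code compilation buys on the objective function alone. The following micro-benchmark is a sketch added for illustration; the exact timings will of course vary across machines, and on recent R versions the JIT compiler may narrow the gap:

# Time single evaluations of the interpreted vs. compiled objective function
microbenchmark(interpreted = OF(betaTRUE, dataList),
               compiled    = OFcmp(betaTRUE, dataList),
               times = 1000L)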
The four situations being tested are the following:
* Optimization with sequential processing and no byte-code compilation
* Optimization with sequential processing and byte-code compilation
* Optimization with parallel processing and no byte-code compilation
* Optimization with parallel processing and byte-code compilation
Thus, the performance of multiStartoptim will be tested through the four following wrapper functions:
OF_classic <- function(.n) {
multiStartoptim(objectivefn = OF, data = dataList,
lower = settings$min, upper = settings$max,
method = "nlminb", nbtrials = .n,
typerunif = "sobol")
}
OF_cmp <- function(.n) {
multiStartoptim(objectivefn = OFcmp, data = dataList,
lower = settings$min, upper = settings$max,
method = "nlminb",
nbtrials = .n, typerunif = "sobol")
}
OF_classic_parallel <- function(.n) {
multiStartoptim(objectivefn = OF, data = dataList,
lower = settings$min, upper = settings$max,
method = "nlminb",
nbtrials = .n, typerunif = "sobol",
nbclusters = 2)
}
OF_cmp_parallel <- function(.n) {
multiStartoptim(objectivefn = OFcmp, data = dataList, lower = settings$min,
upper = settings$max, method = "nlminb",
nbtrials = .n, typerunif = "sobol",
nbclusters = 2)
}
First of all, it is important to notice that parallel processing is not necessarily faster than sequential processing when nbtrials is low, because the overhead of setting up the cluster and dispatching the trials then dominates the actual computation:
nbtrials <- 5
benchmark1 <- benchmark(OF_classic(nbtrials),
OF_classic_parallel(nbtrials),
columns = c("test", "replications",
"elapsed", "relative"),
replications = 10,
relative = "elapsed")
benchmark1
## test replications elapsed relative
## 1 OF_classic(nbtrials) 10 0.38 1.00
## 2 OF_classic_parallel(nbtrials) 10 7.39 19.45
Moreover, a little attention should be paid to:
* the number of clusters requested, which should not exceed what the machine actually offers (see the sketch below);
* the fact that more clusters do not necessarily mean a faster calibration.
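Here is a minimal sketch, added for illustration, of how to check the number of available cores before choosing nbclusters (it relies on the base parallel package, which is not otherwise used in this post):

require(parallel)
# Number of logical cores reported by the OS; nbclusters should not exceed
# this, and communication overhead grows with the number of workers anyway.
detectCores()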
Now, increasing the number of trials, we first verify that the results obtained in the four cases are all the same:
nbtrials <- 500
t1 <- OF_classic(nbtrials)
t2 <- OF_cmp(nbtrials)
t3 <- OF_classic_parallel(nbtrials)
t4 <- OF_cmp_parallel(nbtrials)
We have:
t1$par
## [1]  5 -2  5 -5  1  3
t1$objective
## [1] 2.795e-25
and:
all.equal(t1$objective, t2$objective, t3$objective, t4$objective)
## [1] TRUE
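A word of caution I would add here: all.equal() only compares its first two arguments (further positional arguments are interpreted as tolerance and scale), so a safer way to check that all four runs agree is to compare them pairwise, for instance:

# Pairwise comparison of the objective values reached by the four runs
all.equal(t1$objective, t2$objective)
all.equal(t1$objective, t3$objective)
all.equal(t1$objective, t4$objective)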
And now, the final comparison of the four cases:
benchmark2 <- microbenchmark(OF_classic(nbtrials), OF_cmp(nbtrials),
                             OF_classic_parallel(nbtrials),
                             OF_cmp_parallel(nbtrials), times = 100)
print(benchmark2, unit = "ms", order = "median")
## Unit: milliseconds
##                expr  min   lq median   uq  max
##     OF_cmp_parallel 2697 2753   2787 2883 3194
## OF_classic_parallel 2849 2925   2971 3061 4121
##              OF_cmp 3712 3753   3809 3860 4430
##          OF_classic 4060 4121   4153 4228 4801
Here is how ggplot2 displays it:
require(ggplot2)
# Note: microbenchmark stores its raw timings in nanoseconds
plt <- ggplot2::qplot(y = time, data = benchmark2,
                      colour = expr, xlab = "execution order",
                      ylab = "execution time (nanoseconds)")
plt <- plt + ggplot2::scale_y_log10()
print(plt)
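As an aside, and assuming a reasonably recent version of the microbenchmark package (which registers a ggplot2 autoplot method for its results), a one-liner alternative is:

# Violin plot of the timing distributions produced by microbenchmark
ggplot2::autoplot(benchmark2)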
That being said, a sequential implementation in compiled C code might still be much faster than any of these pure R examples.
