**Thinking inside the box**, and kindly contributed to R-bloggers)

This blog, together with Romain’s, is one of the main homes of stories about how

Rcpp can help with getting code to run faster in the context of the

R system for statistical programming and analysis. By making it easier to get already existing C or C++ code

to R, or equally to extend R with new C++ code, Rcpp can help in *getting stuff done*.

And it is often fairly straightforward to do so.

In this context, I have a nice new example. And for once, it is work-related. I generally cannot share too much of what we do there as this is, well,

proprietary, but I have this nice new example. The other day, I was constructing (large) time series of implied volatilities. Implied volatilities

can be thought of as the complement to an option’s price: given a price (and all other observables which can be thought of as fixed), we compute an

implied volatility price (typically via the standard Black-Scholes model).

Given a changed implied volatility, we infer a new price — see this

Wikipedia page for more details. In essence, it opens the door to all sorts of arbitrage and relative value pricing adventures.

Now, we observe prices fairly frequently to create somewhat sizeable time series of option prices. And each price corresponds to one matching implied

volatility, and for each such price we have to solve a small and straightforward optimization problem: to compute the implied volatility given the

price. This is usually done with an iterative root finder.

The problem comes from the fact that we have to do this (i) over and over and over for large data sets, and (ii) that there are a number of callbacks

from the (generic) solver to the (standard) option pricer.

So our first approach was to just call the corresponding function `GBSVolatility`

from the

fOption package from the trusted

Rmetrics project by Diethelm Wuertz et al. This worked fine, but even with the usual tricks of splitting over

multiple cores/machines, it simply took too long for the resolution and data amount we desired. One of the problems is that this function (which

uses the proper `uniroot`

optimizer in R) is not inefficient per se, but simply makes to many function call back to the option pricer as

can be seen from a quick glance at the code. The helper function `.fGBSVolatility`

gets called time and time again:

R> GBSVolatility function (price, TypeFlag = c("c", "p"), S, X, Time, r, b, tol = .Machine$double.eps, maxiter = 10000) { TypeFlag = TypeFlag[1] volatility = uniroot(.fGBSVolatility, interval = c(-10, 10), price = price, TypeFlag = TypeFlag, S = S, X = X, Time = Time, r = r, b = b, tol = tol, maxiter = maxiter)$root volatility } <environment: namespace:fOptions> R> R> .fGBSVolatility function (x, price, TypeFlag, S, X, Time, r, b, ...) { GBS = GBSOption(TypeFlag = TypeFlag, S = S, X = X, Time = Time, r = r, b = b, sigma = x)@price price - GBS } <environment: namespace:fOptions>

So the next idea was to try the corresponding function from my RQuantLib package which

brings (parts of) QuantLib to R. That was seen as been lots faster already. Now, QuantLib is pretty big and so

is RQuantLib, and we felt it may not make sense to install it on a number of machines just for this simple problem. So one evening this week I

noodled around for an hour or two and combined (i) a basic Black/Scholes calculation and (ii) a standard univariate zero finder (both of which can be

found or described in numerous places) to minimize the difference between the observed price and the price given an implied volatility. With about

one hundred lines in C++, I had something which felt fast enough. So today I hooked this into R via a two-line wrapper in quickly-created package using

Rcpp.

I had one more advantage here. For our time series problem, the majority of the parameters (strike, time to maturity, rate, …) are fixed, so we can

structure the problem to be vectorised right from the start. I cannot share the code or more the details of my new implementation. However, both

`GBSVolatility`

and `EuropeanOprionImpliedVolatility`

are on CRAN (and as I happen to maintain these for

Debian, also just one `sudo apt-get install r-cran-foptions r-cran-rquantlib`

away if you’re

on Debian or Ubuntu). And writing the other solver is really not that

involved.

Anyway, here is the result, courtesy of a quick run via the rbenchmark package. We create a vector of length

500; the implied volatility computation will be performed at each point (and yes, our time series are much longer indeed). This is replicated 100

times (as is the default for rbenchmark) for each of the three approaches:

[email protected]:~$ r xxxxR/packages/xxxxOptions/demo/timing.R test replications elapsed relative user.self sys.self user.child sys.child 3 zzz(X) 100 0.038 1.000 0.040 0.000 0 0 2 RQL(X) 100 3.657 96.237 3.596 0.060 0 0 1 fOp(X) 100 448.060 11791.053 446.644 1.436 0 0 [email protected]:~$

The new local solution is denoted by `zzz(X)`

. It is already orders of magnitude faster than the `RQL(x)`

function using

RQuantLib (which is, I presume, due to my custom solution internalising the loop). And

the new approach is a laughable amount faster than the basic approach (shown as `fOp`

) via

fOptions. For one hundred replications of solving implied volatilities for

all elements of a vector of size 500, the slow solution takes about 7.5 minutes — while the fast solution takes 38 milliseconds. Which comes to a

relative gain of over 11,000.

So sitting down with your C++ compiler to craft a quick one-hundred lines, combining two well-known and tested methods, can reap sizeable benefits. And Rcpp makes it trivial to call this from R.

**leave a comment**for the author, please follow the link and comment on their blog:

**Thinking inside the box**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...