Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In the post (https://statcompute.wordpress.com/2018/09/15/how-to-avoid-for-loop-in-r), I briefly introduced the idea of vectorization and potential use cases. One might be wondering why we even need the Vectorize() function given the fact that it is just a wrapper and whether there is any material efficiency gain by vectorizing a function. It is true that the Vectorize() function is not able to improve the efficiency of any function itself that is wrapped around, e.g. vectorized. However, the vectorization can change the input format of a function that normally consumes scalar inputs before being vectorized and therefore would improve the processing efficiency. An example is given below to demonstrate the value of vectorization.

When we want to locate the index of a value within the long vector with millions of rows, the which() function should be the fastest, e.g. "which((0:100) == 10)". When we want to locate indices of several values within the vector, the match() function might be the most intuitive, e.g. "match(c(10, 12), 0:100)". If we would like to take advantage of the speed offered by the which() function, then we might consider one of the following:
A. Using the “%in%” operator within the which() function such as "which(0:100 %in% c(10, 12))", where “%in%” is the shorthand of the match() function.
B. Parsing out each lookup value and then connecting them by “|” operators such as "which(eval(parse(text = paste('0:100 == ', c(10, 12), collapse= '|'))))".

Besides the two above, we can also leverage the idea of MapReduce discussed in https://statcompute.wordpress.com/2018/09/08/playing-map-and-reduce-in-r-subsetting such as "Reduce(c, Map(function(x) which((0:100) == x), c(10, 12)))".

However, since the Vectorize() function is able to change the input format from a scalar to a vector, we can now consider vectorizing the which() function, which would consume the vector directly such as "(Vectorize(function(s, l) which(l == s), 's')) (c(10, 12), 0:100)". In this newly defined function, there are two parameters, e.g. “s” and “l”, of which “s” is the input changing from a scalar to a vector after the vectorization.

With all ideas on the table, a benchmark comparison is presented below to show how fast to look up 5 values from a vector with a million rows by using each above-mentioned approach. Additionally, since it is straightforward to extend the idea of Parallelism to MapReduce and vectorization, we will add two parallel solutions in the benchmark, including the parallel::pvec() function that executes the vectorization in parallel and the parallel::mcMap() function that is considered the parallelized Map() function.

tbl <- 0:1000000
lst <- 10 ** (1:5)

str_which <- function(s, l) which(eval(parse(text = paste('l == ', s, collapse=  '|'))))

map_which <- function(s, l) Reduce(c, Map(function(x) which(l == x), s))

vec_which <- Vectorize(function(s, l) which(l == s), 's')

mcmap_which <- function(s, l) Reduce(c, parallel::mcMap(function(x) which(l == x), s, mc.cores = length(s)))

mcvec_which <- function(s, l) parallel::pvec(s, mc.cores = length(s), function(x) which(l == x))

rbenchmark::benchmark(replications = 1000, order = "user.self", relative = "user.self",
columns = c("test", "relative", "elapsed", "user.self", "sys.self", "user.child", "sys.child"),
match = {match(lst, tbl)},
which = {which(tbl %in% lst)},
str_which = {str_which(lst, tbl)},
vec_which = {vec_which(lst, tbl)},
map_which = {map_which(lst, tbl)},
mcvec_which = {mcvec_which(lst, tbl)},
mcmap_which = {mcmap_which(lst, tbl)}
)
#        test relative elapsed user.self sys.self user.child sys.child
# mcvec_which    1.000  25.296     1.722   12.477     33.191    30.004
# mcmap_which    1.014  25.501     1.746   12.424     34.231    30.228
#   map_which    9.642  18.240    16.604    1.635      0.000     0.000
#   vec_which    9.777  18.413    16.836    1.576      0.000     0.000
#       which   12.130  22.060    20.888    1.171      0.000     0.000
#   str_which   13.467  25.355    23.191    2.164      0.000     0.000
#       match   36.659  64.114    63.126    0.986      0.000     0.000


With no surprise, both parallel solutions are at least 10 times faster than any single-core solution in terms of the user CPU time. It is also intriguing to see that the vectorization is as efficient as the MapReduce no matter with a single core or multiple cores and is significantly faster than first three approaches shown early and that the match() function, albeit simple, is the slowest, which in turn justifies efforts on vectorizing the which() function.