Our previous post showed how to speed up the conversion of IPv4 addresses to and from integer format by taking advantage of a simple Rcpp wrapper around “boosted” native functions. However, to convert more than one IP address, you need to stick those functions into one of the R *apply functions, which does the job but is not an optimal solution. Ideally, you should be able to pass in a character vector of IP addresses or a numeric vector of integer-format IP addresses, of any length, and know that the function will “just work”.
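For reference, the *apply approach mentioned above looks roughly like the sketch below. Since rinet_pton() and rinet_ntop() need the compiled Rcpp code from the previous post, this sketch swaps in two hypothetical pure-R stand-ins, scalar_pton() and scalar_ntop(). They are not part of the original code and only mimic the one-address-at-a-time behavior.

```r
# Hypothetical pure-R stand-ins for rinet_pton() / rinet_ntop().
# Assumption: like the Rcpp versions, each handles exactly ONE address.
scalar_pton <- function(ip) {
  # split "a.b.c.d" into its four octets and weight them by powers of 256
  octets <- as.numeric(strsplit(ip, ".", fixed = TRUE)[[1]])
  sum(octets * 256^(3:0))
}

scalar_ntop <- function(n) {
  # peel the four octets back out of the integer value
  octets <- (n %/% 256^(3:0)) %% 256
  paste(octets, collapse = ".")
}

ips <- c("10.0.0.0", "192.168.1.1")

# Without vectorization, every call site needs its own *apply wrapper
longs <- sapply(ips, scalar_pton, USE.NAMES = FALSE)
back  <- sapply(longs, scalar_ntop)

print(longs)
print(back)
```

It works, but the sapply() boilerplate has to be repeated everywhere the conversion is needed, which is the annoyance the rest of this post removes.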
In this post we’ll introduce a shortcut method of vectorization with the Vectorize() function. Then, in the second and final part of the series, we’ll look at implementing the necessary code at the Rcpp layer to perform the vectorization at the C++ level and show some benchmarks for each method.
The Vectorize() Shortcut
At the end of our previous exercise, we had two functions: rinet_pton() and rinet_ntop(). Each took a single argument (the former a single-element character vector and the latter a single-element numeric vector) and returned a single-element vector as a result. Let’s vectorize each one using the Vectorize() function:
    # the following code assumes you've already done the "sourceCpp" in the prev article
    ip_to_long <- Vectorize(rinet_pton)
    long_to_ip <- Vectorize(rinet_ntop)
Yes, that’s all it takes. Now we can pass in a vector of one or more elements and each function will return a vector of the same size as a result. The proof is in the output, so let’s give them a go, first with the original single-element vector use case:
    # try a single IP address first
    ip_to_long("10.0.0.0")
    ##  10.0.0.0
    ## 167772160

    long_to_ip(167772160)
    ## [1] "10.0.0.0"
So far, so good, except that the default behavior of Vectorize(), which produces a named vector when a character vector is passed in, is probably not what we really want, so we’ll tweak the call to Vectorize() for each function:
    ip_to_long <- Vectorize(rinet_pton, USE.NAMES=FALSE)
    long_to_ip <- Vectorize(rinet_ntop, USE.NAMES=FALSE)

    ip_to_long("10.0.0.0")
    ## [1] 167772160

    long_to_ip(167772160)
    ## [1] "10.0.0.0"
Now, let’s test it with more than one element:
    # dotted-quad values below are recomputed from the integers in srcInt,
    # so that inputs and printed outputs agree with each other
    srcIp <- c("146.178.58.99", "174.5.172.152", "146.178.58.99", "213.186.42.8",
               "146.178.58.99", "170.138.152.142", "170.138.152.142",
               "174.5.172.152", "146.178.58.99", "213.186.42.8")

    srcInt <- c(2461153891, 2919607448, 2461153891, 3585747464, 2461153891,
                2861209742, 2861209742, 2919607448, 2461153891, 3585747464)

    ip_to_long(srcIp)
    ## [1] 2461153891 2919607448 2461153891 3585747464 2461153891 2861209742
    ## [7] 2861209742 2919607448 2461153891 3585747464

    long_to_ip(srcInt)
    ##  [1] "146.178.58.99"   "174.5.172.152"   "146.178.58.99"
    ##  [4] "213.186.42.8"    "146.178.58.99"   "170.138.152.142"
    ##  [7] "170.138.152.142" "174.5.172.152"   "146.178.58.99"
    ## [10] "213.186.42.8"
Everything works as expected, and we can now use those conversion routines without resorting to the *apply functions.
Exercise for the Reader!
To see what Vectorize() does under the covers, just enter long_to_ip at an R console prompt without the parentheses. This will show the source of the function that Vectorize() built. Try to build your own vectorized versions by trimming down what’s in the generated source code.
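If you want a starting point, here is one possible trimmed-down take, not the only answer. It calls mapply() directly, which is the function Vectorize() relies on internally. As an assumption for the sake of a self-contained example, it uses a hypothetical pure-R stand-in, scalar_pton(), in place of the compiled rinet_pton(), and the wrapper name my_ip_to_long() is made up for this sketch.

```r
# Hypothetical pure-R stand-in for rinet_pton() (handles ONE address only)
scalar_pton <- function(ip) {
  octets <- as.numeric(strsplit(ip, ".", fixed = TRUE)[[1]])
  sum(octets * 256^(3:0))
}

# Hand-rolled vectorized wrapper: mapply() calls scalar_pton() once per
# element of `ip`, then simplifies the results into an unnamed vector
my_ip_to_long <- function(ip) {
  mapply(FUN = scalar_pton, ip, SIMPLIFY = TRUE, USE.NAMES = FALSE)
}

res <- my_ip_to_long(c("10.0.0.0", "10.0.0.1"))
print(res)

# The same result via Vectorize(), for comparison
via_vectorize <- Vectorize(scalar_pton, USE.NAMES = FALSE)(c("10.0.0.0", "10.0.0.1"))
```

With the real functions loaded, swapping scalar_pton for rinet_pton should be the only change needed. Note that this is still an R-level loop in disguise, so it buys convenience rather than speed.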
We’ll see how to perform the same vectorization task at the Rcpp level in the next post and put each version in a head-to-head benchmark test.

NOTE: Using Rcpp with R markdown takes some extra steps, and I’ve posted a gist that shows some of the options you need to set to ensure the Rcpp code compiles and links properly, as well as the wicked-cool way you can embed Rcpp code right in markdown documents.