# An R vlookup? Not so silly idea

**Digital Age Economist on Digital Age Economist**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

It started out as a joke, but Jenny Bryan recently posted a `vlookup`

implementation in `R`

. Here is the original post as seen on twitter:

Sometimes you just need to party like it’s VLOOKUP time ?? … seriously, sometimes a join doesn’t fit the bill pic.twitter.com/jz8StfQdNg

— Jenny Bryan (@JennyBryan) April 3, 2018

The argument for the creation of this kind of lookup implementation in `R`

, was to help facilitate a join that wasn’t covered by the standard joins. This tweet started out as a joke function that would surely never see the daylight of production code, but there was one thing that stood out about the small little function – it was written in `base`

R and made use of R’s list data objects. What makes this intriguing you ask? Well, combining `lists`

and `base`

usually has a nice consequence – speed.

So that is what I set out to test: How much more efficient would this `vlookup`

be when compared to a `dplyr`

or `data.table`

implementation? We start off by recreating the vlookup function and use the tweet’s example to show its use case:

library(tidyverse) vlookup <- function(this, df, key, value) { m <- match(this, df[[key]]) df[[value]][m] } vlookup_base <- function(){ c("Luke Skywalker", "Jabba Desilijic Tiure", "Yoda") %>% vlookup(starwars, "name", "mass") %>% sum } vlookup_base() ## [1] 1452

Next we build the same query, but using `dplyr`

notation to give us the same results (a quick check shows us the sum results are the same):

vlookup_dplyr <- function(){ starwars %>% filter(name %in% c("Luke Skywalker", "Jabba Desilijic Tiure", "Yoda")) %>% pull(mass) %>% sum } vlookup_dplyr() ## [1] 1452

Lastly we build the query using `data.table`

. This implementation comes from the comment section of the tweet. We first have to coerce the `starwars`

data into a data.table object for this to work. We also lose the `%>%`

workflow which can be quite a pain. In this case it doesn’t matter, because we all about speed!

library(data.table) vlookup_dt <- function(){ name_vec <- c("Luke Skywalker", "Jabba Desilijic Tiure", "Yoda") df <- data.table(dplyr::starwars) df[.(name_vec), on = 'name', sum(mass)] }

To test the speed to all of these function, we will use the `microbenchmark`

library. Its one of my favourite libraries in `R`

,thanks to its ease of use and quick API to comparing functions.

library(microbenchmark) microbenchmark( base = vlookup_base(), dplyr = vlookup_dplyr(), data_table = vlookup_dt(), times = 1000L ) ## Unit: microseconds ## expr min lq mean median uq max neval ## base 93.539 116.1030 143.8219 129.642 147.692 4797.951 1000 ## dplyr 780.309 889.0265 1017.7738 943.795 1035.898 8516.107 1000 ## data_table 1030.564 1166.3595 1454.0140 1252.103 1441.641 50518.175 1000 ## cld ## a ## b ## c

Here we see the enormous speed gain we got from using `vlookup_base`

in comparison with the other two frameworks. In all fairness, I feel that I might be handicapping `data.table`

a bit, just to the coercion of the `starwars`

dataset each time. So lets see what happens when I do the coercion outside the function.

df <- data.table(dplyr::starwars) vlookup_dt <- function(){ name_vec <- c("Luke Skywalker", "Jabba Desilijic Tiure", "Yoda") df[.(name_vec), on = 'name', sum(mass)] } res_mic <- microbenchmark( base = vlookup_base(), dplyr = vlookup_dplyr(), data_table = vlookup_dt(), times = 1000L ) res_mic ## Unit: microseconds ## expr min lq mean median uq max neval cld ## base 95.590 118.975 137.5330 128.410 140.1030 603.077 1000 a ## dplyr 784.410 873.026 962.0909 911.180 972.7185 3840.002 1000 c ## data_table 579.693 648.206 740.3640 685.129 746.0520 4667.900 1000 b

We definitely gained some speed by not having to coerce the data.frame over and over, but, the overall speed comparison is still nothing like the old fashion `vlookup_base`

.

**leave a comment**for the author, please follow the link and comment on their blog:

**Digital Age Economist on Digital Age Economist**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.