Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R is a powerful data science language because, like Matlab, numpy, and Pandas, it exposes vectorized operations. That is, a user can perform operations on hundreds (or even billions) of cells by merely specifying the operation on the column or vector of values.

Of course, sometimes it takes a while to figure out how to do this. Please read for a great R matrix lookup problem and solution.

In R we can specify operations over vectors. For arithmetic this is easy, but some more complex operations you “need to know the trick.”

Patrick Freeman (@PTFreeman) recently asked: what is the idiomatic way to look up a bunch of values from a matrix by row and column keys? This is actually easy to do if we first expand the data matrix into RDF-triples. If our data were in this format we could merge/join it against our desired column indices.

# example matrix data
m <- matrix(1:9, nrow = 3)
row.names(m) <- c('R1' ,'R2', 'R3')
colnames(m) <- c('C1', 'C2', 'C3')
knitr::kable(m)
C1 C2 C3
R1 1 4 7
R2 2 5 8
R3 3 6 9

And our data-frame containing the indices we want to look-up.

# row/columns we want
w <- data.frame(
i = c('R1', 'R2', 'R2'),
j = c('C2', 'C3', 'C2'))
knitr::kable(w)
i j
R1 C2
R2 C3
R2 C2

That is: we want to know the matrix values from [R1, C2], [R2, C3], and [R2, C2].

The trick is: how do we convert the matrix into triples? digEmAll, has a great solution to that here.

# unpack into 3-column format from:
# https://stackoverflow.com/a/9913601
triples <- data.frame(
i = rep(row.names(m), ncol(m)),
j = rep(colnames(m), each = nrow(m)),
v = as.vector(m))
knitr::kable(triples)
i j v
R1 C1 1
R2 C1 2
R3 C1 3
R1 C2 4
R2 C2 5
R3 C2 6
R1 C3 7
R2 C3 8
R3 C3 9

What the above code has done is: write each entry of the original matrix as a separate row with the original row and column ids landed as new columns. This data format is very useful.

The above code is worth saving as a re-usable snippet, as getting it right is a clever step.

Now we can vectorize our lookup using the merge command, which produces a new joined table where the desired values have been landed as a new column.

res <- merge(w, triples, by = c('i', 'j'), sort = FALSE)
knitr::kable(res)
i j v
R1 C2 4
R2 C3 8
R2 C2 5

And that is it: we have used vectorized and relational concepts to look up many values from a matrix very quickly.