Nested loops with mapply
So as I sink deeper into the second level of R enlightenment, one thing has been troubling me. “lapply” is fine for looping over a single vector of elements, but it doesn’t do a nested loop structure. These tend to be pretty ubiquitous for me. I’m forever doing the same thing to a set of two or three different variables. “apply” smells like a logical candidate, but it will really only allow you to run the same operation over the rows or columns of a matrix. Meh. “tapply” is more of the same, but applies over a “ragged” array. But “mapply” fits the bill. As it turns out, using mapply is incredibly easy. I found that the trickiest thing to implement is the logic to create the set of all possible combinations over which I want to loop.
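A minimal sketch of what mapply does on its own, assuming two toy vectors `x` and `y` (names made up for illustration). It walks its arguments in parallel, taking one element from each per call, so it does not produce a nested loop by itself:

```r
# Two toy vectors (hypothetical names, not from the post).
x = c("A", "B")
y = c("L", "M")

# mapply calls the function once per position: (x[1], y[1]), then (x[2], y[2]).
pairs = mapply(function(p, q) paste0(p, q), x, y)

# Only two of the four possible combinations are visited.
print(unname(pairs))  # "AL" "BM"
```

Which is why the full set of combinations has to be built up front and handed to mapply column by column.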
Let’s look at that first. Say that you have three variables. To keep things simple, each one is a two-element character vector as below.
    a = c("A", "B")
    b = c("L", "M")
    c = c("X", "Y")
I poked around for a function that would easily render the Cartesian product of those three vectors. “interaction” seemed like a natural choice, but it seems as though it wants to work with factors, and my first attempts to use it returned an error which had something to do with the number of elements. Diagnosing errors in R can be a Kafka-esque adventure and you have to choose your battles. I decided to look elsewhere. If you only have two vectors, it’s easy to handle manually. Just replicate each, order one of them and bind the results together, sort of like this:
    var1 = rep(a, length(b))
    var1 = var1[order(var1)]
    var2 = rep(b, length(a))
    df = data.frame(a = var1, b = var2)
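A quick sketch of what the ordering step in the snippet above buys you, using the same `a` and `b`. The names `unsorted` and `sorted` are only for illustration. Without ordering, both vectors cycle in lockstep and half the pairs never show up:

```r
a = c("A", "B")
b = c("L", "M")

# Without the ordering step: a and b cycle together (A-L, B-M, A-L, B-M).
unsorted = data.frame(a = rep(a, length(b)), b = rep(b, length(a)))

# With the ordering step: each value of a is held while b cycles.
var1 = rep(a, length(b))
sorted = data.frame(a = var1[order(var1)], b = rep(b, length(a)))

nrow(unique(unsorted))  # 2 distinct pairs
nrow(unique(sorted))    # 4 distinct pairs

# For this already-sorted input, rep's each argument gives the same column.
identical(var1[order(var1)], rep(a, each = length(b)))  # TRUE
```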
The ordering step is necessary so that all combinations are represented. So, this is fine for two variables, but won’t work for three or more. Extending the idea is straightforward, though. After two variables, you have a matrix and you simply need to replicate it, just as you would a vector. I coded a function that takes two arguments. The first is a matrix (or a vector) holding the combinations so far and the second is the next vector we want to fold in.
    CartProduct = function(CurrentMatrix, NewElement) {
      # NewElement must be a plain vector, not a matrix or array.
      if (length(dim(NewElement)) != 0) {
        warning("New vector has more than one dimension.")
        return (NULL)
      }

      # Promote a plain vector to a one-column matrix.
      if (length(dim(CurrentMatrix)) == 0) {
        CurrentRows = length(CurrentMatrix)
        CurrentMatrix = matrix(CurrentMatrix, nrow = CurrentRows, ncol = 1)
      } else {
        CurrentRows = nrow(CurrentMatrix)
      }

      # Stack one copy of the current matrix per element of NewElement.
      var1 = replicate(length(NewElement), CurrentMatrix, simplify = FALSE)
      var1 = do.call("rbind", var1)

      # Repeat each new element once for every row of the current matrix.
      # (each = keeps NewElement in its given order instead of sorting it.)
      var2 = rep(NewElement, each = CurrentRows)
      var2 = matrix(var2, nrow = length(var2), ncol = 1)

      return (cbind(var1, var2))
    }
Note that using rep or replicate with a character matrix may not give you the results you intended. rep drops the dimensions of a matrix and hands back a plain vector. So, I coerce results into matrices and replicate using a list structure, rather than the simplified result from replicate.
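Here is a small sketch of that behaviour on a toy 2 x 2 character matrix (`m`, `flat`, `copies` and `stacked` are illustrative names, not part of the function above):

```r
m = matrix(c("A", "B", "L", "M"), nrow = 2)  # 2 x 2 character matrix

# rep discards the dim attribute and returns a plain vector.
flat = rep(m, 2)
is.null(dim(flat))  # TRUE
length(flat)        # 8

# replicate with simplify = FALSE keeps each copy intact in a list,
# and do.call("rbind", ...) stacks the copies row-wise.
copies = replicate(2, m, simplify = FALSE)
stacked = do.call("rbind", copies)
dim(stacked)        # 4 2
```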
So. Nested loops. At this point, it’s easy.
    someFunction = function(a, b, c) {
      aList = list(a = toupper(a), b = tolower(b), c = c)
      return (aList)
    }

    mojo = CartProduct(a, b)
    mojo = CartProduct(mojo, c)
    aList = mapply(someFunction, mojo[,1], mojo[,2], mojo[,3], SIMPLIFY = FALSE)
Compare this with the following:
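    # Separate index names (i, j, k) so the loops don't overwrite a, b and c.
    results = list()
    for (i in seq_along(a)) {
      for (j in seq_along(b)) {
        for (k in seq_along(c)) {
          results[[length(results) + 1]] = someFunction(a[i], b[j], c[k])
        }
      }
    }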
Ugh. Note that with the mapply version you can’t do things like check for critical values between iterations or whatnot. But for execution over many categories this will spare me a bit of sanity.
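For what it’s worth, base R also ships expand.grid, which builds a data frame with one row per combination. This is a different route from the hand-rolled approach in this post, so take the following as a sketch under that assumption rather than a drop-in replacement. The name `combos` is made up; `someFunction` is repeated so the snippet runs on its own:

```r
a = c("A", "B")
b = c("L", "M")
c = c("X", "Y")

someFunction = function(a, b, c) {
  list(a = toupper(a), b = tolower(b), c = c)
}

# One row per combination; the first column varies fastest.
combos = expand.grid(a = a, b = b, c = c, stringsAsFactors = FALSE)

# Feed the columns to mapply in parallel, one call per row.
results = mapply(someFunction, combos$a, combos$b, combos$c, SIMPLIFY = FALSE)

nrow(combos)      # 8
length(results)   # 8
```

Setting stringsAsFactors = FALSE keeps the columns as character vectors, so toupper and tolower see plain strings.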