What’s the difference between using which() and match() in R? For me – about 10 hours!
Today I was doing some string matching in R. In my experience, performing any sort of regex or string manipulation in R is a mistake. I’ve never run benchmarks, but it always seems slower than Perl or Python.
That said, when I’m working in R I’m loath to switch gears to another language unless I have to. While trying to find a match for a string in a character vector I thought I had run into one of those situations. My functions had an estimated run time of 10 hours. Too slow! What was slowing it down?
I tend to default to which() for matching in R because it returns multiple matches. In this particular scenario, though, I only needed the first match in the sequence, and the match() function was perfectly fine.
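To make the difference concrete, here is a minimal sketch of the two calls on a toy vector. The names `ids` and `target` and the values in them are made up for illustration; they are not from my actual data.

```r
# Hypothetical data: a character vector in which one value appears twice.
ids <- c("a17", "b42", "c03", "b42", "d99")
target <- "b42"

# which() on a logical comparison returns every position that matches.
all_hits <- which(ids == target)

# match() returns only the position of the first match, or NA if there is none.
first_hit <- match(target, ids)
no_hit <- match("zzz", ids)

print(all_hits)
print(first_hit)
print(no_hit)
```

If only the first position matters, `match(target, ids)` gives the same answer as `which(ids == target)[1]`.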
How much of a difference did it make?
- which() – about 30 seconds per record
- match() – about 0.01 seconds per record
The speedup went well beyond what I’d expect from simply stopping at the first match. For this kind of lookup, the underlying implementation of match() is clearly much faster than a which()-based comparison.
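For anyone who wants to check this on their own machine, here is a rough timing sketch. It is not my original code: the table size, the number of queries and the names `table_ids` and `queries` are all assumptions, and it assumes the slow version called which() once per record. Timings will vary, so treat it as a way to compare the two approaches rather than a proper benchmark.

```r
set.seed(1)

# Hypothetical lookup table of 100,000 unique strings, plus 1,000 queries
# drawn from it (so every query is guaranteed to have a match).
table_ids <- sprintf("id%06d", sample(100000))
queries <- sample(table_ids, 1000)

# One which() call per query: each call compares the query to the whole table.
time_which <- system.time(
  pos_which <- vapply(queries, function(q) which(table_ids == q)[1],
                      integer(1), USE.NAMES = FALSE)
)

# A single vectorised match() call that handles all queries at once.
time_match <- system.time(
  pos_match <- match(queries, table_ids)
)

print(time_which["elapsed"])
print(time_match["elapsed"])

# Both approaches should agree on the first matching position.
stopifnot(identical(pos_which, pos_match))
```

Note that match() accepts the whole vector of queries in one call, so the per-record loop disappears entirely; that may account for part of the gap, on top of any difference in the underlying implementation.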
Moral of the story? Note to self: do not use which() when match() will do.