Quick Hit: which() and match() are not the same

April 11, 2012

(This article was first published on Gage Theory » R, and kindly contributed to R-bloggers)

What’s the difference between using which() and match() in R? For me, about 10 hours!

Today I was doing some string matching in R. In my experience, performing any sort of regex or string manipulation in R is a mistake. I’ve never run benchmarks, but it always seems slower than doing the same work in Perl or Python.

That said, when I’m working in R I’m loath to switch to another language unless I have no choice. While trying to find a match for a string in a vector of characters, I thought I had run into one of those situations: my functions had an estimated run time of 10 hours. Too slow! What was slowing it down?

I tend to default to which() for matching in R because it returns every match. In this particular case, though, I only needed the first match in the sequence, and the match() function was perfectly fine.
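To make the difference concrete, here is a small sketch on a made-up vector (`fruits` is hypothetical, not from the original post). which() takes a logical vector, so it is usually paired with a comparison such as `fruits == "banana"` and returns every TRUE position; match() returns the position of the first match for each value it is given.

```r
# A hypothetical character vector to search (illustration only)
fruits <- c("apple", "banana", "cherry", "banana")

# which() returns every position where the comparison is TRUE
which(fruits == "banana")   # 2 4

# match() returns only the first position of the value
match("banana", fruits)     # 2

# With no match, which() returns integer(0) while match() returns NA
which(fruits == "kiwi")     # integer(0)
match("kiwi", fruits)       # NA
```

If only the first hit matters, `which(x == s)[1]` and `match(s, x)` give the same answer, including `NA` when nothing matches.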

How much of a difference did it make?

  • which() – about 30 seconds per record
  • match() – about 0.01 seconds per record
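The timings above are the author's own. If you want to check the gap on your own data, the following is a minimal benchmark sketch. The data (random 8-letter strings), the sizes, and all variable names are hypothetical, and the numbers you get will depend on your machine, R version, and data, so they are not expected to reproduce the figures above.

```r
set.seed(1)

# Hypothetical helper: n random 8-letter strings
make_strings <- function(n) {
  vapply(seq_len(n),
         function(i) paste(sample(letters, 8, replace = TRUE), collapse = ""),
         character(1))
}

haystack <- make_strings(1e5)        # the vector being searched
needles  <- sample(haystack, 1000)   # keys known to be present

# One which() call per key: each call compares the key against all of haystack
t_which <- system.time(
  pos_which <- vapply(needles, function(s) which(haystack == s)[1],
                      integer(1), USE.NAMES = FALSE)
)

# One match() call per key
t_match <- system.time(
  pos_match <- vapply(needles, function(s) match(s, haystack),
                      integer(1), USE.NAMES = FALSE)
)

# A single vectorized match() call for all keys at once
t_vec <- system.time(pos_vec <- match(needles, haystack))

# Elapsed seconds for each approach
print(c(which      = t_which[["elapsed"]],
        match      = t_match[["elapsed"]],
        vectorized = t_vec[["elapsed"]]))
```

All three approaches return the same positions; only the elapsed times differ.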

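The speedup went well beyond anything explained by stopping at the first match, which suggests that, for this kind of lookup, match() is implemented far more efficiently than the which(x == s) idiom. Part of the cost is that which(x == s) first builds a logical vector as long as x and only then scans it for TRUE values.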

Moral of the story? Note to self: do not use which() when match() will do.
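"match() will do" can also mean dropping the loop entirely: match() is vectorized over its first argument, so a whole batch of records can be looked up in one call. Below is a minimal sketch with hypothetical `lookup` and `records` vectors; the original post doesn't say whether the author's code was structured this way.

```r
# Hypothetical lookup table and a batch of records to resolve
lookup  <- c("alpha", "beta", "gamma", "delta")
records <- c("gamma", "alpha", "omega", "gamma")

# Loop version: one which() scan of the lookup table per record
idx_loop <- vapply(records, function(r) {
  hits <- which(lookup == r)
  if (length(hits) == 0) NA_integer_ else hits[1]
}, integer(1), USE.NAMES = FALSE)

# Vectorized version: a single match() call resolves every record
idx_vec <- match(records, lookup)

idx_vec                        # 3 1 NA 3
identical(idx_loop, idx_vec)   # TRUE
```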
