Making regex examples work for you!

August 30, 2013
(This article was first published on Econometrics by Simulation, and kindly contributed to R-bloggers)

Regular expressions (regex) are among the most widely used tools for matching patterns in strings, and R supports them.  However, users are often frustrated that examples copied verbatim from sources such as Stack Overflow do not seem to work in R.  In my own experience, the biggest issue is knowing which characters need to be escaped in R.
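As a minimal sketch of the escaping issue (the file names are made up for illustration): most regex references write a literal dot as \. but inside an R string the backslash itself has to be escaped, so the pattern is typed as "\\.".  Leaving the dot unescaped still runs, but it then matches any single character.

```r
# Hypothetical file names, used only for illustration.
files <- c("manicon.png", "iconXpng", "icon.txt")

# In the regex itself, \. means a literal dot. In an R string literal the
# backslash must be doubled, so the pattern is typed as "\\.".
literal_dot <- "icon\\.png$"
any_char    <- "icon.png$"   # unescaped "." matches any single character

nchar(literal_dot)           # counts one backslash, not two
cat(literal_dot, "\n")       # shows the regex as the engine receives it

escaped_hits   <- grepl(literal_dot, files)
unescaped_hits <- grepl(any_char, files)

# Compare the two patterns side by side.
print(data.frame(file = files, escaped = escaped_hits, unescaped = unescaped_hits))
```

Only the unescaped pattern accepts "iconXpng", because the "X" stands in where a dot was intended.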

For example:

Listing all files whose names match a simple pattern.

Looking at "/^.*icon.*\.png$/i" from

http://stackoverflow.com/questions/4845125/regex-to-match-filename-containing-a-word-regardless-of-case

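I was able to get "^.*icon.*.png$" to work in R, though I lost the case insensitivity.  The surrounding slashes and the trailing "i" are delimiter and flag syntax from other languages rather than part of the pattern; in R, case-insensitive matching is requested with the ignore.case=TRUE argument of list.files.  The leading "^" anchors the match to the start of the file name.  It has nothing to do with directories: list.files only looks in subdirectories when recursive=TRUE.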

So the following code returns the names of the files in the folder Clipart that match the pattern [anything]icon[anything].png

list.files("C:/Clipart/", pattern="^.*icon.*.png$")
[1] "manicon.png"     "handicon.png"     "bookicon.png"

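Looking at the original entry, we can see that what was causing us problems was the "\." escape: inside an R string a backslash must itself be escaped, so a literal dot has to be written "\\.".  The "^" does not need to be escaped in R.  Without the backslash, the "." matches any single character, which is why the looser pattern still finds the files.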
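To try this without depending on a C:/Clipart/ folder, here is a rough sketch that builds a throwaway directory first.  The directory name and the file names in it are my own assumptions for illustration.  It uses the escaped dot, brings back case insensitivity through ignore.case, and shows that searching subdirectories is controlled by recursive.

```r
# Build a throwaway folder; all names below are hypothetical.
demo_dir <- file.path(tempdir(), "clipart_demo")
dir.create(file.path(demo_dir, "sub"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, c("manicon.png", "BookIcon.PNG", "notes.txt")))
file.create(file.path(demo_dir, "sub", "handicon.png"))

# "\\." is a literal dot; ignore.case = TRUE plays the role of the trailing /i.
top_level <- list.files(demo_dir, pattern = "^.*icon.*\\.png$",
                        ignore.case = TRUE)

# Subdirectories are searched only when recursive = TRUE; the "^" anchor
# plays no part in that.
all_levels <- list.files(demo_dir, pattern = "^.*icon.*\\.png$",
                         ignore.case = TRUE, recursive = TRUE)

print(top_level)
print(all_levels)
```

The first call should return only the two matching files at the top level; the second should also pick up the file inside "sub".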

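Before looking at another example, let's modify the previous command slightly to show how we can make it match differently.  The "*" added after the "n" means "zero or more n", so "icon*" also matches names that contain only "ico".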

list.files("C:/Clipart/", pattern="^.*icon*.*.png$")
[1] "manicon.png"     "handicon.png"     "bookicon.png"    "iconnew.png"    
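To see exactly where the two patterns differ, they can be tested against a vector of names with grepl instead of a real folder.  This is only a sketch, and the names below are hypothetical.  In this sketch both patterns accept "iconnew.png", since each allows extra characters after "icon"; the difference shows up on a name like "ico.png".

```r
# Hypothetical file names, chosen to show where the two patterns differ.
files <- c("manicon.png", "iconnew.png", "ico.png", "icons.txt")

first  <- "^.*icon.*.png$"    # requires the full word "icon"
second <- "^.*icon*.*.png$"   # "n*" is zero or more "n", so "ico" is enough

first_hits  <- grepl(first, files)
second_hits <- grepl(second, files)

# One row per file name, one column per pattern.
print(data.frame(file = files, first = first_hits, second = second_hits))
```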

There are many resources available for regex, since it is really its own text-matching language supported by many different programming languages.  Good introductory guides can be found at:
http://www.zytrax.com/tech/web/regex.htm

or

http://www.regular-expressions.info/tutorial.html

