August 30, 2013

Regular expressions (regex) are among the most widely used tools for string matching, and R supports them.  However, users are often frustrated to find that examples copied verbatim from sources such as Stack Overflow do not work.  In my experience, the biggest issue is knowing which characters need to be escaped in R.

For example:

Listing all files whose names match a simple pattern.

Starting from the pattern /^.*icon.*\.png$/i, I was able to get ^.*icon.*.png$ to work in R, though I lost the case insensitivity (list.files() has an ignore.case argument that can restore it).  The "^" anchors the match to the start of the file name.  It has nothing to do with directories: list.files() only looks into subdirectories when recursive = TRUE.

So, the following code returns the file names in the folder Clipart that match the pattern [anything]icon[anything].png:

list.files("C:/Clipart/", pattern="^.*icon.*.png$")
[1] "manicon.png"  "handicon.png" "bookicon.png"
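
The questions about case insensitivity and subdirectories can be checked directly. The following is a minimal sketch, assuming a throwaway temporary folder with made-up file names (the folder name clipart_demo and the files are hypothetical, not the contents of Clipart). It relies on the ignore.case and recursive arguments of list.files().

```r
# Hypothetical sandbox: a temporary folder with made-up file names.
dir <- file.path(tempdir(), "clipart_demo")
dir.create(file.path(dir, "sub"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(dir, c("manicon.png", "HandIcon.PNG", "notes.txt"))))
invisible(file.create(file.path(dir, "sub", "bookicon.png")))

# Matching is case-sensitive by default.
strict <- list.files(dir, pattern = "^.*icon.*\\.png$")

# ignore.case = TRUE plays the role of the trailing /i.
loose <- list.files(dir, pattern = "^.*icon.*\\.png$", ignore.case = TRUE)

# Subdirectories are searched only when recursive = TRUE.
deep <- list.files(dir, pattern = "^.*icon.*\\.png$", recursive = TRUE)

print(strict)
print(loose)
print(deep)
```

Only the last call should report the file inside sub, which suggests that the "^" has no effect on which directories are searched.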

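Looking at the original entry, we can see what was causing problems.  The surrounding /.../i delimiters are not part of R's syntax, and the single backslash in "\." is not valid inside an R string: the backslash itself has to be escaped, so a literal dot is written "\\.".  The "^" does not need to be escaped in R.  Note that the unescaped "." left in the working pattern matches any character, not just a dot.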
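
To see how escaping behaves inside an R string, here is a minimal sketch. The file names are made up for illustration. In an R string literal the backslash must itself be escaped, so the regex \. is typed as "\\.".

```r
# The literal "\\.png$" is stored as the six characters \.png$
pattern <- "\\.png$"
writeLines(pattern)
n_chars <- nchar(pattern)

# Hypothetical names: the second has no literal dot before "png".
files <- c("manicon.png", "manicon_png")

# An unescaped dot matches any single character.
unescaped <- grepl("icon.png$", files)

# An escaped dot matches only a literal ".".
escaped <- grepl("icon\\.png$", files)

print(unescaped)
print(escaped)
```

In R 4.0.0 and later, a raw string such as r"(\.png$)" avoids the doubled backslash.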
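
Before looking at another example, let's modify the previous command slightly to show how we can make it match differently.  A "*" applies only to the character immediately before it, so icon* means "ico" followed by zero or more "n" characters.

list.files("C:/Clipart/", pattern="^.*icon*.*.png$")
[1] "manicon.png"  "handicon.png" "bookicon.png" "iconnew.png"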
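
The difference between the two patterns is easiest to see on a plain character vector. This is a sketch using hypothetical file names (not the contents of the Clipart folder) and grepl(), which tests each string against a pattern.

```r
# Hypothetical file names chosen to separate the two patterns.
files <- c("manicon.png", "iconnew.png", "pico.png", "iconn.png", "logo.png")

original <- "^.*icon.*.png$"   # requires the full text "icon"
modified <- "^.*icon*.*.png$"  # requires only "ico"; "n*" allows zero or more n

hits_original <- files[grepl(original, files)]
hits_modified <- files[grepl(modified, files)]

print(hits_original)
print(hits_modified)
print(setdiff(hits_modified, hits_original))
```

With these names, the only extra match for the modified pattern is the one that contains "ico" without a following "n".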

There are a lot of resources available for regex, since it is really its own text-matching language and is supported by many different programming languages.  A good introductory guide can be found:

