Approximate string matching in R
I have released a new version of the stringdist package.
Besides some new string distance algorithms, it now contains two convenient matching functions:

amatch: Equivalent to R’s match function, but allowing for approximate matching.
ain: Similar to R’s %in% operator.
# here's an example of amatch
> x <- c('foo', 'bar')
> amatch('fu', x, maxDist=2)
[1] 1
# if we decrease the maximum allowed distance, we get
> amatch('fu', x, maxDist=1)
[1] NA
# just like with 'match' you can control the output of no-matches:
> amatch('fu', x, maxDist=1, nomatch=0)
[1] 0
# to see if 'fu' matches approximately with any element of x:
> ain('fu', x)
[1] FALSE
# however, if we allow for larger distances
> ain('fu', x, maxDist=2)
[1] TRUE
Check the help file for other options, such as how to choose the string distance algorithm.
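As a rough illustration of choosing the algorithm, the sketch below passes the method argument to amatch. The choice of 'jw' (Jaro-Winkler) and the threshold 0.5 are illustrative assumptions, not recommendations from the package; the point is only that maxDist has to be chosen on the scale of the selected distance.

```r
library(stringdist)

x <- c('foo', 'bar')

# optimal string alignment distance counts edits,
# so maxDist is a number of edit operations
m_osa <- amatch('fu', x, method = 'osa', maxDist = 2)

# Jaro-Winkler distance lies between 0 and 1,
# so maxDist must be a fraction (0.5 is an arbitrary choice here)
m_jw <- amatch('fu', x, method = 'jw', maxDist = 0.5)

print(m_osa)
print(m_jw)
```

Both calls return an index into x (or NA if nothing is close enough), just like match.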
Note that stringdist and stringdistmatrix previously returned -1 if a distance was undefined or exceeded a predefined maximum. Now these functions return Inf in such cases, which makes comparisons easier. This may break your code if it explicitly tests the output for -1.
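A small sketch of what this means in practice, using the Hamming distance as an example of a distance that can be undefined (it requires strings of equal length). The variable names are made up for illustration.

```r
library(stringdist)

# 'fu' vs 'foo' differ in length, so the Hamming distance is undefined;
# 'fu' vs 'fa' differ by one substitution
d <- stringdist('fu', c('foo', 'fa'), method = 'hamming')

# test for undefined distances with is.infinite()
# rather than comparing against the old sentinel value
undefined <- is.infinite(d)

# because Inf is larger than any threshold, plain comparisons
# work without special-casing undefined distances
close_enough <- d <= 1

print(d)
print(undefined)
print(close_enough)
```

Code that used to filter on the old sentinel value before comparing can usually drop that step and compare against the threshold directly.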
With the latest release also arrive the latest bugs, so please drop me a line if you happen to stumble upon one.
The next release will probably not include any user-facing changes, but I’m planning to improve performance through smarter memory allocation and better maxDist handling for some of the string distance algorithms.