stringdist 0.9.6 on CRAN: new features

[This article was first published on R – Mark van der Loo, and kindly contributed to R-bloggers].

stringdist version 0.9.6 arrived on CRAN on 16 July 2020.

This release brings a few new features.

Fuzzy text search

Search text for approximate matches of a search string using any stringdist distance. There are several functions that allow you to

  • detect whether there is a match within a certain maximum distance
  • return the position of the first best match
  • return the best match.

There are several interfaces for this. Functions grab and grabl work like base grep and grepl. The function extract has output similar to stringr::str_extract. The workhorse function is called afind (approximate find), which returns all results for multiple search patterns.
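As a rough sketch of how these interfaces fit together: the example texts and patterns below are only illustrative, and the maxDist argument and the names of the list elements returned by afind are written as I understand them from the package documentation, so check ?afind for the authoritative signatures.

```r
library(stringdist)

# Illustrative input texts (not from the release notes)
texts <- c("When I grow up, I want to be",
           "one of the harvesters of the sea",
           "I think before my days are gone",
           "I want to be a fisherman")

# grabl: like grepl, TRUE where some substring lies within maxDist of the pattern
hits <- grabl(texts, "grew", maxDist = 1)

# grab: like grep, the indices of the elements with a match
idx <- grab(texts, "grew", maxDist = 1)

# extract: the best-matching substring per text, NA where nothing is within maxDist
found <- extract(texts, "harvested", maxDist = 3)

# afind: the workhorse; one row per text, one column per pattern
res <- afind(texts, c("fish", "gone", "to be"))
res$location  # start position of the best match
res$distance  # distance between pattern and best match
res$match     # the matched substrings
```

The default distance here is whatever afind uses when no method is given; passing method = ... should select any other stringdist distance.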

There is also a new implementation of the popular ‘cosine’ distance that I developed especially for this purpose. It is called ‘running_cosine’, and it avoids the duplicate work that the standard ‘cosine’ method would otherwise do. The result is a much faster implementation (up to about 100 times faster).
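A minimal sketch of selecting the new method, assuming it is passed to afind through the method argument like any other stringdist distance. The texts, the patterns, and the choice q = 3 are illustrative only.

```r
library(stringdist)

# Illustrative input texts (not from the release notes)
texts <- c("When I grow up, I want to be",
           "one of the harvesters of the sea",
           "I think before my days are gone",
           "I want to be a fisherman")
patterns <- c("fish", "gone", "to be")

# Fuzzy search using the running cosine distance on 3-grams
rc <- afind(texts, patterns, method = "running_cosine", q = 3)

# The same search with the standard cosine distance, for comparison
cs <- afind(texts, patterns, method = "cosine", q = 3)

# If both methods compute the same distance, as described above,
# the results should agree up to floating-point error
all.equal(rc$distance, cs$distance)
```

Wrapping each afind call in system.time() on a larger set of texts is one way to see the speed difference for yourself.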

String similarity matrices

Thanks to a PR by Johannes Gruber, stringdist now has a function to compute string similarity matrices: stringsimmatrix.
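A small sketch of what a call might look like, assuming stringsimmatrix takes two character vectors and a method in the same way as stringdistmatrix. The input strings are made up for illustration.

```r
library(stringdist)

# Illustrative inputs (not from the release notes)
a <- c("foo", "bar", "baz")
b <- c("foo", "boo")

# One row per element of a, one column per element of b;
# entries are similarities between 0 (nothing in common) and 1 (identical)
sim <- stringsimmatrix(a, b, method = "lv")
sim
```

With the Levenshtein method, the similarity is the distance rescaled by the length of the longer string, so identical strings score 1 and strings with no characters in common score 0.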
