stringdist 0.9.6 on CRAN: new features
stringdist version 0.9.6 arrived on CRAN on 16 July 2020.
This release brings a few new features.
Fuzzy text search
Search text for approximate matches of a search string using any stringdist distance. There are several functions that allow you to
- detect whether there is a match within a certain maximum distance
- return the position of the first best match
- return the best match.
There are several interfaces for this. Functions grab and grabl work like base grep and grepl. The function extract has output similar to stringr::str_extract. The workhorse function is called afind (approximate find), which returns all results for multiple search patterns.
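As a rough illustration of how these interfaces relate, here is a minimal sketch. The example texts and the misspelled pattern are made up for this post, and it relies on afind's default method (optimal string alignment), under which swapping two adjacent characters costs 1.

```r
library(stringdist)

# Made-up example data: the pattern "quikc" is a typo for "quick"
texts <- c("the quick brown fox", "lazy dog")

# grabl: like grepl, one logical per text (match within distance 1?)
hits <- grabl(texts, "quikc", maxDist = 1)
hits   # expected: TRUE FALSE

# grab: like grep, the indices of the texts that contain a match
idx <- grab(texts, "quikc", maxDist = 1)

# extract: the matched piece of text, or NA when nothing is close enough.
# The stringdist:: prefix avoids clashes with other packages' extract().
found <- stringdist::extract(texts, "quikc", maxDist = 1)

# afind: the workhorse, here with two patterns at once. It returns a list
# of matrices with one row per text and one column per pattern.
res <- afind(texts, c("quikc", "dog"))
res$location   # start of the best matching window
res$distance   # distance between pattern and that window
res$match      # the text in that window
```

The three convenience functions take a single pattern and a maxDist cutoff, whereas afind accepts several patterns and leaves the thresholding to you.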
There is also a new implementation of the popular ‘cosine’ distance that I developed especially for this purpose. It is called ‘running_cosine’ and it avoids the double work otherwise done by the standard ‘cosine’ method. The result is a much faster implementation (up to about 100 times faster).
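A small sketch of how the two methods can be swapped in afind follows. The texts, the patterns and the choice q = 3 are only illustrative assumptions, and no timings are shown since the speed-up depends on the length of the texts.

```r
library(stringdist)

# Illustrative data, not a benchmark
texts <- c("When I grow up, I want to be",
           "one of the harvesters of the sea",
           "I want to be a fisherman")
patterns <- c("fish", "to be")

# q = 3: pattern and window are compared on their character trigram counts
fast <- afind(texts, patterns, method = "running_cosine", q = 3)
slow <- afind(texts, patterns, method = "cosine", q = 3)

# Best matching windows, one row per text and one column per pattern
fast$match

# Both methods compute the same distance, so this difference should be
# zero up to floating-point error
max(abs(fast$distance - slow$distance))
```

On short inputs like these the difference in run time will hardly be noticeable; the gain is expected when a window has to slide over long texts.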
String similarity matrices
Thanks to a PR by Johannes Gruber, stringdist now has a function to compute string similarity matrices: stringsimmatrix.
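A minimal sketch of its use, with made-up input vectors and the Levenshtein method ("lv") chosen only for illustration:

```r
library(stringdist)

# Made-up inputs
a <- c("hello", "world", "help")
b <- c("hello", "held")

# One row per element of a, one column per element of b.
# Similarities lie between 0 (completely different) and 1 (identical).
sim <- stringsimmatrix(a, b, method = "lv")

dim(sim)     # 3 rows, 2 columns
sim[1, 1]    # "hello" against "hello": identical strings give 1
```

This mirrors stringdistmatrix, but the entries are similarities rather than distances.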