Using Rcpp with Boost.Regex for regular expressions
[This article was first published on Rcpp Gallery, and kindly contributed to R-bloggers].
Gabor asked about using Rcpp with regular expression libraries. This post shows a very simple example, based on one of the Boost.Regex examples.
We need to set linker options so that the compiled code is linked against the Boost.Regex library. From R, before compiling, this can be as simple as:
Sys.setenv("PKG_LIBS"="-lboost_regex")
With that, the following example can be built:
// cf www.boost.org/doc/libs/1_53_0/libs/regex/example/snippets/credit_card_example.cpp

#include <Rcpp.h>
#include <string>
#include <boost/regex.hpp>

// True if s is four groups of four digits, with a hyphen or space
// after each of the first three groups.
bool validate_card_format(const std::string& s) {
    static const boost::regex e("(\\d{4}[- ]){3}\\d{4}");
    return boost::regex_match(s, e);
}

// Captures the four digit groups; separators are optional and the
// first group may have three or four digits.
const boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
const std::string machine_format("\\1\\2\\3\\4");     // digits only
const std::string human_format("\\1-\\2-\\3-\\4");    // hyphen-separated

std::string machine_readable_card_number(const std::string& s) {
    return boost::regex_replace(s, e, machine_format, boost::match_default | boost::format_sed);
}

std::string human_readable_card_number(const std::string& s) {
    return boost::regex_replace(s, e, human_format, boost::match_default | boost::format_sed);
}

// Applies all three helpers to each input and returns the results as a data frame.
// [[Rcpp::export]]
Rcpp::DataFrame regexDemo(std::vector<std::string> s) {
    int n = s.size();

    std::vector<bool> valid(n);
    std::vector<std::string> machine(n);
    std::vector<std::string> human(n);

    for (int i=0; i<n; i++) {
        valid[i]   = validate_card_format(s[i]);
        machine[i] = machine_readable_card_number(s[i]);
        human[i]   = human_readable_card_number(s[i]);
    }

    return Rcpp::DataFrame::create(Rcpp::Named("input")   = s,
                                   Rcpp::Named("valid")   = valid,
                                   Rcpp::Named("machine") = machine,
                                   Rcpp::Named("human")   = human);
}
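For readers who cannot link against Boost, the validation step can be approximated with the C++11 standard library. The following is only a sketch under stated assumptions, not the approach used in this post: it swaps boost::regex for std::regex with its default ECMAScript grammar, and the name validate_card_format_std is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for validate_card_format() above, using std::regex
// instead of boost::regex so that no extra linker flag is required.
bool validate_card_format_std(const std::string& s) {
    // Four groups of four digits; each of the first three groups must be
    // followed by a hyphen or a space.
    static const std::regex e("(\\d{4}[- ]){3}\\d{4}");
    // regex_match succeeds only if the entire string matches the pattern.
    return std::regex_match(s, e);
}
```

Calling validate_card_format_std("0000 1111 2222 3333") should return true, while an input without separators should return false because the pattern requires one after each of the first three groups.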
We can test the function using the same input as the Boost example:
s <- c("0000111122223333", "0000 1111 2222 3333", "0000-1111-2222-3333", "000-1111-2222-3333")
regexDemo(s)
                input valid          machine               human
1    0000111122223333 FALSE 0000111122223333 0000-1111-2222-3333
2 0000 1111 2222 3333  TRUE 0000111122223333 0000-1111-2222-3333
3 0000-1111-2222-3333  TRUE 0000111122223333 0000-1111-2222-3333
4  000-1111-2222-3333 FALSE  000111122223333  000-1111-2222-3333
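The results follow from the two patterns. Validation requires a separator after each of the first three four-digit groups, so rows 1 and 4 fail. The formatting pattern treats separators as optional and accepts three or four leading digits, so all four inputs are rewritten. The formatting step can be sketched on its own with the standard library, again as an assumption rather than the post's method: std::regex stands in for boost::regex, ^ and $ replace \A and \z, and $1-style references replace the sed-style \1. The name format_card_number_std is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrites a card number according to fmt, where fmt
// uses ECMAScript-style group references such as "$1$2$3$4" (machine form)
// or "$1-$2-$3-$4" (human form).
std::string format_card_number_std(const std::string& s, const std::string& fmt) {
    // Four captured digit groups; separators are optional and the first
    // group may have three or four digits.
    static const std::regex e("^(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$");
    // Input that does not match the pattern is returned unchanged.
    return std::regex_replace(s, e, fmt);
}
```

With this sketch, format_card_number_std("000-1111-2222-3333", "$1$2$3$4") yields "000111122223333", matching the machine column of row 4.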