Highlighting text in text mining

[This article was first published on rOpenSci Blog - R, and kindly contributed to R-bloggers.]

rplos is an R package for searching and retrieving the full text of Public Library of Science (PLOS) articles. It has a small feature that we aren't sure is useful. I don't actually do any text mining for my research, so perhaps text-mining folks can give some feedback.

You can quickly get a lot of results back using rplos, so it may be useful to browse what you got just as quickly. What better tool than a browser to browse? Enter highplos and highbrow. highplos uses the Solr highlighting capabilities of the PLOS search API and returns snippets of text with the term you searched for highlighted (by default wrapped in <em> tags, which browsers render as italics).

Installation

install.packages("devtools")
library(devtools)
# Install the development version from GitHub, using the "user/repo" form
install_github("ropensci/rplos")

library(rplos)

Search PLOS articles

out <- highplos(q = "alcohol", hl.fl = "abstract", hl.snippets = 5, limit = 10)
out[[1]]

## $abstract
## [1] "Background: <em>Alcohol</em> consumption causes an estimated 4% of the global disease burden, prompting"               
## [2] " goverments to impose regulations to mitigate the adverse effects of <em>alcohol</em>. To assist public health leaders"
## [3] " and policymakers, the authors developed a composite indicator—the <em>Alcohol</em> Policy Index—to gauge the strength"
## [4] " of a country's <em>alcohol</em> control policies. Methods and Findings: The Index generates a score based on policies"
## [5] " from five regulatory domains—physical availability of <em>alcohol</em>, drinking context, <em>alcohol</em> prices"
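
If you want plain text for further processing, or a quick count of how often the term was highlighted, the markup can be handled with base R. This is only a minimal sketch: the `snippets` vector is copied by hand from the output above rather than fetched from the API, and `strip_em` and `count_hits` are hypothetical helper names, not rplos functions.

```r
# Two snippets copied from the output above (not fetched live)
snippets <- c(
  "Background: <em>Alcohol</em> consumption causes an estimated 4% of the global disease burden, prompting",
  " of a country's <em>alcohol</em> control policies. Methods and Findings: The Index generates a score based on policies"
)

# Hypothetical helper: drop the <em> and </em> markup, keep the text
strip_em <- function(x) gsub("</?em>", "", x)

# Hypothetical helper: count highlighted terms in each snippet
count_hits <- function(x) {
  m <- gregexpr("<em>", x, fixed = TRUE)
  # gregexpr() returns -1 when there is no match, so count positive positions
  vapply(m, function(pos) sum(pos > 0), integer(1))
}

plain <- strip_em(snippets)
hits <- count_hits(snippets)
```

Here `plain` holds the snippets without tags and `hits` holds one count per snippet.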

Preview results in your browser

The new function highbrow (*snickers quietly*) automagically creates an easy-to-digest HTML page and opens it in your default browser.

highbrow(out)

Here's a screenshot similar to what you should see after the last command

highbrow uses the whisker package to fill in a template for a Bootstrap HTML page, which makes a somewhat pleasing interface for looking at your data. In addition, each DOI is wrapped in an <a> tag with an http://dx.doi.org/ prefix so that you can go directly to the paper if you are so inclined. Also note that the <em> tags (italics) are replaced with <strong> tags (bold) to make the search term pop out from the screen more.
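
The two text transformations described above can be roughly approximated in base R. This is a sketch under assumptions, not the code highbrow actually runs: it skips the whisker templating entirely, `em_to_strong` and `doi_link` are hypothetical helper names, and the DOI in the example is a made-up placeholder.

```r
# Hypothetical helper: turn <em> highlights into <strong> (bold).
# The captured group keeps the slash, so closing tags are handled too.
em_to_strong <- function(x) gsub("<(/?)em>", "<\\1strong>", x)

# Hypothetical helper: wrap a DOI in a link to the DOI resolver
doi_link <- function(doi) {
  sprintf('<a href="http://dx.doi.org/%s">%s</a>', doi, doi)
}

# Example with a snippet fragment and a placeholder DOI (not a real article)
bold_snippet <- em_to_strong("adverse effects of <em>alcohol</em>.")
link <- doi_link("10.1371/journal.pone.0000000")
```

The resulting strings could then be pasted into whatever HTML page or template you prefer.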


Let us know what you think.
