Highlighting text in text mining

December 2, 2013

(This article was first published on rOpenSci Blog - R, and kindly contributed to R-bloggers)

rplos is an R package that makes it easy to search and retrieve the full text of all Public Library of Science (PLOS) articles, and we have a little feature that we aren't sure is useful. I don't actually do any text mining for my research, so perhaps text-mining folks can give some feedback.

You can quickly get a lot of results back using rplos, so it may be useful to skim through what you got. What better tool than a browser to browse? Enter highplos and highbrow. highplos uses the Solr highlighting capabilities of the PLOS search API and returns strings in which the term you searched for is highlighted (by default wrapped in <em> tags, i.e. italics).
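To make the Solr part a bit more concrete, here's a minimal sketch of the kind of request involved. It only assembles a URL and doesn't send anything. The endpoint http://api.plos.org/search, the helper name plos_highlight_url(), and the particular set of standard Solr parameters (hl, hl.fl, hl.snippets, rows, wt) are assumptions for illustration; this is not necessarily how highplos builds its request.

```r
# Hypothetical helper: assemble a PLOS search URL with Solr highlighting on.
# Assumes the endpoint http://api.plos.org/search and standard Solr
# parameters; highplos may build its request differently.
plos_highlight_url <- function(q, field = "abstract", snippets = 5, rows = 10,
                               base = "http://api.plos.org/search") {
  params <- c(
    q           = q,
    hl          = "true",                 # turn on Solr highlighting
    hl.fl       = field,                  # field(s) to highlight
    hl.snippets = as.character(snippets), # max highlighted snippets per field
    rows        = as.character(rows),     # number of articles to return
    wt          = "json"                  # ask for a JSON response
  )
  # Percent-encode each value (e.g. spaces become %20) and join as key=value pairs
  values <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste(names(params), values, sep = "=", collapse = "&"))
}

plos_highlight_url("alcohol")
```

Unless told otherwise (via hl.simple.pre and hl.simple.post), Solr wraps each match in <em>...</em>, which is where the italics come from.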

Installation

install.packages("devtools")
library(devtools)
install_github("ropensci/rplos")
library(rplos)

Search PLOS articles

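# Search for "alcohol" and highlight matches in the abstract field,
# returning up to 5 highlighted snippets per article for 10 articles
out <- highplos(q = "alcohol", hl.fl = "abstract", hl.snippets = 5, limit = 10)
out[[1]]  # snippets for the first article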
## $abstract
## [1] "Background: Alcohol consumption causes an estimated 4% of the global disease burden, prompting"               
## [2] " goverments to impose regulations to mitigate the adverse effects of alcohol. To assist public health leaders"
## [3] " and policymakers, the authors developed a composite indicator—the Alcohol Policy Index—to gauge the strength"
## [4] " of a country's alcohol control policies. Methods and Findings: The Index generates a score based on policies"
## [5] " from five regulatory domains—physical availability of alcohol, drinking context, alcohol prices"
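If you're going to mine the snippets rather than just read them, you'll probably want to pull out the highlighted terms or strip the markup. Here's a small base-R sketch run on a made-up list shaped like out above (one element per article, each holding a character vector of abstract snippets). It assumes matches are wrapped in <em> tags, Solr's default; the names out_mock, highlighted_terms() and strip_highlights() are hypothetical.

```r
# Hypothetical stand-in for a highplos() result. Assumes matches come back
# wrapped in <em></em>; the snippet text is made up for illustration.
out_mock <- list(
  list(abstract = c("<em>Alcohol</em> consumption was measured in two cohorts",
                    " and the effects of <em>alcohol</em> prices were compared")),
  list(abstract = c("Heavy <em>alcohol</em> use predicted later outcomes"))
)

# Collect the highlighted terms from every snippet of every article
highlighted_terms <- function(res, field = "abstract") {
  snippets <- unlist(lapply(res, `[[`, field), use.names = FALSE)
  hits <- regmatches(snippets, gregexpr("<em>[^<]*</em>", snippets))
  gsub("</?em>", "", unlist(hits))
}

# Remove the highlight markup to get plain text for downstream text mining
strip_highlights <- function(x) gsub("</?em>", "", x)

table(tolower(highlighted_terms(out_mock)))
strip_highlights(out_mock[[1]]$abstract)
```

With a real result you would pass out instead of out_mock.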

Preview results in your browser

The new function highbrow (*snickers quietly*) automagically creates an easy-to-digest HTML page and opens it in your default browser.
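highbrow takes care of this for you, but the underlying idea is simple: write some HTML to a file and hand that file to the browser. Here's a minimal base-R sketch that assumes nothing about highbrow's internals; show_html() is a hypothetical helper, and it only opens the browser in an interactive session.

```r
# Hypothetical helper, not highbrow's actual code: write an HTML string to a
# temporary file and, in an interactive session, open it with browseURL().
show_html <- function(html, open = interactive()) {
  path <- tempfile(fileext = ".html")
  writeLines(html, path)
  if (open) utils::browseURL(path)
  invisible(path)  # return the file path so the page can be reused
}

page <- "<html><body><p>Effects of <strong>alcohol</strong></p></body></html>"
path <- show_html(page)
```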

highbrow(out)

Here's a screenshot similar to what you should see after the last command.

highbrow uses the whisker package to fill in a template for a Bootstrap HTML page, which gives a somewhat pleasing interface for looking at your data. In addition, each DOI is wrapped in an <a> tag (a link) with an http://dx.doi.org/ prefix, so you can go directly to the paper if you are so inclined. Also note that the <em> tags (italics) are replaced with bold tags to make the search term pop out from the screen more.
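Here's a rough base-R sketch of the two transformations described above: DOI links and italics-to-bold. It swaps sprintf() in for the whisker template and leaves out the Bootstrap styling. Assumptions: snippets arrive with <em> highlighting, bold is done with <strong> (the post doesn't say which bold tag highbrow uses), and the helper names and the DOI below are made up.

```r
# Hypothetical sketch using base R instead of whisker. Assumes <em>
# highlighting, <strong> for bold, and http://dx.doi.org/ DOI links.
emph_to_bold <- function(x) {
  x <- gsub("<em>", "<strong>", x, fixed = TRUE)
  gsub("</em>", "</strong>", x, fixed = TRUE)
}

# Turn a bare DOI into a clickable link
doi_link <- function(doi) {
  sprintf('<a href="http://dx.doi.org/%s">%s</a>', doi, doi)
}

# Render one article as an HTML fragment; a full page would drop these
# fragments into a Bootstrap template
render_article <- function(doi, snippets) {
  sprintf("<div>\n  <h4>%s</h4>\n  <p>%s</p>\n</div>",
          doi_link(doi), emph_to_bold(paste(snippets, collapse = "")))
}

# Made-up DOI and snippets, for illustration only
cat(render_article("10.1371/journal.pone.0000000",
                   c("<em>Alcohol</em> consumption", " and <em>alcohol</em> prices")))
```

Keeping the markup changes in small functions like these means the template itself stays plain HTML, which is roughly the appeal of a logic-less templating package like whisker.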


Let us know what you think.
