tidytext 0.1.4

Posted on September 29, 2017 by Rstats on Julia Silge in R bloggers

[This article was first published on Rstats on Julia Silge, and kindly contributed to R-bloggers].

I am pleased to announce that tidytext 0.1.4 is now on CRAN!

This release of our package for text mining using tidy data principles has an excellent collection of delightfulness in it. First off, all the important functions in tidytext now support non-standard evaluation through the tidyeval framework.

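library(janeaustenr)
library(tidytext)
library(dplyr)

# Capture the input and output column names as quosures
input_var <- quo(text)
output_var <- quo(word)

# Splice the quosures into unnest_tokens() with !!
tibble(text = prideprejudice) %>%
    unnest_tokens(!! output_var, !! input_var)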
## # A tibble: 122,204 x 1
##         word
##        <chr>
##  1     pride
##  2       and
##  3 prejudice
##  4        by
##  5      jane
##  6    austen
##  7   chapter
##  8         1
##  9        it
## 10        is
## # ... with 122,194 more rows

I have already found the tidyeval framework useful in my day job when writing functions that use dplyr for complex data analysis tasks, so we are glad to have this support in tidytext. The older underscored functions (like unnest_tokens_()) that take only strings as arguments are still in the package for now, but tidyeval is the way to go, everybody!
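The point about writing functions is easier to see with a small example. Below is a minimal sketch, assuming current versions of dplyr, rlang, and tidytext; `count_tokens()` and `count_tokens_chr()` are hypothetical helpers written for illustration, not functions in tidytext. The first captures a bare column name with `enquo()` and splices it into `unnest_tokens()` with `!!`, the same pattern as the example above. The second shows one way to handle a column name that arrives as a string (the job the underscored functions used to do) by converting it to a symbol with `rlang::sym()` first.

```r
library(janeaustenr)
library(tidytext)
library(dplyr)

# Hypothetical helper (not part of tidytext): tokenize a text column of `df`
# and count the tokens. `text_col` is a bare column name, captured with
# enquo() and spliced into unnest_tokens() with !!.
count_tokens <- function(df, text_col, token = "words") {
    text_col <- enquo(text_col)
    df %>%
        unnest_tokens(word, !! text_col, token = token) %>%
        count(word, sort = TRUE)
}

# Hypothetical string-based variant: convert the column name to a symbol
# with rlang::sym(), then pass it on with !! so enquo() sees a bare name.
count_tokens_chr <- function(df, text_col_name) {
    count_tokens(df, !! rlang::sym(text_col_name))
}

austen <- tibble(text = prideprejudice)

res1 <- austen %>% count_tokens(text)
res2 <- austen %>% count_tokens_chr("text")
```

Both calls go through the same tidyeval code path, so the string version is just a thin wrapper rather than a separate implementation.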

I also used pkgdown to build a website to explore tidytext’s documentation and vignettes.

Our book website of course contains a lot of information about how to use tidytext, but the pkgdown site has a different focus: you can browse all of the function documentation and vignettes directly. Getting this site up and running went extremely smoothly, and I have not worked hard to customize it; it uses all the defaults. In my experience here, the bang for one's buck in setting up a pkgdown site is extremely good.

Another exciting addition to this release of tidytext is a set of tidiers and support for Structural Topic Models from the stm package, so you can work with these models using tidy data principles. I am becoming a real fan of this implementation of topic modeling in R after experimenting with it for a while (no rJava! so fast!), and soon I'll have a complete code-through using an example text, The Adventures of Sherlock Holmes.
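To give a flavor of what the stm tidiers look like, here is a minimal sketch that fits a small model to the chapters of Pride and Prejudice. It is not the Sherlock Holmes code-through mentioned above. It assumes the stm package is installed and that `stm()` accepts a sparse document-term matrix like the one `cast_sparse()` returns, as recent stm versions do. K = 4, the chapter-splitting regular expression, and the rare-word cutoff are arbitrary choices for illustration. `tidy()` with the default `matrix = "beta"` gives per-topic word probabilities, and `matrix = "gamma"` gives per-document topic proportions.

```r
library(janeaustenr)
library(tidytext)
library(dplyr)
library(stm)

# Treat each chapter of Pride and Prejudice as one document
austen_chapters <- tibble(text = prideprejudice) %>%
    mutate(chapter = cumsum(grepl("^chapter [0-9]+", text, ignore.case = TRUE))) %>%
    filter(chapter > 0)

# Tidy word counts per chapter, without stop words, numbers, or rare words
word_counts <- austen_chapters %>%
    unnest_tokens(word, text) %>%
    anti_join(stop_words, by = "word") %>%
    filter(!grepl("[0-9]", word)) %>%
    count(chapter, word) %>%
    group_by(word) %>%
    filter(sum(n) >= 5) %>%
    ungroup()

# Sparse document-term matrix: one row per chapter, one column per word
austen_sparse <- word_counts %>%
    cast_sparse(chapter, word, n)

# Fit a small structural topic model (K = 4 is an arbitrary choice here)
topic_model <- stm(austen_sparse, K = 4, verbose = FALSE, init.type = "Spectral")

# Per-topic word probabilities: columns topic, term, beta
td_beta <- tidy(topic_model)

# Per-document topic proportions: columns document, topic, gamma
td_gamma <- tidy(topic_model, matrix = "gamma",
                 document_names = rownames(austen_sparse))

# Top ten terms for each topic
top_terms <- td_beta %>%
    group_by(topic) %>%
    top_n(10, beta) %>%
    ungroup() %>%
    arrange(topic, desc(beta))
```

Because the output is an ordinary tibble, the usual dplyr and ggplot2 workflow applies from here on.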

There are a few other minor changes and bug fixes in this release as well. Get the new version of tidytext and let us know on GitHub if you have any issues!