I am pleased to announce that tidytext 0.1.4 is now on CRAN!
This release of our package for text mining using tidy data principles has an excellent collection of delightfulness in it. First off, all the important functions in tidytext now support non-standard evaluation through the tidyeval framework.
library(janeaustenr)
library(tidytext)
library(dplyr)

input_var <- quo(text)
output_var <- quo(word)

data_frame(text = prideprejudice) %>%
  unnest_tokens(!! output_var, !! input_var)
## # A tibble: 122,204 x 1
##    word
##    <chr>
##  1 pride
##  2 and
##  3 prejudice
##  4 by
##  5 jane
##  6 austen
##  7 chapter
##  8 1
##  9 it
## 10 is
## # ... with 122,194 more rows
I have found the tidyeval framework useful already in my day job when writing functions using dplyr for complex data analysis tasks, so we are glad to have this support in tidytext. The older underscored functions (like unnest_tokens_()) that took only strings as arguments are still in the package for now, but tidyeval is the way to go, everybody!
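To make the "writing functions" point concrete, here is a minimal sketch of a wrapper that passes a column through to unnest_tokens() with tidyeval. The helper name count_tokens() and the tiny docs data frame are made up for illustration; they are not part of tidytext.

```r
library(dplyr)
library(tidytext)

# Hypothetical helper (not part of tidytext): tokenize any text column
# of a data frame and count the resulting words.
count_tokens <- function(df, text_col) {
  # Capture the bare column name the caller supplied
  text_col <- enquo(text_col)

  df %>%
    # Unquote the captured column so unnest_tokens() sees it as input
    unnest_tokens(word, !! text_col) %>%
    count(word, sort = TRUE)
}

# Small made-up input; the column is deliberately not called `text`
docs <- tibble(line = c("It is a truth universally acknowledged",
                        "that a single man"))

result <- count_tokens(docs, line)
result
```

The caller writes the column name unquoted, just as with dplyr verbs, and nothing inside the function hard-codes it. With the underscored functions the same wrapper would have needed the column name passed around as a string.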
Our book website of course contains a lot of information about how to use tidytext, but the pkgdown site has a bit of a different focus in that you can explicitly see all the function documentation and such. Getting this site up and running went extremely smoothly, and I have not worked hard to customize it; this is just all the defaults. In my experience here, the relative bang for one’s buck in setting up a pkgdown site is extremely good.
Another exciting addition to this release of tidytext is a set of tidiers for Structural Topic Models from the stm package, so that these models can be handled using tidy data principles. I am becoming a real fan of this implementation of topic modeling in R after experimenting with it for a while (no rJava! so fast!) and soon I’ll have a complete code-through with some example text, The Adventures of Sherlock Holmes.
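In the meantime, here is a rough sketch of what tidying a structural topic model can look like. It is not the promised code-through: it assumes the janeaustenr novels as stand-in text, treats each novel as one document, picks K = 3 arbitrarily, and needs the quanteda package installed so that cast_dfm() can build the input for stm().

```r
library(dplyr)
library(tidytext)
library(janeaustenr)
library(stm)

# One document per novel (an assumption made for brevity),
# with stop words removed before counting
austen_dfm <- austen_books() %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(book, word, sort = TRUE) %>%
  cast_dfm(book, word, n)

# K = 3 and the seed are arbitrary choices for this sketch
topic_model <- stm(austen_dfm, K = 3, verbose = FALSE,
                   init.type = "Spectral", seed = 1234)

# Per-topic word probabilities: one row per topic-term pair
td_beta <- tidy(topic_model)

# Per-document topic proportions: one row per document-topic pair
td_gamma <- tidy(topic_model, matrix = "gamma",
                 document_names = rownames(austen_dfm))

td_beta
td_gamma
```

Once the model is in this shape, the usual dplyr and ggplot2 tools apply, for example grouping td_beta by topic and keeping the highest-beta terms to see what each topic is about.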