Text Mining with R: A Tidy Approach

Posted on May 19, 2017 by Mauricio Vargas S. 帕夏 in R bloggers

[This article was first published on Pachá (Batteries Included), and kindly contributed to R-bloggers].

About the book

This book applies tidy data principles to text analysis. The aim is to present tools to make many text mining tasks easier, more effective, and consistent with tools already in use, and in particular it presents the tidytext R package.

I love this ebook. At the moment you can read the chapters on the book’s website, and I hope it will soon be available on Amazon so that I can get a paperback copy.

The authors of this beautiful exposition of methodology and coding are Julia Silge and David Robinson. Kudos to both of them. In particular, I’ve been following Julia’s blog posts for the last two years and using them as a reference when teaching R in my courses.

List of chapters:

1. The tidy text format
2. Sentiment analysis with tidy data
3. Analyzing word and document frequency: tf-idf
4. Relationships between words: n-grams and correlations
5. Converting to and from non-tidy formats
6. Topic modeling
7. Case study: comparing Twitter archives
8. Case study: mining NASA metadata
9. Case study: analyzing usenet text
10. References

Remarkable contributions of this book

In my opinion, chapter 5 (Converting to and from non-tidy formats) is one of the best expositions of data structures in R. Using modern R packages such as dplyr and tidytext, the authors move between tibble, DocumentTermMatrix and VCorpus objects while presenting a set of good practices in R, and they include ggplot2 charts that make concepts such as sentiment analysis clear.
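To give a flavour of that back-and-forth between formats, here is a minimal sketch of my own, not taken from the book. It assumes the dplyr, tidytext and tm packages are installed, and the two-document corpus `docs` is a toy example invented for illustration. It builds a tidy table of word counts, casts it to a DocumentTermMatrix with `cast_dtm()`, and then turns it back into a tidy tibble with `tidy()`.

```r
library(dplyr)
library(tidytext)
library(tm)  # provides the DocumentTermMatrix class

# Hypothetical toy corpus, invented for illustration only
docs <- tibble(
  doc  = c("a", "b"),
  text = c("tidy data makes text mining easier",
           "text mining with tidy tools")
)

# Tidy form: one row per document-word pair, with its count
word_counts <- docs %>%
  unnest_tokens(word, text) %>%
  count(doc, word)

# Tidy tibble -> DocumentTermMatrix
dtm <- word_counts %>%
  cast_dtm(document = doc, term = word, value = n)

# DocumentTermMatrix -> tidy tibble (columns: document, term, count)
tidy_again <- tidy(dtm)
```

The round trip keeps only the non-zero document-term pairs, so `tidy_again` has the same number of rows as `word_counts`.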

If you often hear colleagues say that R syntax is awkward, show them this material. People who last used R five or more years ago will probably be amazed to see how the %>% operator is used here.
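For readers who have not seen the pipe before, the following is a small sketch in the style the book uses, written by me rather than copied from it. It assumes dplyr and tidytext are installed. The `reviews` table is a made-up example, while `stop_words` is the stop word dataset that ships with tidytext.

```r
library(dplyr)
library(tidytext)

# Hypothetical input, invented for illustration only
reviews <- tibble(
  id   = 1:2,
  text = c("The pipe makes R code read like a recipe",
           "Each step feeds the next step")
)

top_words <- reviews %>%
  unnest_tokens(word, text) %>%            # one lowercase word per row
  anti_join(stop_words, by = "word") %>%   # drop common English stop words
  count(word, sort = TRUE)                 # word frequencies, most common first
```

Each line reads as one step applied to the result of the previous one. Without the pipe, the same computation would be written as nested calls that have to be read from the inside out.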

Text analysis requires working with a variety of tools, many of which have inputs and outputs that aren’t in a tidy form. What the authors present here is a noble and remarkable piece of work.
