Supervised Machine Learning for Text Analysis in R

[This article was first published on rstats | Julia Silge, and kindly contributed to R-bloggers.]

Today, Emil Hvitfeldt and I led a useR! 2020 online tutorial on predictive modeling with text using tidy data principles. This tutorial was hosted by R-Ladies en Argentina; huge thanks to the organizers for their leadership and effort in making this tutorial possible.


Materials for this tutorial are available on GitHub, with two main resources in the repo.

If you start working through these materials and get stuck, you can post on RStudio Community or post a question as an issue on the repo. Our goal in designing this tutorial was to create resources for async learning.

The content for this tutorial is largely based on a new project that Emil and I are working on, which we are thrilled to publicly announce as of today: our book Supervised Machine Learning for Text Analysis in R to be published in the Chapman & Hall/CRC Data Science Series!


That title is a bit of a mouthful, so we like to call our project SMLTAR, which is also the URL where you can find the online version of this book, now and in the future. We invite you to take a look at the work we’ve done already and explore how unstructured text data can be used for supervised predictive models. The book is divided into three sections.

  • Natural language features: How do we transform text data into a representation useful for modeling? In these chapters, we explore the most common preprocessing steps for text, when they are helpful, and when they are not. This section is in good shape already!

  • Machine learning methods: We investigate the power of some of the simpler and more lightweight models in our toolbox. We drew from these chapters in our useR! 2020 tutorial.

  • Deep learning methods: Given more time and resources, we see what is possible once we turn to neural networks. This section is still to come.

Already, we have so many people to thank for their contributions and support, including our Chapman & Hall editor John Kimmel, the helpful technical reviewers, and Desirée De Leon for the site design of the book’s website. We hope you get a chance to check out this project!
