R is primarily a language for working with numbers, but we often need to work with text as well. Whether it’s formatting text for reports or analyzing natural language data, R provides a number of facilities for working with character data. *Handling Strings with R*, a free (CC-BY-NC-SA) e-book by UC Berkeley’s Gaston Sanchez, provides an overview of the ways you can manipulate characters and strings with R.

The book has many useful sections, but a few highlights include:

- C-style formatting: very useful for preparing tabular data for reports
- String manipulation with the stringr package, which brings some welcome consistency to handling strings in R
- Regular expressions: the savior and/or curse of many data extraction problems
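For a quick taste of the first and third topics, here is a minimal sketch using only base R (no packages required). The item names, prices, and sample string are made up for illustration and are not taken from the book.

```r
# C-style formatting with sprintf(): fixed-width columns for a plain-text table.
# Illustrative data only (not from the book).
items  <- c("apples", "bananas", "cherries")
prices <- c(1.5, 0.25, 12)

# %-10s left-justifies each name in a 10-character field;
# %8.2f right-justifies each price in an 8-character field with 2 decimals.
# sprintf() is vectorized, so this produces one formatted row per item.
rows <- sprintf("%-10s%8.2f", items, prices)
cat(rows, sep = "\n")

# Regular expressions in base R: extract every run of digits from a string.
txt <- "Order 66 shipped on 2014-05-12"
digits <- regmatches(txt, gregexpr("[0-9]+", txt))[[1]]
print(digits)
```

With stringr loaded, `str_extract_all(txt, "[0-9]+")` returns the same matches (wrapped in a list), which is the kind of consistent, pipe-friendly interface the book's stringr chapter covers.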

Note that the book does *not* cover the analysis of natural language data; for that, you might want to check out the CRAN Task View on Natural Language Processing or the book *Text Mining with R: A Tidy Approach*. It’s also sadly silent on character encoding in R, which often causes problems when dealing with text data, especially from international sources. Nonetheless, the book is a very useful overview of working with text in R, and it has been updated extensively since it was last published in 2014. You can read *Handling Strings with R* at the link below.

Gaston Sanchez: Handling Strings with R

