May 18, 2017

(This article was first published on R-exercises, and kindly contributed to R-bloggers)

This set of exercises will help you improve your skills with character functions in R. Most of the exercises relate to text mining, a set of techniques for analysing text statistically. If you find them interesting, I suggest checking out the `tm` library, which includes functions designed for this task. Text mining has many applications; a popular one is attributing a text to its author, which is how J.K. Rowling (author of Harry Potter) was caught publishing a new novel series under an alias. Before proceeding, it might be helpful to look over the help pages for `nchar`, `tolower`, `toupper`, `grep`, `sub` and `strsplit`. Also take a look at the `stringr` library and the functions it includes, such as `str_sub`.
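As a quick warm-up, here is a minimal sketch of the base R functions listed above, applied to a toy sentence. The sentence and the variable names (`s`, `no_punct`, `words`) are made up for illustration and are not part of the exercise data. `gsub` is not in the list above; it is included because it is the replace-all counterpart of `sub`.

```r
# Toy sentence, chosen for illustration only (not taken from the exercise data)
s <- "Arma virumque cano, Troiae qui primus ab oris."

nchar(s)      # number of characters, counting spaces and punctuation
tolower(s)    # everything in lower case
toupper(s)    # everything in upper case

# sub() replaces only the first match; gsub() replaces every match
sub("a", "_", s)
no_punct <- gsub("[[:punct:]]", "", s)

# strsplit() returns a list with one element per input string
words <- strsplit(no_punct, " ")[[1]]

# grep() returns the positions of the elements that match a pattern
caps <- grep("^[[:upper:]]", words)
words[caps]
```

The character classes `[[:punct:]]` and `[[:upper:]]` are used here instead of explicit ranges such as `[A-Z]`, whose interpretation can depend on the locale.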

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Before starting the set of exercises, run the following lines of code:

```
# Install tm if it is missing, then load it
if (!'tm' %in% rownames(installed.packages())) install.packages('tm')
library(tm)

# Load the sample Latin texts (Ovid) that ship with tm
txt = system.file("texts", "txt", package = "tm")
ovid = VCorpus(DirSource(txt, encoding = "UTF-8"), readerControl = list(language = "lat"))

# TEXT must be created before OVID, which is built from it
TEXT = lapply(ovid[1:5], as.character)
TEXT1 = TEXT[[4]]
OVID = c(data.frame(text = unlist(TEXT), stringsAsFactors = FALSE))
```

Exercise 1

Delete all the punctuation marks from TEXT1

Exercise 2

How many letters does TEXT1 contain?

Exercise 3

How many words does TEXT1 contain?

Exercise 4

What is the most common word in TEXT1?

Learn more about text analysis in the online course Text Analytics/Text Mining Using R. In this course you will learn how to create, analyse and finally visualize your text-based data source. Having all the steps clearly outlined will be a great reference for future work.

Exercise 5

Get an object that contains all the words with at least one capital letter (Make sure the object contains each word only once)

Exercise 6

Which are the 5 most common letters in the object `OVID`?

Exercise 7

Which letters of the alphabet are not in the object `OVID`?

Exercise 8

In the `OVID` object, there is a character from the popular sitcom ‘FRIENDS’. Who is he/she? There were six main characters (Chandler, Phoebe, Ross, Monica, Joey, Rachel).

Exercise 9

Find the line where this character is mentioned

Exercise 10

How many words end with a vowel, and how many with a consonant?
