Fighting Factors with Cats: Exercises

[This article was first published on R-exercises, and kindly contributed to R-bloggers.]


In this exercise set, we will practice using forcats, the factor manipulation package by Hadley Wickham. In the last exercise set, we saw that it is entirely possible to deal with factors in base R, but also that things can get a bit involved and unintuitive. Forcats simplifies many common factor manipulation tasks and is worth mastering if you cannot avoid using factors in your work. Studying the package and its source code can also give you ideas for writing your own custom functions to simplify everyday tasks that you think could be handled better.

Solutions are available here.

Exercise 1

Load the gapminder dataset from the gapminder package, as well as forcats. Check the levels of the continent factor variable and how often each one occurs in the data.
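As a warm-up on something smaller than gapminder, here is a minimal sketch of inspecting levels and their frequencies. The factor f is a made-up toy example, not part of the exercise data.

```r
library(forcats)

# Toy factor (hypothetical data, not gapminder)
f <- factor(c("b", "a", "c", "a", "b", "a"))

# Levels in their current order
levels(f)

# fct_count() returns a tibble with the level (column f) and its count (column n)
counts <- fct_count(f)
counts
```

Base R's table(f) gives the same counts as a named table instead of a tibble.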

Exercise 2

Notice that one continent, Antarctica, is missing – add it as the last level of six.

Exercise 3

Actually, you change your mind. There is no permanent human population on Antarctica. Drop this (unused) level from your factor.
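Adding a level and then dropping an unused one can be sketched on a toy factor as follows. The vector and the level name "d" are invented for illustration only.

```r
library(forcats)

# Toy factor (hypothetical data)
f <- factor(c("a", "b", "c", "a"))

# fct_expand() appends new levels after the existing ones
f_more <- fct_expand(f, "d")
levels(f_more)

# fct_drop() removes levels with no observations;
# 'only' restricts the drop to the named level(s)
f_back <- fct_drop(f_more, only = "d")
levels(f_back)
```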

Exercise 4

Again, modify the continent factor, making it more precise. Add two new levels: instead of Americas, add North America and South America. The countries in the following vector should be classified as South America and the rest as North America.

c("Argentina", "Bolivia", "Brazil", "Chile", "Colombia", "Ecuador",
  "Paraguay", "Peru", "Uruguay", "Venezuela")
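One possible way to split a level according to a second variable is to expand the factor, reassign the affected observations, and drop the old level. The sketch below uses a tiny invented data set (region, country and south are hypothetical names), and other approaches would work equally well.

```r
library(forcats)

# Toy data (hypothetical): a region factor and a matching country vector
region  <- factor(c("Americas", "Asia", "Americas", "Europe"))
country <- c("Brazil", "Japan", "Canada", "France")
south   <- c("Brazil")  # countries to be classified as South America

# 1. Add the two new levels so that assignment to them is allowed
region2 <- fct_expand(region, "North America", "South America")

# 2. Reassign observations that currently sit in "Americas"
is_am <- region == "Americas"
region2[is_am & country %in% south]    <- "South America"
region2[is_am & !(country %in% south)] <- "North America"

# 3. Remove the now-unused original level
region2 <- fct_drop(region2, only = "Americas")
levels(region2)
```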

Exercise 5

Arrange the levels of the continent factor in alphabetical order.
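A short sketch of alphabetical re-levelling on a toy factor follows. It assumes a reasonably recent forcats version in which fct_relevel() accepts a function that is applied to the current levels; the base R line in the comment is an alternative that does not depend on that.

```r
library(forcats)

# Toy factor with levels deliberately out of alphabetical order
f <- factor(c("pear", "apple", "fig"), levels = c("pear", "apple", "fig"))

# Pass sort() so that the levels are rearranged alphabetically
f_sorted <- fct_relevel(f, sort)
levels(f_sorted)

# Base R alternative: factor(f, levels = sort(levels(f)))
```

Only the level order changes; the values themselves stay in place.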

Exercise 6

Re-order the continent levels again so that they appear in order of total population in 2007.
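Ordering levels by a summary of another variable is what fct_reorder() is for. The sketch below orders a toy factor by group totals; g and v are invented, and in the exercise you would first restrict the data to the year of interest.

```r
library(forcats)

# Toy data (hypothetical): a group label and a numeric value per row
g <- factor(c("x", "y", "x", "z", "y"))
v <- c(1, 10, 2, 5, 20)

# Group totals: x = 3, y = 30, z = 5
# .fun = sum orders the levels by those totals, smallest first
g_asc <- fct_reorder(g, v, .fun = sum)
levels(g_asc)

# .desc = TRUE puts the largest total first
g_desc <- fct_reorder(g, v, .fun = sum, .desc = TRUE)
levels(g_desc)
```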

Exercise 7

Reverse the order of the factor levels.
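For reference, reversing level order is a one-liner with fct_rev(), shown here on a toy factor rather than the exercise data.

```r
library(forcats)

# Toy factor (hypothetical data)
f <- factor(c("a", "b", "c", "b"))

# fct_rev() reverses the order of the levels, not the data values
f_rev <- fct_rev(f)
levels(f_rev)
```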

Exercise 8

Make continent an unordered factor again. Set North America as the first level, so that it is interpreted as the reference group in modeling functions such as lm().
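Both steps can be sketched on a toy ordered factor: base R's factor() with ordered = FALSE removes the ordering, and fct_relevel() moves a named level to the front. The data below are invented for illustration.

```r
library(forcats)

# Toy ordered factor (hypothetical data)
o <- factor(c("low", "high", "mid"),
            levels = c("low", "mid", "high"), ordered = TRUE)

# Drop the ordering while keeping the levels and their order
u <- factor(o, ordered = FALSE)
is.ordered(u)

# Move "high" to the front; the first level is the reference group
# under R's default treatment contrasts
u <- fct_relevel(u, "high")
levels(u)
```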

Exercise 9

Turn the following messy vector into a factor with two levels, “Female” and “Male”, using the labels argument of the factor() function.
gender <- c("f", "m ", "male ","male", "female", "FEMALE", "Male", "f", "m")
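The same idea can be sketched on a different messy vector. The approach below normalises whitespace and case first, then relies on factor() merging levels that share a label, which assumes R 3.4.0 or later. The answer vector is invented and is not the exercise data.

```r
# Toy messy input (hypothetical data)
answer <- c("y", "Yes ", "n", "NO", "no ")

# Normalise: strip surrounding whitespace, lower-case everything
clean <- tolower(trimws(answer))

# Duplicated labels map several raw values onto the same level
answer_f <- factor(clean,
                   levels = c("n", "no", "y", "yes"),
                   labels = c("No", "No", "Yes", "Yes"))
levels(answer_f)
```

Any raw value not listed in levels becomes NA, which is a convenient check for spellings you have missed.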

Exercise 10

Gender can be considered sensitive data. Convert the gender variable into a factor that takes the integer values “1” and “2”, where one integer represents female and the other male, but make the choice randomly.
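Forcats provides fct_anon() for this kind of anonymisation: it replaces the level labels with arbitrary numeric identifiers and shuffles their order. A minimal sketch on a toy factor (invented data) might look like this.

```r
library(forcats)

# Toy two-level factor (hypothetical data)
f <- factor(c("Female", "Male", "Male", "Female", "Male"))

# Levels become "1" and "2"; which original level gets which
# number is decided at random on each call
anon <- fct_anon(f)
levels(anon)

# If plain integers are needed rather than a factor
anon_int <- as.integer(as.character(anon))
```

Because the mapping is random, call set.seed() beforehand if you need the result to be reproducible.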
