Basics of data.table: Smooth data exploration

August 23, 2017

(This article was first published on R-exercises, and kindly contributed to R-bloggers)

The data.table package offers perhaps the fastest way to wrangle data in R. Its syntax is concise and designed to resemble SQL. After studying the basics of data.table and completing this exercise set, you will be able to start using data.table for all your data manipulation needs.
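To make the SQL resemblance concrete, here is a minimal sketch of the general form DT[i, j, by]. The table `dt` and its columns `group` and `value` are invented for illustration only and are not part of the Fertility dataset used in the exercises.

```r
library(data.table)

# Toy data, made up purely for illustration (not the Fertility dataset)
dt <- data.table(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 2, 3, 4, 5)
)

# i: filter rows, roughly SQL's WHERE
dt[value > 1]

# j: select or compute columns, roughly SQL's SELECT
dt[, .(total = sum(value))]

# by: group the computation, roughly SQL's GROUP BY
# .N is data.table's built-in count of rows in each group
res <- dt[value > 1, .(n = .N, mean_value = mean(value)), by = group]
print(res)
```

Read it as "take dt, subset rows with i, then compute j, grouped by by". The exercises below can all be approached with combinations of these three arguments.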

We will use data drawn from the 1980 US Census on married women aged 21–35 with two or more children. The data include the gender of the first and second child, as well as whether the woman had more than two children, her race, her age, and the number of weeks she worked in 1979. For more information, please refer to the reference manual for the AER package.

Answers are available here.

Exercise 1
Load the data.table package. Furthermore, (install and) load the AER package and run the command data("Fertility"), which loads the dataset Fertility into your workspace. Turn it into a data.table object.

Exercise 2
Select rows 35 to 50 and print their age and work entries to the console.

Exercise 3
Select the last row in the dataset and print it to the console.

Exercise 4
Count how many women proceeded to have a third child.

Learn more about the data.table package in the online course R Data Pre-Processing & Data Management – Shape your Data!. In this course you will learn how to

  • work with different data manipulation packages,
  • import, transform and prepare your dataset for modelling,
  • and much more.

Exercise 5
There are four possible gender combinations for the first two children. Which is the most common? Use the by argument.

Exercise 6
By racial composition, what is the proportion of women who worked four weeks or less in 1979?

Exercise 7
Use %between% to get the subset of women aged 22 to 24, and calculate the proportion who had a boy as their firstborn.

Exercise 8
Add a new column, age squared, to the dataset.

Exercise 9
Out of all the racial compositions in the dataset, which had the lowest proportion of boys as firstborn? With the same command, display the number of observations in each category as well.

Exercise 10
Calculate the proportion of women who had a third child, by gender combination of the first two children.
