Beyond the basics of data.table: Smooth data exploration

September 5, 2017

(This article was first published on R-exercises, and kindly contributed to R-bloggers)

This exercise set provides practice with the fast and concise data.table package. If you are new to the syntax, start with the exercise set on the basics of data.table before attempting this one.

We will use data on used cars (Toyota Corollas) offered for sale in the Netherlands during 2004. There are 1436 observations with information on each car's asking price, age, mileage and more; see the full variable description here.

Answers are available here.

 

Exercise 1
Load the data into your working environment using fread(). Don’t forget to load the data.table package first.
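As a rough illustration of how fread() behaves, the sketch below writes a tiny CSV to a temporary file and reads it back. The file and its columns (Model, Price, Age) are invented for this example and are not the real data set.

```r
library(data.table)

# Write a tiny CSV to a temporary file so the example is self-contained.
# The columns are hypothetical, chosen only for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Model,Price,Age",
             "A,13500,23",
             "B,13750,23",
             "A,12950,30"), tmp)

# fread() detects the separator and the column types automatically
toy <- fread(tmp)

# The result is a data.table (which is also a data.frame)
class(toy)
```

For the exercise itself, the argument to fread() would be the path or URL of the Corolla file instead of the temporary file.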

Exercise 2
Using one line of code, print out the most common car model in the data and the number of times it appears.
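One possible way to approach this kind of question is to count rows per group with .N and chain a sort onto the result. The sketch below uses a toy table with a hypothetical Model column, not the real data.

```r
library(data.table)

# Toy data; the column name and values are assumptions for illustration
toy <- data.table(Model = c("A", "B", "A", "C", "A", "B"))

# Count rows per group with .N, sort by descending count, keep the first row
top <- toy[, .N, by = Model][order(-N)][1]
top
```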

Exercise 3
Print out the mean and median price of the 10 most common models.
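A sketch of one approach, on toy data with hypothetical Model and Price columns: compute the count and the summary statistics in a single grouped call, then order by the count and keep the top rows (two here rather than ten, because the toy table is small).

```r
library(data.table)

# Toy data; names and values are assumptions for illustration
toy <- data.table(Model = c("A", "A", "A", "B", "B", "C"),
                  Price = c(10, 20, 60, 5, 15, 100))

# One grouped call returns the count, mean and median per model
stats <- toy[, .(N = .N, Mean = mean(Price), Median = median(Price)), by = Model]

# Keep the two most common models
top2 <- stats[order(-N)][1:2]
top2
```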

Exercise 4
Delete all columns that have Guarantee in their names.
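Columns can be removed by reference by assigning NULL with :=. The sketch below selects names by pattern with grep() on a toy table; the column names (Tax_A, Tax_B) are made up for the example.

```r
library(data.table)

# Toy data; column names are assumptions for illustration
toy <- data.table(Price = c(100, 200), Tax_A = c(1, 2), Tax_B = c(3, 4))

# Collect the names that match a pattern
drop_cols <- grep("Tax", names(toy), value = TRUE)

# Wrapping the variable in parentheses tells := to use its contents as names
toy[, (drop_cols) := NULL]
names(toy)
```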

Exercise 5
Add a new column which is the squared deviation of price from the average price of cars of the same color.
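Combining := with by lets a new column depend on a group-level statistic. A minimal sketch on toy data follows; Group, Value and SqDev are hypothetical names.

```r
library(data.table)

# Toy data; names and values are assumptions for illustration
toy <- data.table(Group = c("x", "x", "y", "y"),
                  Value = c(1, 3, 10, 20))

# Within each group, mean(Value) is the group mean, so every row gets
# its squared deviation from the mean of its own group
toy[, SqDev := (Value - mean(Value))^2, by = Group]
toy
```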

Exercise 6
Use a combination of .SDcols and lapply to get the mean value of columns 18 through 35.
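The general pattern is that .SDcols restricts .SD to a subset of columns, and lapply() then applies a function to each of them. A small sketch on a toy table, using columns 2 to 4 instead of 18 to 35:

```r
library(data.table)

# Toy data with three dummy columns; names are assumptions for illustration
toy <- data.table(id = 1:4,
                  d1 = c(0, 1, 1, 0),
                  d2 = c(1, 1, 1, 0),
                  d3 = c(0, 0, 0, 1))

# .SD holds only the columns selected by .SDcols (here by position)
means <- toy[, lapply(.SD, mean), .SDcols = 2:4]
means
```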

Exercise 7
Print the most common color by age in years.
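One way to think about this is that by accepts expressions, so a derived grouping variable can be computed on the fly. The sketch below assumes age is stored in months in a hypothetical Months column and converts it to whole years with integer division; whether to floor or round is a choice left to the reader.

```r
library(data.table)

# Toy data; names and values are assumptions for illustration
toy <- data.table(Months = c(5, 11, 13, 20, 23, 14),
                  Colour = c("red", "red", "blue", "blue", "red", "blue"))

# Count each colour within each age in whole years
counts <- toy[, .N, by = .(Years = Months %/% 12, Colour)]

# Sort so the most frequent colour comes first within each year,
# then keep the first row of every group
res <- counts[order(Years, -N)][, .SD[1], by = Years]
res
```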

Learn more about the data.table package in the online course R Data Pre-Processing & Data Management – Shape your Data!. In this course you will learn how to

  • work with different data manipulation packages,
  • import, transform and prepare your dataset for modelling,
  • and much more.

Exercise 8
For the dummy variables in columns 18:35, recode 0 to -1. You might want to use the set() function here.
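set() updates a data.table by reference and is cheap to call inside a loop. The sketch below recodes two toy dummy columns; the column positions and names are assumptions for the example.

```r
library(data.table)

# Toy data with integer dummies; names are assumptions for illustration
toy <- data.table(id = 1:3,
                  d1 = c(0L, 1L, 0L),
                  d2 = c(1L, 0L, 0L))

# Loop over the column positions and overwrite the rows that equal 0
for (j in 2:3) {
  rows <- which(toy[[j]] == 0L)
  set(toy, i = rows, j = j, value = -1L)
}
toy
```

The replacement value is an integer (-1L) so that it matches the type of the toy columns.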

Exercise 9
Use the set() function to add “yuck!” to the variable Fuel_Type if it is not petrol. Just because…

Exercise 10
Using .SDcols and a single command, create two new variables: the log of Weight and the log of Price.
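Several columns can be created at once by putting a character vector of new names on the left of := and a list of values on the right. A minimal sketch on toy data; the columns a and b and the log_ prefix are invented for the example.

```r
library(data.table)

# Toy data; names and values are assumptions for illustration
toy <- data.table(a = c(1, 10), b = c(100, 1000))

cols     <- c("a", "b")
new_cols <- paste0("log_", cols)

# lapply(.SD, log) returns one transformed column per entry in .SDcols;
# (new_cols) supplies the matching names for the new columns
toy[, (new_cols) := lapply(.SD, log), .SDcols = cols]
toy
```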

 

(Painting by José de Almada)
