Data Manipulation with Data Table - Part 1

June 15, 2017

[This article was first published on R-exercises, and kindly contributed to R-bloggers.]

In the exercises below we cover some useful features of data.table, an R library for fast manipulation of large data frames. Please read the data.table vignette before trying the solutions. This first set is intended for beginners with the data.table package and does not cover the set keywords or joins, which will be covered in the next set. Load the data.table library in your R session before starting the exercises.
Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
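As a warm-up before the exercises, here is a minimal sketch of loading the package, converting a data frame, and using the general `DT[i, j, by]` form. The table and column names (`df`, `toy`, `grp`, `val`) are made up for illustration and are not part of any exercise.

```r
library(data.table)

# Hypothetical toy data frame, not one of the exercise datasets
df <- data.frame(grp = c("a", "a", "b"), val = c(1, 2, 3))

# as.data.table() returns a data.table copy; setDT(df) would convert in place
toy <- as.data.table(df)

# General form toy[i, j, by]: filter rows with i, compute j, grouped by `by`
res <- toy[val > 1, .(total = sum(val)), by = grp]
print(res)
```

The same three-part pattern is enough for most of the exercises in this set.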

Exercise 1
Load the iris dataset, make it a data.table and name it iris_dt. Print the mean of Petal.Length, grouped by the first letter of Species, from iris_dt.
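Grouping by a computed value rather than a plain column can be sketched as below. This uses a made-up table (`toy`, `city`, `sales` are hypothetical names), so treat it as an illustration of the idea rather than the solution.

```r
library(data.table)

# Hypothetical toy table; `city` and `sales` are made-up names
toy <- data.table(city  = c("Oslo", "Ottawa", "Paris", "Prague"),
                  sales = c(10, 20, 30, 50))

# `by` accepts an expression, here the first character of `city`
res <- toy[, .(mean_sales = mean(sales)),
           by = .(initial = substr(city, 1, 1))]
print(res)
```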

Exercise 2
Load the diamonds dataset from the ggplot2 package as dt (a data.table). Find the mean price for each group of cut and color.

Exercise 3
Load the diamonds dataset from the ggplot2 package as dt. Now group the dataset by price per carat and print the top 5 groups in terms of count per group. Don't use head; use chaining in data.table to achieve this.
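Chaining means attaching one `[...]` call directly to the result of another. A rough sketch on a made-up table (`toy` and `grp` are hypothetical names):

```r
library(data.table)

# Hypothetical toy table with a single grouping column
toy <- data.table(grp = c("a", "b", "a", "c", "a", "b"))

# First [] counts rows per group with .N,
# second [] sorts by that count in descending order,
# third [] keeps the first two rows. No head() involved.
res <- toy[, .N, by = grp][order(-N)][1:2]
print(res)
```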

Exercise 4
Use the already loaded diamonds dataset and print the last two carat values of each cut.
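One way to pick rows within each group is to apply `tail()` to `.SD`, the subset of data for the current group. A small sketch on a made-up table (`toy`, `grp`, `val` are hypothetical names):

```r
library(data.table)

# Hypothetical toy table
toy <- data.table(grp = c("a", "a", "a", "b", "b"), val = 1:5)

# .SD holds the rows of the current group; tail(.SD, 2) keeps its last two
res <- toy[, tail(.SD, 2), by = grp]
print(res)
```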

Exercise 5
In the same dataset, find the median of the columns x, y and z per cut. Use data.table's methods to achieve this.
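Applying one function to several columns per group is commonly done with `lapply(.SD, ...)` together with `.SDcols`. The sketch below assumes a made-up table (`toy`, `grp`, `p`, `q`, `other` are hypothetical names):

```r
library(data.table)

# Hypothetical toy table; `other` is deliberately left out of the summary
toy <- data.table(grp   = c("a", "a", "a", "b", "b", "b"),
                  p     = c(1, 2, 3, 4, 5, 6),
                  q     = c(10, 30, 20, 5, 5, 50),
                  other = letters[1:6])

# .SDcols restricts .SD to the listed columns; lapply runs median on each
res <- toy[, lapply(.SD, median), by = grp, .SDcols = c("p", "q")]
print(res)
```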

Exercise 6
Load the airquality dataset as a data.table. Now find the logarithm of the wind rate for each month, for days greater than 15.

Exercise 7
In the same dataset, for all the odd rows, update the Temp column by adding 10.
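Updating a column for only some rows combines a row selector in `i` with `:=` in `j`. A minimal sketch on a made-up table (`toy`, `id`, `score` are hypothetical names), using even row positions purely for illustration:

```r
library(data.table)

# Hypothetical toy table
toy <- data.table(id = 1:6, score = c(10, 20, 30, 40, 50, 60))

# i selects row positions 2, 4, 6; := modifies `score` in place for those rows
toy[seq(2, nrow(toy), by = 2), score := score + 1]
print(toy)
```

Because `:=` works by reference, there is no need to assign the result back to `toy`.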

Exercise 8
data.table comes with a powerful feature of updating columns by reference, as you have seen in the last exercise. It's even possible to update or create multiple columns at once. To test that, in the airquality data.table that you created previously, add 10 to Solar.R and Wind.
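There are two common ways to write a multi-column update by reference. Both are sketched here on a made-up table (`toy`, `u`, `v`, `w` are hypothetical names):

```r
library(data.table)

# Hypothetical toy table
toy <- data.table(u = c(1, 2), v = c(10, 20), w = c("x", "y"))

# Functional form: one name = value pair per column
toy[, `:=`(u = u + 5, v = v + 5)]

# LHS := RHS form: a character vector of names and a list of values
toy[, c("u", "v") := .(u * 2, v * 2)]
print(toy)
```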

Exercise 9
Now you have a fairly good idea of how easy it is to create multiple columns. It's even possible to delete multiple columns using the same idea. In this exercise, use the same airquality data.table that you created previously and delete Solar.R, Wind and Temp using a single expression.
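Deleting columns by reference is done by assigning `NULL`. A short sketch on a made-up table (`toy`, `u`, `v`, `w` are hypothetical names):

```r
library(data.table)

# Hypothetical toy table
toy <- data.table(u = c(1, 2), v = c(10, 20), w = c("x", "y"))

# Assigning NULL to a vector of column names drops them all in one expression
toy[, c("u", "v") := NULL]
print(names(toy))
```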

Exercise 10
Load the airquality dataset as a data.table again. Create two columns, a and b, which give the temperature on the Celsius and Kelvin scales. Write an expression to achieve this.
Celsius = (Temp-32)*5/9
Kelvin = Celsius+273.15
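To sanity-check the two formulas before applying them to a table, they can be tried on plain vectors. The values below are the freezing and boiling points of water in Fahrenheit, and the variable names (`temp_f`, `celsius`, `kelvin`) are just for illustration:

```r
# Freezing and boiling points of water in Fahrenheit
temp_f <- c(32, 212)

# Fahrenheit to Celsius, then Celsius to Kelvin
celsius <- (temp_f - 32) * 5 / 9   # 0 and 100
kelvin  <- celsius + 273.15        # 273.15 and 373.15

print(celsius)
print(kelvin)
```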
