Data wrangling: Transforming (3/3)

August 2, 2017

(This article was first published on R-exercises, and kindly contributed to R-bloggers)


Data wrangling is a task of great importance in data analysis. It is the process of importing, cleaning and transforming raw data into actionable information for analysis. It is a time-consuming process, estimated to take about 60-80% of an analyst’s time. In this series we will go through this process. It will be a brief series whose goal is to sharpen the reader’s data wrangling skills. This is the third part of the series, and it covers transforming data. This can include filtering, summarizing, and ordering your data by different means. It also includes combining various data sets, creating new variables, and many other manipulation tasks. In this post, we will go through a few more advanced transformation tasks on the mtcars data set, in particular table manipulation.

Before proceeding, it might be helpful to look over the help pages for inner_join, full_join, left_join, right_join, semi_join, anti_join, intersect, union, setdiff, and bind_rows.
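
As a quick orientation before reading the help pages, here is a minimal sketch of how the six join functions behave. The tables left_tbl and right_tbl and their columns are hypothetical toy data made up for illustration; they are not part of the exercises.

```r
library(dplyr)

# Two hypothetical toy tables that share the key column `id`
left_tbl  <- data.frame(id = c(1, 2, 3), x = c("a", "b", "c"))
right_tbl <- data.frame(id = c(2, 3, 4), y = c(TRUE, FALSE, TRUE))

# Mutating joins: combine the columns of both tables
inner <- inner_join(left_tbl, right_tbl, by = "id")  # only ids present in both tables
left  <- left_join(left_tbl, right_tbl, by = "id")   # every row of left_tbl, NA where unmatched
right <- right_join(left_tbl, right_tbl, by = "id")  # every row of right_tbl, NA where unmatched
full  <- full_join(left_tbl, right_tbl, by = "id")   # every row of both tables

# Filtering joins: keep or drop rows of left_tbl without adding columns
semi <- semi_join(left_tbl, right_tbl, by = "id")    # rows of left_tbl with a match
anti <- anti_join(left_tbl, right_tbl, by = "id")    # rows of left_tbl without a match
```

The mutating joins differ only in which unmatched rows they keep, while the filtering joins return left_tbl's own columns and use right_tbl purely as a filter.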

Moreover, please load the following library and run the linked script.
install.packages("dplyr")
library(dplyr)
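
The exercises below refer to a key variable ID and a table named cars_table, neither of which is part of the built-in mtcars data. If the setup script is unavailable, stand-ins along the following lines may be enough to try the join exercises. This is only a sketch under that assumption: the ID values and the group column are invented for illustration and are not the author's data.

```r
library(dplyr)

# Assumption: the real setup script is not available, so these objects are stand-ins.
# Work on a copy of the built-in data and add a numeric key, one ID per car.
mtcars <- datasets::mtcars
mtcars$ID <- seq_len(nrow(mtcars))

# Hypothetical lookup table: three IDs that exist in mtcars and two that do not,
# so that each join type returns a visibly different result.
cars_table <- data.frame(
  ID    = c(1L, 2L, 3L, 40L, 41L),
  group = c("A", "B", "A", "B", "A")  # made-up values
)
```

Having some IDs on only one side is a deliberate choice: with fully matching keys, inner, left, right and full joins would all return the same rows.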

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Exercise 1

Create a new object named car_inner containing the observations that have matching values in both mtcars and cars_table, using the variable ID as the key.

Exercise 2

Create a new object named car_left containing all the observations from the left table (mtcars) and the matched records from the right table (cars_table), using the variable ID as the key.

Exercise 3

Create a new object named car_right containing all the observations from the right table (cars_table) and the matched records from the left table (mtcars), using the variable ID as the key.

Exercise 4

Create a new object named car_full containing all the observations from both the left table (mtcars) and the right table (cars_table), whether or not they have a match in the other table, using the variable ID as the key.

Exercise 5

Create a new object named car_semi containing all the observations from mtcars that have matching values in cars_table, using the variable ID as the key.

Exercise 6

Create a new object named car_anti containing all the observations from mtcars that have no matching values in cars_table, using the variable ID as the key.
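
The remaining exercises switch from joins to set operations and row binding. Unlike joins, intersect, union and setdiff compare whole rows, so both tables need the same columns. Note that the built-in cars data set has only the columns speed and dist, so the cars table used below presumably comes from the setup script. A minimal sketch on two hypothetical toy tables, a and b, made up for illustration:

```r
library(dplyr)

# Two hypothetical toy tables with identical columns
a <- data.frame(k = c(1, 2, 3), v = c("x", "y", "z"), stringsAsFactors = FALSE)
b <- data.frame(k = c(2, 3, 4), v = c("y", "z", "w"), stringsAsFactors = FALSE)

both    <- intersect(a, b)  # rows found in both a and b
either  <- union(a, b)      # rows found in a or b, with duplicates removed
only_a  <- setdiff(a, b)    # rows found in a but not in b
stacked <- bind_rows(a, b)  # all rows of a followed by all rows of b, duplicates kept
```

The difference between union and bind_rows is worth noticing: here union returns four distinct rows, while bind_rows returns all six.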

Exercise 7

Create a new object named cars_inter which contains rows that appear in both tables mtcars and cars.

Exercise 8

Create a new object named cars_union which contains the rows that appear in either mtcars or cars.

Exercise 9

Create a new object named cars_diff which contains the rows that appear in mtcars but not in cars.

Exercise 10

Append mtcars to cars and assign the result to the object car_rows.
