Merging Dataframes Exercises

April 14, 2016
By

(This article was first published on R-exercises, and kindly contributed to R-bloggers)

Merge_Vertical_2_BlueWhen combining separate dataframes, (in the R programming language), into a single dataframe, using the cbind() function usually requires use of the “Match()” function. To simulate the database joining functionality in SQL, the “Merge()” function in R accomplishes dataframe merging with the following protocols;

“Inner Join” where the left table has matching rows from one, or more, key variables from the right table.
“Outer Join” where all the rows from both tables are joined.
“Left Join” where all rows from the left table, and any rows with matching keys from the right table are returned.
“Right Join” where all rows from the right table, and any rows with matching keys from the left table are returned.

Answers to the exercises are available here.

Exercise 1
Create the dataframes to merge:
buildings <- data.frame(location=c(1, 2, 3), name=c("building1", "building2", "building3"))

data <- data.frame(survey=c(1,1,1,2,2,2), location=c(1,2,3,2,3,1),
efficiency=c(51,64,70,71,80,58))

The dataframes, buildings and data have a common key variable called, “location”. Use the merge() function to merge the two dataframes by “location”, into a new dataframe, “buildingStats”.

Exercise 2
Give the dataframes different key variable names:
buildings <- data.frame(location=c(1, 2, 3), name=c("building1", "building2", "building3"))
data <- data.frame(survey=c(1,1,1,2,2,2), LocationID=c(1,2,3,2,3,1),
efficiency=c(51,64,70,71,80,58))

The dataframes, buildings and data now have corresponding variables called, location, and LocationID. Use the merge() function to merge the columns of the two dataframes by the corresponding variables.

Exercise 3
Inner Join:
The R merge() function automatically joins the frames by common variable names. In that case, demonstrate how you would perform the merge in Exercise 1 without specifying the key variable.

Exercise 4
Outer Join:
Merge the two dataframes from Exercise 1. Use the “all=” parameter in the merge() function to return all records from both tables. Also, merge with the key variable, “location”.

Exercise 5
Left Join:
Merge the two dataframes from Exercise 1, and return all rows from the left table. Specify the matching key from Exercise 1.

Exercise 6
Right Join:
Merge the two dataframes from Exercise 1, and return all rows from the right table. Use the matching key from Exercise 1 to return matching rows from the left table.

Exercise 7
Cross Join:
Merge the two dataframes from Exercise 1, into a “Cross Join” with each row of “buildings” matched to each row of “data”. What new column names are created in “buildingStats”?

Exercise 8
Merging Dataframe rows:
To join two data frames (datasets) vertically, use the rbind function. The two data frames must have the same variables, but they do not have to be in the same order.

Merge the rows of the following two dataframes:

buildings <- data.frame(location=c(1, 2, 3), name=c("building1", "building2", "building3"))
buildings2 <- data.frame(location=c(5, 4, 6), name=c("building5", "building4", "building6"))

Also, specify a new dataframe, “allBuidings”.

Exercise 9
A new dataframe, buildings3, has variables not found in the previous dataframes

buildings3 <- data.frame(location=c(7, 8, 9),
name=c("building7", "building8", "building9"),
startEfficiency=c(75,87,91))

Create a new buildings3 without the extra variables.

Exercise 10
Instead of deleting the extra variables from buildings3. append the buildings, and buildings2 with the new variable in buildings3, (from Exercise 9). Set the new data in buildings and buildings2 , (from Exercise 8), to NA.

Image: Wikimedia commons.

To leave a comment for the author, please follow the link and comment on their blog: R-exercises.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)