How to Find Unmatched Records in R

[This article was first published on Data Science Tutorials, and kindly contributed to R-bloggers.]

To retrieve all rows in one data frame that have no matching values in another data frame, use the anti_join() function from the dplyr package.

The basic syntax used by this function is as follows.

anti_join(df1, df2, by='col_name')

The usage of this syntax is demonstrated in the examples that follow.

Example 1: Use anti_join() with One Column

Suppose we have the two R data frames shown below:

# Build the data frames

df1 <- data.frame(Q1 = c('a', 'b', 'c', 'd', 'e', 'f'),
                  Q2 = c(152, 514, 114, 218, 322, 323))
df2 <- data.frame(Q1 = c('a', 'a', 'a', 'b', 'b', 'b'),
                  Q3 = c(523, 324, 233, 134, 237, 141))

To return all rows in the first data frame that don’t have a matching Q1 in the second data frame, we can use the anti_join() function.

library(dplyr)

# Use the 'Q1' column to perform the anti join

anti_join(df1, df2, by='Q1')
  Q1  Q2
1  c 114
2  d 218
3  e 322
4  f 323

We can see that exactly four rows of the first data frame have a Q1 value with no match in the Q1 column of the second data frame.
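For comparison, the same single-column result can be reproduced in base R with the %in% operator. This is only a minimal sketch to illustrate what the anti join is doing, not how dplyr implements it; the variable name unmatched is introduced here for illustration.

```r
# Rebuild the data frames from Example 1 so the snippet runs on its own
df1 <- data.frame(Q1 = c('a', 'b', 'c', 'd', 'e', 'f'),
                  Q2 = c(152, 514, 114, 218, 322, 323))
df2 <- data.frame(Q1 = c('a', 'a', 'a', 'b', 'b', 'b'),
                  Q3 = c(523, 324, 233, 134, 237, 141))

# Keep the rows of df1 whose Q1 value never appears in df2$Q1
# ('unmatched' is an illustrative name, not part of the original post)
unmatched <- df1[!(df1$Q1 %in% df2$Q1), ]
print(unmatched)
```

Unlike anti_join(), this base R subset keeps the original row names of df1 rather than renumbering them from 1.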

Example 2: Use anti_join() with Multiple Columns

Suppose we have the two R data frames shown below.

# Create the data frames

df1 <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                  position=c('G', 'G', 'F', 'G', 'F', 'C'),
                  points=c(152, 114, 219, 254, 356, 441))
df2 <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                  position=c('G', 'G', 'C', 'G', 'F', 'F'),
                  points=c(142, 214, 319, 133, 517, 422))

All rows in the first data frame that lack a matching team and position in the second data frame can be returned using the anti_join() function:

library(dplyr)

# Use the 'team' and 'position' columns to perform the anti join

anti_join(df1, df2, by=c('team', 'position'))
   team position points
1    A        F    219
2    B        C    441

We can see that there are exactly two records from the first data frame that do not have a matching team name and position in the second data frame.
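Note that the order of the arguments matters: anti_join() returns rows from its first argument only. As a small sketch using the same data, swapping the two data frames asks the opposite question, namely which rows of the second data frame have no team and position match in the first. The name only_in_df2 is introduced here for illustration.

```r
library(dplyr)

# Rebuild the data frames from Example 2 so the snippet runs on its own
df1 <- data.frame(team = c('A', 'A', 'A', 'B', 'B', 'B'),
                  position = c('G', 'G', 'F', 'G', 'F', 'C'),
                  points = c(152, 114, 219, 254, 356, 441))
df2 <- data.frame(team = c('A', 'A', 'A', 'B', 'B', 'B'),
                  position = c('G', 'G', 'C', 'G', 'F', 'F'),
                  points = c(142, 214, 319, 133, 517, 422))

# Rows of df2 with no matching team/position pair in df1
# ('only_in_df2' is an illustrative name, not part of the original post)
only_in_df2 <- anti_join(df2, df1, by = c('team', 'position'))
print(only_in_df2)
```

Here only the row with team A and position C (319 points) is returned, because every other team and position pair in the second data frame also appears in the first.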
