How to Find Unmatched Records in R

[This article was first published on Data Science Tutorials, and kindly contributed to R-bloggers.]

The post How to Find Unmatched Records in R appeared first on Data Science Tutorials

To retrieve all rows in one data frame that have no matching values in another data frame, use the anti_join() function from the dplyr package.


The basic syntax is:

anti_join(df1, df2, by='col_name')
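As a point of comparison (this is not how dplyr implements anti_join()), a minimal base R sketch of the same filtering for a single key column uses %in%. The data frames are the same ones used in Example 1 below.

```r
# Illustrative data (the same data frames used in Example 1)
df1 <- data.frame(team = c('A', 'B', 'C', 'D', 'E'),
                  points = c(102, 104, 129, 224, 436))
df2 <- data.frame(team = c('A', 'B', 'C', 'F', 'G'),
                  points = c(412, 514, 519, 233, 117))

# %in% flags each df1$team value that also occurs in df2$team;
# negating it keeps only the rows of df1 with no match
unmatched <- df1[!(df1$team %in% df2$team), ]
unmatched
```

Unlike anti_join(), base subsetting keeps the original row names (4 and 5 here), and as written it only compares one key column.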

The examples below show how to use this syntax in practice.


Example 1: Use anti_join() with One Column

Suppose we have the following two data frames in R:

df1 <- data.frame(team=c('A', 'B', 'C', 'D', 'E'),
                  points=c(102, 104, 129, 224, 436))
df2 <- data.frame(team=c('A', 'B', 'C', 'F', 'G'),
                  points=c(412, 514, 519, 233, 117))

To return all rows in the first data frame that do not have a matching team in the second data frame, we can use the anti_join() function.


library(dplyr)

# Anti-join using the 'team' column

anti_join(df1, df2, by='team')
  team points
1    D    224
2    E    436

Exactly two teams in the first data frame (D and E) have no matching team name in the second data frame.
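anti_join() is not symmetric: it only ever returns rows of its first argument. A short sketch, reusing the Example 1 data, that swaps the arguments to find the teams in df2 that have no match in df1:

```r
library(dplyr)

df1 <- data.frame(team = c('A', 'B', 'C', 'D', 'E'),
                  points = c(102, 104, 129, 224, 436))
df2 <- data.frame(team = c('A', 'B', 'C', 'F', 'G'),
                  points = c(412, 514, 519, 233, 117))

# Swapped arguments: rows of df2 whose team does not appear in df1
unmatched_in_df2 <- anti_join(df2, df1, by = 'team')
unmatched_in_df2  # teams F and G
```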

Example 2: Use anti_join() with Multiple Columns

Suppose we have the following two data frames in R:

df1 <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                  position=c('G', 'G', 'F', 'G', 'F', 'C'),
                  points=c(182, 164, 159, 124, 136, 441))
df2 <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                  position=c('G', 'G', 'C', 'G', 'F', 'F'),
                  points=c(152, 154, 159, 322, 217, 522))

To return all rows in the first data frame that have no row in the second data frame with the same combination of team and position, pass both columns to by:


library(dplyr)

# Anti-join using the 'team' and 'position' columns

anti_join(df1, df2, by=c('team', 'position'))
  team position points
1    A        F    159
2    B        C    441

Exactly two rows in the first data frame (team A with position F, and team B with position C) have no row in the second data frame with the same team and position.
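To see why both key columns matter here, a sketch comparing a single-key anti-join with the two-key version on the Example 2 data. Since both teams A and B appear in the second data frame, joining on team alone finds nothing unmatched.

```r
library(dplyr)

df1 <- data.frame(team = c('A', 'A', 'A', 'B', 'B', 'B'),
                  position = c('G', 'G', 'F', 'G', 'F', 'C'),
                  points = c(182, 164, 159, 124, 136, 441))
df2 <- data.frame(team = c('A', 'A', 'A', 'B', 'B', 'B'),
                  position = c('G', 'G', 'C', 'G', 'F', 'F'),
                  points = c(152, 154, 159, 322, 217, 522))

# Key = team only: every team in df1 also occurs in df2, so no rows remain
by_team <- anti_join(df1, df2, by = 'team')
nrow(by_team)  # returns 0

# Key = (team, position) pairs: (A, F) and (B, C) have no counterpart in df2
by_team_position <- anti_join(df1, df2, by = c('team', 'position'))
by_team_position
```

In dplyr 1.1.0 and later, the same two keys can also be written as by = join_by(team, position).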
