How to Join Data Frames for different column names in R

[This article was first published on Data Science Tutorials, and kindly contributed to R-bloggers].

The post How to Join Data Frames for different column names in R appeared first on Data Science Tutorials

How do you join data frames in R when the key columns have different names? With dplyr, you can join data frames on multiple columns, even when those columns are named differently in each data frame, using the following basic syntax:

library(dplyr)
# match df1's x1 to df2's x2, and df1's y1 to df2's y2
left_join(df1, df2, by=c('x1'='x2', 'y1'='y2'))

This syntax performs a left join in which:

df1’s x1 column is matched to df2’s x2 column.

df1’s y1 column is matched to df2’s y2 column.
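To make the mapping concrete, here is a minimal sketch on made-up data. The data frames toy1 and toy2 and the columns v and w are hypothetical. In the named vector, each name is a column of the left data frame and each value is the matching column of the right one. The sketch also uses join_by(), which is an equivalent way to write the same key mapping and assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical toy data: the same keys are called x1/y1 in toy1 and x2/y2 in toy2
toy1 <- data.frame(x1 = c(1, 1, 2), y1 = c("a", "b", "a"), v = c(10, 20, 30))
toy2 <- data.frame(x2 = c(1, 2), y2 = c("a", "a"), w = c(100, 300))

# Named character vector: names are toy1 (left) columns, values are toy2 (right) columns
res_chr <- left_join(toy1, toy2, by = c("x1" = "x2", "y1" = "y2"))

# Equivalent join_by() specification (assumes dplyr >= 1.1.0)
res_jb <- left_join(toy1, toy2, by = join_by(x1 == x2, y1 == y2))

# The key columns keep toy1's names (x1, y1); x2 and y2 are not repeated
res_chr
```

The (1, "b") row of toy1 has no partner in toy2, so it is kept with NA in the w column, which is what a left join does.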

This syntax is demonstrated in the following example.

Example: Left Join on Multiple Columns Using dplyr

Suppose we have the following two data frames in R.

First, let’s define the first data frame:

df1<-data.frame(team=c('A', 'A', 'B', 'B'),
                 pos=c('X', 'F', 'F', 'X'),
                 points=c(128, 222, 129, 124))
df1
   team pos points
1    A   X    128
2    A   F    222
3    B   F    129
4    B   X    124

Now we can define the second data frame.

df2<- data.frame(team_name=c('A', 'A', 'B', 'C', 'C'),
                 position=c('X', 'X', 'F', 'G', 'F'),
                 assists=c(224, 229, 428, 466, 525))
df2
   team_name position assists
1         A        X     224
2         A        X     229
3         B        F     428
4         C        G     466
5         C        F     525

To do a left join based on two columns, we can use the following dplyr syntax.

library(dplyr)

Let’s perform a left join based on multiple columns:

df3 <- left_join(df1, df2, by=c('team'='team_name', 'pos'='position'))

Now we can view the result:

df3
   team pos points assists
1    A   X    128     224
2    A   X    128     229
3    A   F    222      NA
4    B   F    129     428
5    B   X    124      NA

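The resulting data frame contains every row from df1, plus the assists values from the rows of df2 whose team and position values match. The rows of df1 with no match (team A with position F, and team B with position X) get NA in the assists column. The A/X row appears twice because it matches two rows in df2. The key columns keep df1’s names, team and pos.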
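For comparison, here is a sketch of the same left join in base R, without dplyr. It uses merge() with by.x and by.y to pair up the differently named key columns, and all.x = TRUE to keep every row of df1. The block redefines df1 and df2 so it runs on its own. The name df3_base is only illustrative.

```r
# Same data frames as in the example above
df1 <- data.frame(team = c('A', 'A', 'B', 'B'),
                  pos = c('X', 'F', 'F', 'X'),
                  points = c(128, 222, 129, 124))
df2 <- data.frame(team_name = c('A', 'A', 'B', 'C', 'C'),
                  position = c('X', 'X', 'F', 'G', 'F'),
                  assists = c(224, 229, 428, 466, 525))

# Base R left join: by.x / by.y map df1's keys to df2's keys,
# and all.x = TRUE keeps unmatched df1 rows (filled with NA)
df3_base <- merge(df1, df2,
                  by.x = c('team', 'pos'),
                  by.y = c('team_name', 'position'),
                  all.x = TRUE)
df3_base
```

By default merge() sorts the result by the key columns (sort = TRUE). The rows therefore come out in a different order than with left_join(), which keeps df1’s row order. The contents are the same.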

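Also, if the key columns have identical names in both data frames, you can simply list those names. For example, if both data frames had columns named team and position, you could use:

library(dplyr)
# assumes both df1 and df2 contain columns named 'team' and 'position'
df3 <- left_join(df1, df2, by=c('team', 'position'))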
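One way to reach the shared-names case with the example data is to rename df2’s key columns first. The sketch below does this with dplyr’s rename(), which takes new_name = old_name, and then joins by the shared names. The intermediate names df2_renamed, df3_same and df3_named are only illustrative. The join gives the same result as the named-vector join used earlier.

```r
library(dplyr)

df1 <- data.frame(team = c('A', 'A', 'B', 'B'),
                  pos = c('X', 'F', 'F', 'X'),
                  points = c(128, 222, 129, 124))
df2 <- data.frame(team_name = c('A', 'A', 'B', 'C', 'C'),
                  position = c('X', 'X', 'F', 'G', 'F'),
                  assists = c(224, 229, 428, 466, 525))

# Rename df2's key columns so they match df1's names (rename(new = old))
df2_renamed <- rename(df2, team = team_name, pos = position)

# With identical names, listing the shared columns is enough
df3_same <- left_join(df1, df2_renamed, by = c('team', 'pos'))

# The named-vector join from the example above, for comparison
df3_named <- left_join(df1, df2, by = c('team' = 'team_name', 'pos' = 'position'))

df3_same
```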
