R dataframe merge while keeping orders of row and column

[This article was first published on K & L Fintech Modeling, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This post makes a useful wrapper R function merge() for left outer join, which preserves the orders of row and column of input x data. It is not a must but useful when we prefer these fixed orders in some case.


Left Outer Join which we try to do


The following figure demonstrates what we are trying to do. The first is to perform left outer join for determining parameters for the corresponding Bucket and Label1, which are, in fact, currency volatility classification and maturity respectively. The second is to apply the original orders of row and column of x data to this merged result.

R dataframe merge while keeping row and column orders


Read data.frame from a String


It is typical to read data from Excel or csv file but for the purpose of quick and easy exercises, a string also can be read as a data.frame by using read.table() function. The following is to read x data.frame.


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
# LEFT data which has joining key fields
str.x < “Qualifier Bucket Label1
          eur      1     3y
          cny      3     2w
          jpy      2     2w
          jpy      2     5y
          mxn      3     1y
          jpy      2     5y
          eur      1     5y
          eur      1     3y
          eur      1     3y”
< read.table(text=str.x, header = TRUE)
 
 
> x
  Qualifier Bucket Label1
1       eur      1     3y
2       cny      3     2w
3       jpy      2     2w
4       jpy      2     5y
5       mxn      3     1y
6       jpy      2     5y
7       eur      1     5y
8       eur      1     3y
9       eur      1     3y
> 
 
cs


In this way, we can also read and make y_sc, y_dc data.frames. y_sc (y_dc) is a right data with same (different) key field (column) names.

R dataframe merge while keeping row and column orders


Output of merge()


Using merge(), we can get the result of left outer join. There is no problem in this output and this result is generally recommended in a viewpoint of key columns. But it is somewhat inconvenient when we prefer the original orders of rows and columns for some reason because rows and columns of the result from merge() are rearranged .


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
# using merge()
merge(x = x, y = y_sc, 
      by.x = c(“Bucket”“Label1”), 
      by.y = c(“Bucket”“Label1”), 
      all.x = TRUE)
 
> 
  Bucket Label1 Qualifier Parameter
1      1     3y       eur        49
2      1     3y       eur        49
3      1     3y       eur        49
4      1     5y       eur        51
5      2     2w       jpy        14
6      2     5y       jpy        20
7      2     5y       jpy        20
8      3     1y       mxn        97
9      3     2w       cny        85
> 
cs


Output with orders of input rows and columns


To preserve orders of the original rows and columns (of x data), we have only to track information of rows and columns of x data. For this purpose, we can make the following simple user-defined function(f_loj_krc()).


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
#—————————————————————–
# Function : f_loj_krc
#—————————————————————–
# Left outer join while keeping orders of input rows and columns
# Meaning of input arguments are the same as those of merge() 
#—————————————————————–
f_loj_krc < function(x, y, by.x, by.y) {
 
    # save row id
    x.temp < x; x.temp$temp.id < 1:nrow(x.temp); 
    
    # each column names
    x.cn < colnames(x); y.cn < colnames(y)
    
    # replace column names of y with same names of x
    # to avoid duplicate fields
    for(i in 1:length(by.y)) {
        colnames(y)[which(y.cn == by.y[i])] < by.x[i]
    }
    by.y < by.x # since two fields are the same now
    
    # new column names of y
    y.cn < colnames(y)
    
    # remove only joining key fields which are redundant
    # and keep only new informative fields
    y.cn.not.key < setdiff(y.cn, by.y)
    
    # left outer join
    df < merge(x = x.temp, y = y, by.x=by.x, by.y=by.y, all.x = TRUE)
    
    # recover the original rows and columns orders
    df < df[order(df$temp.id),c(x.cn, y.cn.not.key)]; rownames(df) < NULL
    
    return(df)
}
 
cs


f_loj_krc() function keeps information of x data, uses merge() for left outer join, and recover orders of the x’s row and column. In particular, as x (left) and y (right) data may have either same or different key field names, we can replace key field names of y with those of x in case of two set of names are different. It’s because it is preferable and parsimonious to remove redundant columns in merged output.


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
# same key fields names between x and y
f_loj_krc(x = x, y = y_sc, 
          by.x=c(“Bucket”“Label1”), 
          by.y=c(“Bucket”“Label1”))
 
>
  Qualifier Bucket Label1 Parameter
1       eur      1     3y        49
2       cny      3     2w        85
3       jpy      2     2w        14
4       jpy      2     5y        20
5       mxn      3     1y        97
6       jpy      2     5y        20
7       eur      1     5y        51
8       eur      1     3y        49
9       eur      1     3y        49
> 
 
 
# different key fields names between x and y
f_loj_krc(x = x, y = y_dc, 
          by.x = c(“Bucket”“Label1”),
          by.y = c(“Bkt”“Lab1”))
 
>
  Qualifier Bucket Label1 Parameter
1       eur      1     3y        49
2       cny      3     2w        85
3       jpy      2     2w        14
4       jpy      2     5y        20
5       mxn      3     1y        97
6       jpy      2     5y        20
7       eur      1     5y        51
8       eur      1     3y        49
9       eur      1     3y        49
> 
cs


In some case, it is convenient to keep orders of row and column until calculation is completed. But merge() function for joining data.frames delivers rearranged output with respect to key fields. To sidestep this rearranging effect, we make a user-defined function based on merge(), which preserves information of input x data. \(\blacksquare\)

To leave a comment for the author, please follow the link and comment on their blog: K & L Fintech Modeling.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)