**Econometric Sense**, and kindly contributed to R-bloggers)

I recently undertook a project where a colleague had about 12 .csv files that they wanted to merge. Each file had a common (key) variable ‘Partner’ (which is trading partner) with differing columns (variables) except for the common key variable. Actually, the first column (Partner) was the same in each data set, with each data set having 211 rows. The remaining columns were unique to each data set. I required a combined data set.

In SAS (PROC SQL) this would be a matter of a series of left joins. Working in the R environment, I executed what was equivalent to left joins via the ‘merge’ function. However this was tedious, only being able to join 2 tables at a time. In SAS I could use the merge function, which would allow me to merge all 12 tables in 1 data step.

Does anyone know of a better way to do this in R, as opposed to my merge statements? (see sample code below)

R Merge Statements:

# -------------------------------------------------------------------

#| merge data sets with R merge function

# ------------------------------------------------------------------

# left join b_dat onto a_dat on variable Partner

ab <- merge(a_dat,b_dat, by.a_dat = Partner, by.b_dat =Partner, all = FALSE, all.x = TRUE, all.y = FALSE)

# left join c_dat onto ab on variable Partner

a_c <- merge(ab,c_dat, by.ab = Partner, by.c_dat =Partner, all = FALSE, all.x = TRUE, all.y = FALSE)

# I have about 10 more data sets to left join with a_c, is there a better way to join these

# in R as opposed to pairwise merges like above?

I found some help on r-wiki, it merges all 3 data sets at once, but gives me extra redundant columns with .x and .y appended to their names. I’m not sure about these results.

SAS – similar code for 3 data sets DAT_A, DAT_B, DAT_C

DATA ALL_COUNTRIES;

MERGE DAT_A (IN=A) DAT_B (IN=B)DAT_C (IN=C);

BY PARTNER;

IF B AND C; /* Actually its been so long since */

RUN; /* I have used a data step vs PROC SQL I’m not sure this*/

/* statement gives me a left join, but I’m not sure I can */

/* do any better than 2 at a time left joins in PROC SQL */

/* so that would not be any better than my R code above */

**leave a comment**for the author, please follow the link and comment on their blog:

**Econometric Sense**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...