Want to join two R data frames on a common key? Here's one way do a SQL database style join operation in R.
We start with a data frame describing probes on a microarray. The key is the probe_id and the rest of the information describes the location on the genome targeted by that probe.
> head(probes)
probe_id sequence strand start end
1 mm_ex_fwd_000541 Chr + 1192448 1192507
2 mm_ex_fwd_000542 Chr + 1192453 1192512
3 mm_ex_fwd_000543 Chr + 1192458 1192517
4 mm_ex_fwd_000544 Chr + 1192463 1192522
5 mm_ex_fwd_000545 Chr + 1192468 1192527
6 mm_ex_fwd_000546 Chr + 1192473 1192532
> dim(probes)
[1] 241019 5
We also have a bunch of measurements in a numeric vector. For each probe (well, a few probes missing due to bad data) we have a value.
> head(value) mm_fwd_000002 mm_fwd_000003 mm_fwd_000004 mm_fwd_000005 mm_fwd_000006 mm_fwd_000007 0.05294899 0.11979251 0.28160017 0.57284569 0.74402510 0.78644199 > length(value) [1] 241007
Let's join up these tables, er data frame and vector. We'll use the match function. Match returns a vector of positions of the (first) matches of its first argument in its second (or NA if there is no match). So, we're matching our values into our probes.
> joined = cbind(probes[match(names(value), probes$probe_id),], value)
> dim(joined)
[1] 241007 6
> head(joined)
probe_id sequence strand start end value
3695 mm_fwd_000002 Chr + 15 74 0.05294899
3696 mm_fwd_000003 Chr + 29 88 0.11979251
3697 mm_fwd_000004 Chr + 43 102 0.28160017
3698 mm_fwd_000005 Chr + 57 116 0.57284569
3699 mm_fwd_000006 Chr + 71 130 0.74402510
3700 mm_fwd_000007 Chr + 85 144 0.78644199
Merge is probably more similar to a database join.
| Inner join | merge(df1, df2, by="common_key_column") |
| Outer join | merge(df1, df2, by="common_key_column", all=TRUE) |
| Left outer | merge(df1, df2, by="common_key_column", all.x=TRUE) |
| Right outer | merge(df1, df2, by="common_key_column", all.y=TRUE) |
If we have two data frames, we can use merge. Let's convert our vector tp to a data frame and merge, getting the same result (in a different sort order).
> tp.df = data.frame(probe_id=names(tp), value=tp)
> head(tp.df)
probe_id value
mm_fwd_000002 mm_fwd_000002 0.05294899
mm_fwd_000003 mm_fwd_000003 0.11979251
mm_fwd_000004 mm_fwd_000004 0.28160017
mm_fwd_000005 mm_fwd_000005 0.57284569
mm_fwd_000006 mm_fwd_000006 0.74402510
mm_fwd_000007 mm_fwd_000007 0.78644199
> m = merge(probes, tp.df, by="probe_id")
> dim(m)
[1] 241007 6
> head(mmm)
probe_id sequence strand start end value
1 mm_ex_fwd_000541 Chr + 1192448 1192507 0.1354668
2 mm_ex_fwd_000542 Chr + 1192453 1192512 0.1942794
3 mm_ex_fwd_000543 Chr + 1192458 1192517 0.1924457
4 mm_ex_fwd_000544 Chr + 1192463 1192522 0.2526351
5 mm_ex_fwd_000545 Chr + 1192468 1192527 0.1922655
6 mm_ex_fwd_000546 Chr + 1192473 1192532 0.2610747
There's a good discussion of merge on Stack Overflow, which includes right, left, inner and outer joins. Also the R wiki covers both match and merge. See also, the prior entry on select operations on R data frames.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series,ecdf, trading) and more...

Zero Inflated Models and Generalized Linear Mixed Models with R.
Zuur, Saveliev, Ieno (2012).