How to select and merge R data frames with SQL

December 17, 2012
By

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

The R language provides many features in the language for selecting data from data frames: the "[" operator, logical functions, and utility functions like "subset". But if you know SQL (the query language ubiquitous in database systems), none of this is necessary. With the sqldf package, you can just pretend that your data frame is a database, and use SQL directly. 

The sqldf function supports the full richness of the SQL language, but applied to data frames in R's memory. This includes:

  • SELECT ... WHERE statements to select rows and columns according to logical criteria
  • CASE clauses, for queries with special cases
  • ORDER BY statements, to sort the resulting data according to specified columns
  • LEFT JOIN and INNER JOIN statements for merging data frames

The sqldf package uses its own internal database engine, so there's no special database configuration you need to do. Just enter the following in R:

and you should be good to go. The SQLDF FAQ is a good resource for getting started, and this sqldf video tutorial from Keystone Solutions shows the sqldf package in action with examples of SQL queries from simple to complex.

Google code: sqldf: SQL select on R data frames

To leave a comment for the author, please follow the link and comment on his blog: Revolutions.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.