Running SQL Queries in R With the SQLDF Package

October 16, 2011
By

(This article was first published on Abraham Mathew » R, and kindly contributed to R-bloggers)

 

The sqldf package can be used to run sql queries on R data frames. The user simply needs to specify a sql statement enclosed by quotation marks within the sqldf() function. In the follow R code, you see various ways of using the sqldf package to run sql queries on R data frames. The sql command COUNT() is used to find the total number of rows that meet a certain condition. Furthermore, GROUP BY() can be used to structure the data in accordance with the levels of a particular variable.

dte = seq(as.Date("2011-05-01"), as.Date("2011-05-20"), by=1)
persid = c(1013,1011,1014,1015,1023,1028,1012,1018,1019,1020,1027,
        1016,1022,1017,1021,1024,1030,1025,1026,1029)
v1 = round(rnorm(20), 2)
v2 = round(rnorm(20), 2)
first=c("David","Sara","Jon","Jennifer","Ken","Ralph","Chris","David",
      "David","Joe","Melanie","Debbie","Jessica","Ally","Amy","Ralph",
      "Sara","Jane","John","Lance")
last=c("Smith","Jones","Alberts","Hudson","Jennings","Masterson","Browm",
      "Felt","Spade","Montana","Keith","Hardson","Karson","Roberts","Smith",
      "Jennings","Denver","Hudson","Reynolds","Darder")
stat = c("CA","IA","NC","FL","GA","OH","NY","CA","TX","TX","CA","CA","AZ",
      "CO","OK","MI","WI","SC","VT","IL")

df1 <- data.frame(id=c(seq(1,20)), date=c(dte), var1=c(v1),
            var2=c(v2), personid=c(persid))

df2 <- data.frame(id=c(sort(persid)), firstname=c(first),
            lastname=c(last), state=c(stat))
library(sqldf)

sqldf("SELECT COUNT(*) FROM df2 WHERE state = 'CA'")

sqldf("SELECT df2.firstname, df2.lastname, df1.var1, df2.state FROM df1
INNER JOIN df2 ON df1.personid = df2.id WHERE df2.state = 'TX'")
sqldf("SELECT df2.state, COUNT(df1.var1) FROM df1
INNER JOIN df2 ON df1.personid = df2.id WHERE df1.var1 > 0
GROUP BY df2.state")
sqldf("SELECT df2.firstname, df2.lastname, df1.var1, df2.state FROM df1
INNER JOIN df2 ON df1.personid = df2.id
WHERE df1.date BETWEEN '2011-05-03' AND '2011-05-11'")

 

To leave a comment for the author, please follow the link and comment on his blog: Abraham Mathew » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , ,

Comments are closed.