Running SQL Queries in R With the SQLDF Package

[This article was first published on Abraham Mathew » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

 

The sqldf package can be used to run sql queries on R data frames. The user simply needs to specify a sql statement enclosed by quotation marks within the sqldf() function. In the follow R code, you see various ways of using the sqldf package to run sql queries on R data frames. The sql command COUNT() is used to find the total number of rows that meet a certain condition. Furthermore, GROUP BY() can be used to structure the data in accordance with the levels of a particular variable.

dte = seq(as.Date("2011-05-01"), as.Date("2011-05-20"), by=1)
persid = c(1013,1011,1014,1015,1023,1028,1012,1018,1019,1020,1027,
        1016,1022,1017,1021,1024,1030,1025,1026,1029)
v1 = round(rnorm(20), 2)
v2 = round(rnorm(20), 2)
first=c("David","Sara","Jon","Jennifer","Ken","Ralph","Chris","David",
      "David","Joe","Melanie","Debbie","Jessica","Ally","Amy","Ralph",
      "Sara","Jane","John","Lance")
last=c("Smith","Jones","Alberts","Hudson","Jennings","Masterson","Browm",
      "Felt","Spade","Montana","Keith","Hardson","Karson","Roberts","Smith",
      "Jennings","Denver","Hudson","Reynolds","Darder")
stat = c("CA","IA","NC","FL","GA","OH","NY","CA","TX","TX","CA","CA","AZ",
      "CO","OK","MI","WI","SC","VT","IL")

df1 <- data.frame(id=c(seq(1,20)), date=c(dte), var1=c(v1),
            var2=c(v2), personid=c(persid))

df2 <- data.frame(id=c(sort(persid)), firstname=c(first),
            lastname=c(last), state=c(stat))
library(sqldf)

sqldf("SELECT COUNT(*) FROM df2 WHERE state = 'CA'")

sqldf("SELECT df2.firstname, df2.lastname, df1.var1, df2.state FROM df1
INNER JOIN df2 ON df1.personid = df2.id WHERE df2.state = 'TX'")
sqldf("SELECT df2.state, COUNT(df1.var1) FROM df1
INNER JOIN df2 ON df1.personid = df2.id WHERE df1.var1 > 0
GROUP BY df2.state")
sqldf("SELECT df2.firstname, df2.lastname, df1.var1, df2.state FROM df1
INNER JOIN df2 ON df1.personid = df2.id
WHERE df1.date BETWEEN '2011-05-03' AND '2011-05-11'")

 

To leave a comment for the author, please follow the link and comment on their blog: Abraham Mathew » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)