Query Pandas DataFrame with SQL

[This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Similar to SQLDF package providing a seamless interface between SQL statement and R data.frame, PANDASQL allows python users to use SQL querying Pandas DataFrames.

Below are some examples showing how to use PANDASQL to do SELECT / AGGREGATE / JOIN operations. More information is also available on the GitHub (https://github.com/yhat/pandasql).

In [1]: import sas7bdat as sas

In [2]: import pandas as pd

In [3]: import pandasql as pdsql

In [4]: data = sas.SAS7BDAT("accepts.sas7bdat")

In [5]: df = data.toDataFrame()

In [6]: pysql = lambda q: pdsql.sqldf(q, globals())

In [7]: ### SELECT ###

In [8]: str1 = "select bureau_score, ltv from df where bureau_score < 600 and ltv > 100 limit 3;"

In [9]: df1 = pysql(str1)

In [10]: df1
Out[10]: 
   bureau_score  ltv
0           590  103
1           575  120
2           538  113

In [11]: ### AGGREGATE ###

In [12]: str2 = "select ltv, min(bureau_score) as min_score, max(bureau_score) as max_score from df group by ltv order by ltv DESC;"

In [13]: df2 = pysql(str2);

In [14]: df2.head(3)
Out[14]: 
   ltv  min_score  max_score
0  176        709        709
1  168        723        723
2  167        688        688

In [15]: ### JOIN ###

In [16]: str3 = "select b.*, a.bureau_score from df a inner join df2 b on a.ltv = b.ltv order by ltv DESC;"

In [17]: df3 = pysql(str3)

In [18]: df3.head(3)
Out[18]: 
   ltv  min_score  max_score  bureau_score
0  176        709        709           709
1  168        723        723           723
2  167        688        688           688

To leave a comment for the author, please follow the link and comment on their blog: Yet Another Blog in Statistical Computing » S+/R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)