Query Pandas DataFrame with SQL

November 1, 2014
By

(This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers)

Similar to SQLDF package providing a seamless interface between SQL statement and R data.frame, PANDASQL allows python users to use SQL querying Pandas DataFrames.

Below are some examples showing how to use PANDASQL to do SELECT / AGGREGATE / JOIN operations. More information is also available on the GitHub (https://github.com/yhat/pandasql).

In [1]: import sas7bdat as sas

In [2]: import pandas as pd

In [3]: import pandasql as pdsql

In [4]: data = sas.SAS7BDAT("accepts.sas7bdat")

In [5]: df = data.toDataFrame()

In [6]: pysql = lambda q: pdsql.sqldf(q, globals())

In [7]: ### SELECT ###

In [8]: str1 = "select bureau_score, ltv from df where bureau_score < 600 and ltv > 100 limit 3;"

In [9]: df1 = pysql(str1)

In [10]: df1
Out[10]: 
   bureau_score  ltv
0           590  103
1           575  120
2           538  113

In [11]: ### AGGREGATE ###

In [12]: str2 = "select ltv, min(bureau_score) as min_score, max(bureau_score) as max_score from df group by ltv order by ltv DESC;"

In [13]: df2 = pysql(str2);

In [14]: df2.head(3)
Out[14]: 
   ltv  min_score  max_score
0  176        709        709
1  168        723        723
2  167        688        688

In [15]: ### JOIN ###

In [16]: str3 = "select b.*, a.bureau_score from df a inner join df2 b on a.ltv = b.ltv order by ltv DESC;"

In [17]: df3 = pysql(str3)

In [18]: df3.head(3)
Out[18]: 
   ltv  min_score  max_score  bureau_score
0  176        709        709           709
1  168        723        723           723
2  167        688        688           688

To leave a comment for the author, please follow the link and comment on their blog: Yet Another Blog in Statistical Computing » S+/R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)