Exchange Data between Python and R with SQLite

December 2, 2012
By

(This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers)

SQLite is a light-weight database with zero-configuration. Being fast, reliable, and simple, SQLite is a good choice to store / query large data, e.g. terabytes, and is well supported by both Python and R.

In [1]: # LOAD PYTHON PACKAGES

In [2]: import pandas as pd

In [3]: import pandas.io.sql as pd_sql

In [4]: import sqlite3 as sql

In [5]: import pyper as pr

In [6]: # READ DATA

In [7]: py_data = pd.read_table("/home/liuwensui/Documents/data/csdata.txt")

In [8]: print py_data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4421 entries, 0 to 4420
Data columns:
LEV_LT3     4421  non-null values
TAX_NDEB    4421  non-null values
COLLAT1     4421  non-null values
SIZE1       4421  non-null values
PROF2       4421  non-null values
GROWTH2     4421  non-null values
AGE         4421  non-null values
LIQ         4421  non-null values
IND2A       4421  non-null values
IND3A       4421  non-null values
IND4A       4421  non-null values
IND5A       4421  non-null values
dtypes: float64(7), int64(5)

In [9]: # CREATE A CONNECTION TO SQLITE DB

In [10]: con = sql.connect("/home/liuwensui/Documents/data/tmp.db")

In [11]: # WRITE THE DATAFRAME INTO SQLITE DB

In [12]: con.execute("drop table if exists tbldata")
Out[12]: <sqlite3.Cursor at 0xa00d820>

In [13]: pd_sql.write_frame(py_data, "tbldata", con)

In [14]: con.commit()

In [15]: # TEST THE DATA WRITTEN INTO SQLITE DB

In [16]: test_data = pd_sql.read_frame("select * from tbldata limit 5", con)

In [17]: print test_data
   LEV_LT3  TAX_NDEB   COLLAT1      SIZE1     PROF2    GROWTH2  AGE       LIQ  IND2A  IND3A  IND4A  IND5A
0        0  0.530298  0.079172  13.131993  0.082016   1.166493   53  0.385779      0      0      1      0
1        0  0.370025  0.040745  12.132626  0.082615  11.092048   54  0.224123      1      0      0      0
2        0  0.636884  0.307242  13.322921  0.245129  -6.316099   43  0.055441      1      0      0      0
3        0  0.815549  0.295864  16.274536  0.164052   1.394809   24  0.016731      1      0      0      0
4        0  0.097690  0.033567  13.491299  0.160505  10.204010   49  0.387136      1      0      0      0

In [18]: # CREATE A R INSTANCE

In [19]: r = pr.R()

In [20]: # LOAD R LIBRARY

In [21]: print r("library(sqldf)")
try({library(sqldf)})
Loading required package: DBI
Loading required package: gsubfn
Loading required package: proto
Loading required namespace: tcltk
Loading Tcl/Tk interface ... done
Loading required package: chron
Loading required package: RSQLite
Loading required package: RSQLite.extfuns


In [22]: # READ DATA FROM SQLITE DB

In [23]: print r("r_data <- sqldf('select * from tbldata', dbname = '/home/liuwensui/Documents/data/tmp.db')")
try({r_data <- sqldf('select * from tbldata', dbname = '/home/liuwensui/Documents/data/tmp.db')})
Loading required package: tcltk


In [24]: print r("str(r_data)")
try({str(r_data)})
'data.frame':	4421 obs. of  12 variables:
 $ LEV_LT3 : num  0 0 0 0 0 0 0 0 0 0 ...
 $ TAX_NDEB: num  0.5303 0.37 0.6369 0.8155 0.0977 ...
 $ COLLAT1 : num  0.0792 0.0407 0.3072 0.2959 0.0336 ...
 $ SIZE1   : num  13.1 12.1 13.3 16.3 13.5 ...
 $ PROF2   : num  0.082 0.0826 0.2451 0.1641 0.1605 ...
 $ GROWTH2 : num  1.17 11.09 -6.32 1.39 10.2 ...
 $ AGE     : int  53 54 43 24 49 24 35 77 33 81 ...
 $ LIQ     : num  0.3858 0.2241 0.0554 0.0167 0.3871 ...
 $ IND2A   : int  0 1 1 1 1 1 1 1 1 0 ...
 $ IND3A   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ IND4A   : int  1 0 0 0 0 0 0 0 0 0 ...
 $ IND5A   : int  0 0 0 0 0 0 0 0 0 1 ...

To leave a comment for the author, please follow the link and comment on his blog: Yet Another Blog in Statistical Computing » S+/R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.