R and PostgreSQL – using RPostgreSQL and sqldf

July 1, 2013
By

(This article was first published on R (en) - Analytik dat, and kindly contributed to R-bloggers)

PostgreSQL and R can often be used together for data analysis - PostgreSQL as database engine and R as statistical tool. In this article you will learn how to access data stored in PostgreSQL database and how to write the data back using RPostgreSQL and sqldf packages.

Connect to PostgreSQL Database

RPostgreSQL

To open connection to PostgreSQL database, you use dbConnect() command.  The simplest connection to localhost takes just two lines. Of course, you can connect to any PostgreSQL database by specifying all connections options - database name, host, port, user and password.

# Establish connection to PoststgreSQL using RPostgreSQL
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv)# Simple version (localhost as default)
# Full version of connection seetting
# con <- dbConnect(drv, dbname="dbname",host="host",port=1234,user="user",password="password", )

At the end of you script, do not forget to close the connection:

# Close PostgreSQL connection 
dbDisconnect(con)

sqldf

Connection to PostgreSQL database with sqldf package has to be set in a different way, by setting RPostgreSQL options:

options(sqldf.RPostgreSQL.user ="postgres", 
  sqldf.RPostgreSQL.password ="password",
  sqldf.RPostgreSQL.dbname ="test",
  sqldf.RPostgreSQL.host ="localhost",
  sqldf.RPostgreSQL.port =5432)

Working with the database

I will use the following table for all examples in this article:

 

 create table tmp.test_tbl (a int, b int);
insert into tmp.test_tbl values (1,4);
insert into tmp.test_tbl values (2,5);
insert into tmp.test_tbl values (3,6);

RPostgreSQL

RPostgreSQL has the following main features. Test table existence (note the specific notation for schema identifaction):

dbExistsTable(con, c("tmp","test_tbl"))

Read table from PostgreSQL into R data frame:

myTable <- dbReadTable(con, c("tmp","test_tbl"))
# Equals to myTable <- sqldf("select * from tmp.test_tbl")

 Write results back to PostgreSQL:

# overwrite=TRUE will change both data and table structure
# When row.name=TRUE then column named row.names will be added to the table
dbWriteTable(con, c("tmp","test_tbl_out"), value=myTable,overwrite=TRUE,row.names=FALSE)

Append data to table:

dbWriteTable(con, c("tmp","test_tbl_out"), value=myTable,append=TRUE, row.names=FALSE)

sqldf

With sqldf you can easily execute any SQL command. For example, you can drop table, create table, insert new records:

library("sqldf")
# We can use sqldf() to issue any command, including drop, create, select, insert
sqldf("
/* sql comments can be used*/
drop table if exists tmp.test_tbl;
create table tmp.test_tbl (a int, b int);
insert into tmp.test_tbl values (1,4);
insert into tmp.test_tbl values (2,5);
insert into tmp.test_tbl values (3,6);
")

Sometimes you might need to switch back to SQLite driver. This way, we can manipulate R data frames using pure SQL.

myTable <- sqldf("select a, b, a+b as c from myTable", drv="SQLite")

Other resources

Both RPostgreSQL and sqldf have their project homepage:

All the examples are accessible in this gist

To leave a comment for the author, please follow the link and comment on his blog: R (en) - Analytik dat.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.