Connecting R to PostgreSQL on Linux
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Connecting to databases is a critical piece of data anlaysis in R. In most analytic roles the data we consume is going to be found in databases. Of these some of the most common are SQL databases like MS SQL Server, PostgreSQL, and Oracle in addition to many others. In this how-to blog, I’ll walk you through the major steps of configuring your machine and R to be able to connect to a PostgreSQL Server database from R on Ubuntu using the RPostgreSQL
, odbc
, and RJDBC
packages in R. Similar steps can be followed to set up connections to other databases, however, driver installation and configuration will likely be slightly different.
1 – RPostgreSQL Package Setup
The first step in setting up a connection to a PostgreSQL database is to first download the PostgreSQL header files and static library, libpq-dev
. In order to do this on Ubuntu open the terminal and install it using the following command:
sudo apt-get install libpq-dev
Once the libpq-dev
package is installed the next step is to install the RPostgreSQL
package in R. If you need to authenticate, I highly recommend the getPass
package which will prompt you for your password. RStudio also has a .rs.askForPassword()
function that works similar to the getPass()
function, but it relies on using RStudio. I’ve confirmed that getPass
works in bash, emacs, RStudio, and when knitting your Rmd files. So however you submit your R code it will work the same.
# Install the package in R install.packages("RPostgreSQL") library(RPostgreSQL) ## Loading required package: DBI library(getPass) pgdrv <- dbDriver(drvName = "PostgreSQL") db <-DBI::dbConnect(pgdrv, dbname="postgres", host="localhost", port=5432, user = 'postgres', password = getPass("Enter Password:")) ## Please enter password in TK window (Alt+Tab) # Write to database DBI::dbWriteTable(db, "mtcars", mtcars) ## [1] TRUE DBI::dbDisconnect(db) ## [1] TRUE
Perfect! Your database connection should be working simply by adding the proper arguments in your dbConnect()
function. You may need to tweak the host, port and user based on your PostgreSQL server setup.
2 - odbc Package Setup
In case you are a fan of odbc
, the next section will walk you through the steps of creating your database connection via odbc
.
In the past I have used the RODBC
package but recently I have found that the odbc
package plays much nicer with other database tools like DBI
and dbplyr
. Plus it has very similar syntax to the RJDBC
package and for consistency sake I’ve made the switch.
Once again, the first step is to install the necessary debian packages. In this case we need to install the unixodbc
and unixodbc-dev
packages and the odbc-postgresql
driver.
# Install the unixODBC library apt-get install unixodbc unixodbc-dev # PostgreSQL ODBC Drivers apt-get install odbc-postgresql
Set up connection with connection string
Okay we are now ready to connect via odbc. Note the slight difference in the names of the arguments of the dbConnect()
function.
db <- DBI::dbConnect(odbc::odbc(), Driver = "PostgreSQL Unicode", Database = "postgres", UserName = "postgres", Password = getPass("Enter Password:"), Servername = "localhost", Port = 5432) ## Please enter password in TK window (Alt+Tab)
Set up connection with DSN
If you don’t want to have to worry about defining each of these arguments each time you connect to PostgreSQL via odbc, you can define the configuration in your odbcinst.ini
file. The following steps walk you through the process:
Make sure the
/etc/odbcinst.ini
has the drivers set up. This should have been configured automatically when installingodbc-postgresql
withapt-get
. This is what it would look like:[PostgreSQL Unicode] Driver = psqlodbca.so Setup = libodbcpsqlS.so Debug = 0 CommLog = 1 UsageCount = 1
Now define your DSN by modifying the odbc.ini file:
[PostgreSQL] Driver = PostgreSQL Unicode Database = postgres Servername = localhost UserName = postgres Password = postgres
Connect to your database by referencing your DSN name specified in the square brackets of the
odbc.ini
file:
# Connect to the database db <- dbConnect(odbc::odbc(), "PostgreSQL") # Pull the Data into an R dataframe DBI::dbGetQuery(db,"SELECT * FROM MTCARS") ## row.names mpg cyl disp hp drat wt qsec vs am gear carb ## 1 Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 ## 2 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 ## 3 Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 ## 4 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 ## 5 Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 ## 6 Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 ## 7 Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 ## 8 Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 ## 9 Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 ## 10 Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 ## 11 Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 ## 12 Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 ## 13 Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 ## 14 Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 ## 15 Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 ## 16 Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 ## 17 Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 ## 18 Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 ## 19 Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 ## 20 Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 ## 21 Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 ## 22 Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 ## 23 AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 ## 24 Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 ## 25 Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 ## 26 Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 ## 27 Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 ## 28 Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 ## 29 Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 ## 30 Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 ## 31 Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 ## 32 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 # Close the Connection DBI::dbDisconnect(db)
Now you are ready to begin your analysis with your data!
3 - RJDBC Package Setup
Finally, the last way to configure a connection to the PostgreSQL database can be done via the RJDBC
package. The first step in this configuration is to download the jdbc jar file from here. I’ve put this in my home directory, ~
, and will reference this file in the JDBC()
function below. Once you have the jar file you can install the RJDBC
package in R.
install.packages('RJDBC')
Now you are ready to connect. Once again, notice the slight tweaks to the arguments of the dbConnect()
function. Because I’m defining the url
argument with the host, port and database name, there is no need for these additional arguments.
library(RJDBC) ## Loading required package: rJava db <- DBI::dbConnect(RJDBC::JDBC("org.postgresql.Driver","~/postgresql-42.2.2.jar"), url = "jdbc:postgresql://localhost:5432/postgres", user = "postgres", password = getPass("Enter Password:")) ## Please enter password in TK window (Alt+Tab) # Pull the Data into an R dataframe DBI::dbGetQuery(db,"SELECT * FROM MTCARS") ## row.names mpg cyl disp hp drat wt qsec vs am gear carb ## 1 Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 ## 2 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 ## 3 Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 ## 4 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 ## 5 Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 ## 6 Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 ## 7 Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 ## 8 Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 ## 9 Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 ## 10 Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 ## 11 Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 ## 12 Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 ## 13 Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 ## 14 Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 ## 15 Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 ## 16 Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 ## 17 Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 ## 18 Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 ## 19 Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 ## 20 Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 ## 21 Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 ## 22 Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 ## 23 AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 ## 24 Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 ## 25 Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 ## 26 Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 ## 27 Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 ## 28 Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 ## 29 Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 ## 30 Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 ## 31 Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 ## 32 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 # Close the Connection DBI::dbDisconnect(db) ## [1] TRUE
Alright! We’ve walked through several different configurations in connecting to a PostgreSQL database on Ubuntu. You’ll only need one of these setups, but I think it’s nice to understand each of your options so you can create the best setup that works for you and/or your organization.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.