R to Oracle Database Connectivity: Use ROracle for both Performance and Scalability

June 12, 2013

(This article was first published on Oracle R Enterprise, and kindly contributed to R-bloggers)

R users have a few choices of how to connect to their Oracle
Database. The most commonly seen include: RODBC, RJDBC, and ROracle. However,
these three packages have significantly different performance and scalability
characteristics which can greatly impact your application development. In this
blog, we’ll discuss these options and highlight performance benchmark results
on a wide range of data sets.

If you use ROracle, we’d like to hear about your experience. Please take this brief survey.

By way of introduction, RODBC is an R package that
implements ODBC database connectivity. There are two groups of functions: the
largely internal odbc* functions implement low-level access to the
corresponding ODBC functions having a similar name, and the higher level sql*
functions that support read, save, copy, and manipulation of data between R
data.frame objects and database tables. Here is an example using RODBC:


<- odbcConnect("DD1", uid="rquser", pwd="rquser",
rows_at_time = 500)

test_table, "TEST_TABLE")

"select count(*) from TEST_TABLE")

<- sqlQuery(con, "select * from TEST_TABLE")


The R package RJDBC is an implementation of the R DBI
package – database interface – that uses JDBC as the back-end connection to the
database. Any database that supports a JDBC driver can be used in connection
with RJDBC. Here is an example using RJDBC:


<- JDBC("oracle.jdbc.OracleDriver",

" ")
con <- dbConnect(drv, "
jdbc:oracle:thin:@myHost:1521:db", "rquser",

"TEST_TABLE", test_table)

"select count(*) from TEST_TABLE")

<- dbReadTable(con, "TEST_TABLE")

The ROracle package is an implementation of the R DBI
package that uses Oracle OCI for high performance and scalability with Oracle
Databases. It requires Oracle Instant Client or Oracle Database Client to be
installed on the client machine. Here is an example using ROracle:


<- dbDriver("Oracle")

<- dbConnect(drv, "rquser", "rquser")


"select count(*) from TEST_TABLE")

<- dbReadTable(con, "TEST_TABLE")


Notice that since both RJDBC and ROracle implement the DBI
interface, their code is the same except for the driver and connection details.

To compare these interfaces, we prepared tests along several

  • Number of rows – 1K, 10K, 100K, and 1M
  • Number of columns – 10, 100, 1000
    Numeric data is randomly generated, all character data is 10 characters
  • Interface: RODBC 1.3-6 (with Data Direct 7.0 driver),
    RJDBC 0.2-1 (with rJava 0.9-4 with increased memory limit in JRIBootstrap.java),

    and ROracle 1.1-10 (with Oracle Database Client
  • Types of operations: select *, create table, connect

Loading database data
to an R data.frame

Where an in-database capability as provided by Oracle R
Enterprise is not available, typical usage is to pull data to the R client for
subsequent processing. In Figure 1, we compare the execution time to pull 10, 100, and 1000 columns of data
from 1K, 10, 100K, and 1M rows for BINARY_DOUBLE data on a log-log scale.
Notice that RJDBC does not scale to 100 columns x 1M rows, or above 1000 cols x
100K records. While RODBC and ROracle both scale to these volumes, ROracle is
consistently faster than RODBC: up to 2.5X faster. For RJDBC, ROracle is up to
79X faster.

Figure 1: Comparison of RJDBC, RODBC, and ROracle for BINARY_DOUBLE for
Select *

In Figure 2, we provide the range of results for RODBC,
ROracle, and RJDBC across all data types. Notice that only ROracle provides the
full range of scalability while providing superior performance in general.

ROracle is virtually always faster than RODBC: numeric data
up to 2.5X faster, character data up to 142X faster, and time stamp data up to
214X faster. RODBC fails to process 1000 columns at 1M rows.

For RJDBC, ROracle is up to 13X faster on numeric data, 79X
faster on binary double data, 3X for character data (excluding the 25X over the
smallest data set). Note that RODBC and RJDBC have a limit of 255 characters on
the length the VARCHAR2 columns. Time stamp data is the one area where RJDBC
initially shines, but then fails to scale to larger data sets.

Figure 2: Comparing the three interfaces for select * from

Data set sizes represented in megabytes are captured in
Table 1 for all data types. With only minor variation, the data sizes are the
same across data types.

Table 1: Dataset sizes in megabytes

Creating database
tables from an R data.frame

Data or results created in R may need to be written to a
database table. In Figure 3, we compare the execution time to create tables
with 10, 100, and 1000 columns of data with 1K, 10, 100K, and 1M rows for
BINARY_DOUBLE. Notice that in all three cases, RJDBC is slowest and does not
scale. RJDBC does not support the NUMBER or BINARY_DOUBLE data types, but uses
FLOAT(126) instead. ROracle scaled across the remaining data types, while RODBC
and RJDBC were not tested.

ROracle is 440X faster than RODBC for 1000 columns x 1M
rows, with a median of 4X faster across all data sets. ROracle is 630X faster
on 100 columns x 10K rows, with a median of 135X faster across all data sets.
RJDBC did not scale to the 1M row data sets.

Figure 3: Comparison of
RJDBC, RODBC, and ROracle for BINARY_DOUBLE create table

Connecting to Oracle

Depending on the application any sub-second response time
may be sufficient. However, as depicted in Figure 4, ROracle introduces minimal
time to establish a database connection. ROracle is nearly 10X faster than RJDBC
and 1.6X faster than RODBC.

Figure 4: Database connection times for ROracle, RODBC, and RJDBC

In summary, for maximal performance and scalability, ROracle
can support a wide range of application needs. RJDBC has significant
limitations in both performance and scalability. RODBC can be more difficult to
configure on various platforms and while it largely scales to the datasets
tested here, its performance lags behind ROracle.

If you use ROracle, we’d like to hear about your experience. Please take this brief survey.

All tests were
performed on a 16 processor machine with 4 core Intel Xeon E5540 CPUs @ 2.53
GHz and 74 GB RAM. Oracle Database was version For JDBC, the
following was modified before installing rJava.

was modified to use 2GB :

try {

System.out.println(jl.toString()+" -cp
"+System.getProperty("java.class.path")+" -Xmx2g -Dstage=2

Process p = Runtime.getRuntime().exec(new
String[] {

jl.toString(), "-cp",
"-Dstage=2", "Boot" });

System.out.println("Started stage 2
("+p+"), waiting for it to finish…");


} catch (Exception
re) {}

To leave a comment for the author, please follow the link and comment on their blog: Oracle R Enterprise.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.