There has been recent interest in the attention that Oracle has been giving to R. This has been related to the use of Oracle Data Mining which is a paid option to Oracle. This is the first in a series of posts that can be run using the Oracle HR Schema which is available in various versions or Oracle including its free Oracle XE version.
The following steps are required to set up an environment that can be used to use R to interact with an Oracle database.
Installation and Configuration
1) Install Oracle XE (or get access to another Oracle database). There is a fair amount of documentation, but in practice the installation is pretty straightforward on Windows with a user with Admin rights.
2) Install the sample schema if needed. This would require a database user with special privileges – so you might need to get a DBA involved if you aren’t using your own machine.
3) Unlock the HR user account. This can also be done by logging in as the system user at a SQL*Plus command prompt and executing
ALTER USER HR IDENTIFIED BY password ACCOUNT UNLOCK;
where password is the password you are using for the HR account login.
Verify Connection to the HR Schema
Log in using SQL*Plus, SQLDeveloper, or other Oracle client software. For example, at an OS prompt:
sqlplus hr/password @xe
SQL*Plus: Release 10.2.0.1.0 – Production on Thu …
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Oracle Database 10g Express Edition Release 10.2.0….
If you have several versions of Oracle installed, you might need to change environmental variables such as your path to point at the correct installation. For instance, on Windows:
More extensive Oracle XE documentation is available online.
Configure ODBC Connection
These examples will rely upon the RODBC package.
1) Find the Data Sources (ODBC) option under Administrative Tools.
2) Add a connection to your database if needed. In the case of Oracle XE, choose the add button if you do not have an XE configured.
Configure the details in the popup window. You probably will only need to enter the Data Source Name, Description, TNS Service Name and User ID. Just leave other settings as defaulted.
Choose OK and close out ODBC Configuration.
Test R connection with Oracle
Install RODBC if you have not already done so. Run the following program from R.
ch <- odbcConnect(“XE”,uid=”HR”,pwd=”HR”)
You can execute queries using the connection as well.
sqlQuery(ch, “select sys_context(‘USERENV’,’DB_NAME’) db from dual”)
I appreciate the statistics community example of using well known published data sets. The Oracle HR Schema is one of the sample schemas that is used consistently in documentation and examples.
This is where the fun really begins – so stay tuned to see what we can do with the Oracle HR data in R.
Part II is now available.