by Gregory Vandenbrouck
Software Engineer, Mirosoft
This post is the second in a series that covers pulling data from various Windows Azure hosted storage solutions (such as MySQL, or Microsoft SQL Server) to an R client on Windows or Linux.
Last time we covered pulling data from SQL Azure to an R client on Windows. This time we’ll be pulling data from MariaDB (compatible with MySQL) hosted on a Linux VM in Azure to an R client on Windows.
Creating the database
The Azure Management site changes quite often, therefore these instructions are valid “at the time of this writing” :o)
- Log on the Azure Management site with your Azure subscription (see previous post),
- Create a VM: select “New”, “Compute”, “Virtual Machine”, “Quick Create” and then fill-in the information. In our case we chose “SUSE Linux Enterprise” for the image, but most images should work, both Windows and Linux.
- Once the creation is completed, click on “Virtual Machines”, your machine name, and then “endpoints”. You should already have SSH (port 22) listed if you chose a Linux VM (and if not, add it). Add MySQL (port 3306) to the list.
- Connect to your VM via ssh.
- Once logged on to the VM via ssh, install MySQL or MariaDB:
sudo zypper install mariadb*.
Note: this command and subsequent ones may vary depending on the choice of Linux distribution. For example the equivalent of “zypper” may instead be called: “yum”, “apt-get”, “synaptic”, “aptitude”, “dpkg-install”, etc.
- Start the server.
sudo rcmysql start
- Secure the server:
Add a user that’s authorized to connect remotely. Below is the full transcript of a session where we add user MyUser with password MyPassword that can connect remotely from any location with all privileges (not recommended), and we also create the MyDatabase database:
[email protected]:~> mysql -s --user=root -p Enter password: MariaDB [(none)]> create user 'MyUser'@'%' identified by 'MyPassword'; MariaDB [(none)]> grant all privileges on *.* to 'MyUser'@'%' with grant option; MariaDB [(none)]> create database MyDatabase; MariaDB [(none)]> quit [email protected]:~>
Connecting to the database from outside of R
Optional step, but can be useful for troubleshooting. For example to solve firewall, port and credential issues.
- Install mysql.exe client or MariaDB client.
Below an example of a successful test session in a command prompt (search for “cmd” in the Start menu or screen to start a command prompt):
C:>"c:Program FilesMySQLMySQL Workbench 6.2 CEmysql.exe" -s -hMyServer.cloudapp.net -uMyUser -pMyPassword -DMyDatabase Warning: Using a password on the command line interface can be insecure. mysql> select 1+1 as Answer; Answer 2 mysql> quit C:>
(replace MyServer, MyUser, MyPassword and MyDatabase with your values)
Note: there’s no space between the flags and values; e.g. it’s “-pMyPassword” and not “-p MyPassword”.
Connecting to the database from R on Windows
MySQL’s ODBC drivers need to be installed. To see if you have these installed and get the driver’s name, open the “ODBC Data Source” (see previous post for instructions), and look in the “Drivers” tab. Below an example with MySQL drivers installed:
If the drivers aren’t installed, you can get them by installing MySQL’s or MariaDB’s client. There are also ODBC-drivers-only installation options.
library(RODBC) myServer <- "MyServer.cloudapp.net" myUser <- "MyUser" myPassword <- "MyPassword" myDatabase <- "MyDatabase" myDriver <- "MySQL ODBC 5.3 Unicode Driver" connectionString <- paste0( "Driver=", myDriver, ";Server=", myServer, ";Database=", myDatabase, ";Uid=", myUser, ";Pwd=", myPassword) conn <- odbcDriverConnect(connectionString) sqlQuery(conn, "SELECT 142857 * 3 AS Cyclic")
## Cyclic ## 1 428571
close(conn) # don't leak connections !
Similar to the Microsoft SQL Server case, you can persist a Data Source Name (DSN). This time I created a “System DSN” (available to everyone):
Contrary to the SQL Server driver, MySQL allows for saving credentials. This is handy as credentials no longer need to be present in clear text in the R script. Unfortunately, the credentials are saved in clear text in Windows’ registry. Choose your poison!
For the drivers we used, the registry locations are
HKEY_CURRENT_USERSoftwareODBCODBC.INI for “User DSN” and
HKEY_LOCAL_MACHINESoftwareODBCODBC.INI for “System DSN”.
When using a DSN with saved credentials, the R code is quite simplified.
library(RODBC) # No need to specify uid and pwd: these are part of the DSN conn <- odbcConnect("MyMariaDBAzure") sqlQuery(conn, "SELECT 10001 - 73 * 137 AS Zero")
## Zero ## 1 0
To use RJDBC to connect to MySQL or MariaDB, you need to:
- Know the following:
- driverClass: the class name. See the driver-specific documentation.
- classPath: the location of the jar file. If you don’t know where it is, try running
dir /s /b %systemdrive%*sql*.jarfrom a command prompt.
- url connector prefix. Again, driver-specific.
In my specific setup:
library(RJDBC) drv <- JDBC( driverClass = "com.mysql.jdbc.Driver", classPath = "C:/Program Files (x86)/MySQL/MySQL Connector J/mysql-connector-java-5.1.35-bin.jar") conn <- dbConnect(drv, "jdbc:mysql://MyServer.cloudapp.net", "MyUser", "MyPassword") dbGetQuery(conn, "SELECT 1+1") dbDisconnect(conn)
- We’ve tried RODBC, RJDBC and RMySQL to connect to a MariaDB database hosted on a Suse VM in Azure.
- In all cases, we had to install something on the client machine (besides R packages): drivers and/or JDK.
- When it comes to performance, our experience was similar to Microsoft SQL Server’s case: RODBC was faster, both to connect and to get back the query results.