Using Azure as an R datasource: Part 2 – Pulling data from MySQL/MariaDB

June 2, 2015
By

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

by Gregory Vandenbrouck
Software Engineer, Mirosoft

This post is the second in a series that covers pulling data from various Windows Azure hosted storage solutions (such as MySQL, or Microsoft SQL Server) to an R client on Windows or Linux.

Last time we covered pulling data from SQL Azure to an R client on Windows. This time we’ll be pulling data from MariaDB (compatible with MySQL) hosted on a Linux VM in Azure to an R client on Windows.

Creating the database

The Azure Management site changes quite often, therefore these instructions are valid “at the time of this writing” :o)

  1. Log on the Azure Management site with your Azure subscription (see previous post),
  2. Create a VM: select “New”, “Compute”, “Virtual Machine”, “Quick Create” and then fill-in the information. In our case we chose “SUSE Linux Enterprise” for the image, but most images should work, both Windows and Linux.
  3. Once the creation is completed, click on “Virtual Machines”, your machine name, and then “endpoints”. You should already have SSH (port 22) listed if you chose a Linux VM (and if not, add it). Add MySQL (port 3306) to the list.
  4. Connect to your VM via ssh.
    • Putty is a free lightweight ssh client for Windows,
    • MobaXterm is a more advanced solution that also provides a free version.
  5. Once logged on to the VM via ssh, install MySQL or MariaDB: sudo zypper install mariadb*.
    Note: this command and subsequent ones may vary depending on the choice of Linux distribution. For example the equivalent of “zypper” may instead be called: “yum”, “apt-get”, “synaptic”, “aptitude”, “dpkg-install”, etc.
  6. Start the server. sudo rcmysql start
  7. Secure the server: sudo mysql_secure_installation
  8. Add a user that’s authorized to connect remotely. Below is the full transcript of a session where we add user MyUser with password MyPassword that can connect remotely from any location with all privileges (not recommended), and we also create the MyDatabase database:

    [email protected]:~> mysql -s --user=root -p
    Enter password:
    MariaDB [(none)]> create user 'MyUser'@'%' identified by 'MyPassword';
    MariaDB [(none)]> grant all privileges on *.* to 'MyUser'@'%' with grant option;
    MariaDB [(none)]> create database MyDatabase;
    MariaDB [(none)]> quit
    [email protected]:~>

Connecting to the database from outside of R

Optional step, but can be useful for troubleshooting. For example to solve firewall, port and credential issues.

  • Install mysql.exe client or MariaDB client.
  • Below an example of a successful test session in a command prompt (search for “cmd” in the Start menu or screen to start a command prompt):

    C:>"c:Program FilesMySQLMySQL Workbench 6.2 CEmysql.exe" -s -hMyServer.cloudapp.net -uMyUser -pMyPassword -DMyDatabase
    Warning: Using a password on the command line interface can be insecure.
    mysql> select 1+1 as Answer;
    Answer
    2
    mysql> quit
    C:>

(replace MyServer, MyUser, MyPassword and MyDatabase with your values)

Note: there’s no space between the flags and values; e.g. it’s “-pMyPassword” and not “-p MyPassword”.

Connecting to the database from R on Windows

Using RODBC’s odbcDriverConnect function

MySQL’s ODBC drivers need to be installed. To see if you have these installed and get the driver’s name, open the “ODBC Data Source” (see previous post for instructions), and look in the “Drivers” tab. Below an example with MySQL drivers installed:

  ODBC3

If the drivers aren’t installed, you can get them by installing MySQL’s or MariaDB’s client. There are also ODBC-drivers-only installation options.

library(RODBC)

myServer <- "MyServer.cloudapp.net"
myUser <- "MyUser"
myPassword <- "MyPassword"
myDatabase <- "MyDatabase"
myDriver <- "MySQL ODBC 5.3 Unicode Driver"

connectionString <- paste0(
    "Driver=", myDriver,
    ";Server=", myServer,
    ";Database=", myDatabase,
    ";Uid=", myUser,
    ";Pwd=", myPassword)
conn <- odbcDriverConnect(connectionString)
sqlQuery(conn, "SELECT 142857 * 3 AS Cyclic")
##   Cyclic
## 1 428571
close(conn) # don't leak connections !

Using RODBC’s odbcConnect function

Similar to the Microsoft SQL Server case, you can persist a Data Source Name (DSN). This time I created a “System DSN” (available to everyone):

ODBC4

Contrary to the SQL Server driver, MySQL allows for saving credentials. This is handy as credentials no longer need to be present in clear text in the R script. Unfortunately, the credentials are saved in clear text in Windows’ registry. Choose your poison! 

For the drivers we used, the registry locations are HKEY_CURRENT_USERSoftwareODBCODBC.INI for “User DSN” and HKEY_LOCAL_MACHINESoftwareODBCODBC.INI for “System DSN”.

When using a DSN with saved credentials, the R code is quite simplified.

library(RODBC)

# No need to specify uid and pwd: these are part of the DSN
conn <- odbcConnect("MyMariaDBAzure")
sqlQuery(conn, "SELECT 10001 - 73 * 137 AS Zero")
##   Zero
## 1    0
close(conn)

Using RJDBC

To use RJDBC to connect to MySQL or MariaDB, you need to:

  • Install:
  • Know the following:
    • driverClass: the class name. See the driver-specific documentation.
    • classPath: the location of the jar file. If you don’t know where it is, try running dir /s /b %systemdrive%*sql*.jar from a command prompt.
    • url connector prefix. Again, driver-specific.

In my specific setup:

library(RJDBC)
drv <- JDBC(
    driverClass = "com.mysql.jdbc.Driver",
    classPath = "C:/Program Files (x86)/MySQL/MySQL Connector J/mysql-connector-java-5.1.35-bin.jar")
conn <- dbConnect(drv, "jdbc:mysql://MyServer.cloudapp.net", "MyUser", "MyPassword")
dbGetQuery(conn, "SELECT 1+1")
dbDisconnect(conn)

Using RMySQL

RMySQL also needs the JDK installed.

It’s slighter easier to use than RJDBC:

library(RMySQL)
conn <- dbConnect(RMySQL::MySQL(), host = "MyServer.cloudapp.net", user="MyUser", password="MyPassword")
dbGetQuery(conn, "SELECT 1+1")
dbDisconnect(conn)

Summary

  • We’ve tried RODBC, RJDBC and RMySQL to connect to a MariaDB database hosted on a Suse VM in Azure.
  • In all cases, we had to install something on the client machine (besides R packages): drivers and/or JDK.
  • When it comes to performance, our experience was similar to Microsoft SQL Server’s case: RODBC was faster, both to connect and to get back the query results.

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)