Using Azure as an R datasource Part 3: Pulling data from MySQL/MariaDB to Linux

[This article was first published on Revolutions, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

by Gregory Vandenbrouck
Software Engineer, Microsoft

This post is the third in a series that covers pulling data from Microsoft SQL Server or MySQL/MariaDB on Azure to an R client on Windows or Linux.

In the previous two, we covered pulling data from SQL Server and MySQL/MariaDB to an R client on Windows.

This time we’ll be pulling data from MySQL/MariaDB to an R client on Linux.

Setting up the Linux client machine

To make things a little more interesting, we’ll have both the machine running MySQL/MariaDB and the client machine in Azure (but different ones).

Creating the client VM

  1. Log on the Azure Management site with your Azure subscription (see first post),
  2. Create a VM: select “New”, “Compute”, “Virtual Machine”, “Quick Create” and then fill-in the information. We picked one of the Ubuntu images.
  3. Make sure to create the client VM in the same region/affinity and same subscription as the machine hosting the MySQL/MariaDB server, otherwise accessing the database is going to be slow and/or problematic.

We ended up creating machine MyAzureUbuntu.cloudapp.net with account azureuser.

Accessing the client VM

Verify that ssh (TCP port 22) is one of the endpoints. To access the client VM from your local machine, you can type ssh [email protected] from a command prompt if you have an ssh client installed.

If your local machine runs Windows, you can use PuTTY or MobaXterm instead.

Configuring the client VM

Once connected to the client VM via ssh:

  • To install R and the tools/libraries to connect to MySQL/MariaDB, type: sudo apt-get install r-cran-rodbc unixodbc unixodbc-bin odbcinst libmyodbc mariadb-client r-cran-rmysql

Note: if not using Unbutu, you may have to use an alternative to apt-get, such as yum, zypper, etc.

  • To add GUI support (optional), type: sudo apt-get install ubuntu-desktop.
    • If your ssh client supports X11-Forwarding, you can start GUI applications from the client (for example: gedit) and have them displayed on your local machine. For example inMobaXterm’s case, here are the settings to enable: GV_f1

 

 

 

 

 

 

    • You can instead access the entire desktop environment by using vnc or xrdp (out of scope for this blog post). Just make sure to add endpoints for the corresponding ports (in the case of xrdp: TCP port 3389).

Creating the test database

We’ll be using the same MariaDB server as in the previous post, but this time we’ll create a sample database and table to make it a little more interesting.

Connecting to the server

We’re connecting to the server from the client Ubuntu machine. In addition to creating the database and table, this is useful for troubleshooting, for example to solve firewall, port and credential issues.

From a command prompt on the client VM, connect to the server:

azureuser@MyAzureUbuntu:~$ mysql -hMyServer.cloudapp.net -uMyUser -pMyPassword

(replace MyServer, MyUser and MyPassword with your values)

Creating the sample table

Once successfully connected to the server, run the following script in the MySQL client:

create database MyDatabase;
use MyDatabase;
create table MyTable (x float not null, y float not null);
drop procedure if exists PopulateMyTable;
delimiter $$
create procedure PopulateMyTable()
begin
declare i int default 0;
while i < 500 do
    insert into MyTable(x, y) values(rand(), rand());
    set i = i+1;
end while;
end $$
delimiter ;
call PopulateMyTable();
select count(*) from MyTable;

The last select statement should return 500, confirming we have created a table called MyTable consisting of 500 rows of 2 columns (x and y) with random values.

Connecting to the database from R on Linux

Using RODBC’s odbcDriverConnect function

MySQL’s ODBC drivers need to be registered in order to connect using RODBC. The following shows how to do it and how to see the contents of the configuration file.

azureuser@MyAzureUbuntu:~$ cd /etc
azureuser@MyAzureUbuntu:/etc$ cat odbcinst.ini
azureuser@MyAzureUbuntu:/etc$ sudo odbcinst -i -d -f /usr/share/libmyodbc/odbcinst.ini
odbcinst: Driver installed. Usage count increased to 1.
    Target directory is /etc
azureuser@MyAzureUbuntu:/etc$ cat odbcinst.ini
[MySQL]
Description=MySQL driver
Driver=libmyodbc.so
Setup=libodbcmyS.so
CPTimeout=
CPReuse=
UsageCount=1

Once installed, our table can be queried from R with an ODBC connection string using RODBC:

library(RODBC)

myServer <- "MyServer.cloudapp.net"
myUser <- "MyUser"
myPassword <- "MyPassword"
myDatabase <- "MyDatabase"
myDriver <- "MySQL" # same as the name in the [] in the odbcinst.ini file

connectionString <- paste0(
    "Driver=", myDriver,
    ";Server=", myServer,
    ";Database=", myDatabase,
    ";Uid=", myUser,
    ";Pwd=", myPassword)
conn <- odbcDriverConnect(connectionString)
ds <- sqlQuery(conn, "SELECT * FROM MyTable")
close(conn)
plot(ds, col = "red", pch = 19, main = "Querying MySQL with RODBC odbcDriverConnect")

GV_f2

Using RODBC’s odbcConnect function

odbc.ini is the file used to store DSNs. If you have isql installed, you can use it to test the connection. Below we show both the contents of the file and a test using isql:

azureuser@MyAzureUbuntu:/etc$ cat odbc.ini
[Azure-MySQL]
Description = Testing MySQL Azure
Trace       = Off
TraceFile   = stderr
Driver      = MySQL
SERVER      = MyServer.cloudapp.net
USER        = MyUser
PASSWORD    = MyPassword
DATABASE    = MyDatabase

azureuser@MyAzureUbuntu:/etc$ echo select 1+1 | isql Azure-MySQL -b
+---------------------+
| 1+1                 |
+---------------------+
| 2                   |
+---------------------+
SQLRowCount returns 1
1 rows fetched

Setting up a DSN greatly simplifies the R code:

library(RODBC)

conn <- odbcConnect("Azure-MySQL") 
ds <- sqlQuery(conn, "SELECT * FROM MyTable")
close(conn)
hist(ds$x + ds$y, col = "blue", main = "Querying MySQL with RODBC odbcConnect")

GV_f3

Using RJDBC

To use RJDBC to connect to MySQL or MariaDB, you need to:

  • From a command prompt on the client VM:
    • Install Java: sudo apt-get install openjdk-8-jdk libmysql-java sqlline
    • (Optional) Test the connection outside of R: echo select count(*) from MyTable | sqlline -u "jdbc:mysql://MyServer.cloudapp.net/MyDatabase" -n MyUser -p MyPassword
    • Register Java in R: sudo R CMD javareconf
  • Install the R package from an R terminal on the client VM (no Ubuntu packages via apt-get are provided): install.packages("RJDBC")

In R:

In addition to the Url connector prefix, we also need to specify in the call to JDBC:

  • classPath: the location of the jar file. Running locate *mysql*.jar from a command prompt may help here. If you don’t get any results, run sudo updatedb first.
  • driverClass: the class name. If the one we use below doesn’t work, check the corresponding JDBC driver documentation.

In our particular setup:

library(RJDBC)
## Loading required package: DBI
## Loading required package: rJava
drv <- JDBC(
  driverClass = "com.mysql.jdbc.Driver", # check driver's documentation name
  classPath = "/usr/share/java/mysql-connector-java.jar", # use locate and updatedb to find location
  identifier.quote="`")
conn <- dbConnect(drv, 
  "jdbc:mysql://MyServer.cloudapp.net/MyDatabase", 
  "MyUser",
  "MyPassword")
ds <- dbGetQuery(conn, "select * from MyTable")
dbDisconnect(conn)
## [1] TRUE
plot(ds, col=rgb(200,0,0,50, maxColorValue=255), main = "Querying MySQL with RJDBC", pch = 19, cex = 3)

GV_f4

Using RMySQL

Much easier to use than RJDBC (no url connector, driverClass and classpath to figure out):

library(RMySQL)
conn <- dbConnect(
  RMySQL::MySQL(),
  host = "MyServer.cloudapp.net",
  user = "MyUser",
  password = "MyPassword",
  dbname = "MyDatabase")
ds <- dbGetQuery(conn, "select * from MyTable")
dbDisconnect(conn)
plot(ds, col = "green", main = "Querying MySQL with RODBC RMySQL")

Summary

  • We’ve tried RODBC, RJDBC and RMySQL to connect to a MariaDB database hosted on a Suse VM in Azure from a Linux Ubuntu client, also hosted in a VM on Azure.
  • With respect to both ease of use and setup RMySQL & RODBC compare favorably to RJDBC: they require less code and don’t need to specify low level parameters.

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)