Connecting R to MySQL/MariaDB

[This article was first published on Modern Data » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Overview

Relational databases, such as MySQL, organize data into tables consisting of rows and columns similarly to an R data.frame. While many beginners will be satisfied with R’s native RData storage format, storing data in a flat-file has its disadvantages. First, they are difficult to share among collaborators. .RData files can contain many matrices and data.frames. It’s very easy to create different versions if two collaborators are performing analyses which require cleaning the data. Second, they are insecure compared to the permissions available to administrators of SQL databases. For example, I may want a research assistant or an analyst on the data science team to have read-only access to certain tables to prevent data corruption, and restrict access entirely to sensitive, identifiable data such as participant names and email addresses. Using MySQL, I can grant SELECT to individuals that require read-only access and each individual can have separate login credentials. Thus, if an individual leaves the academic lab or resigns from the company, I can simply disable their access. Third, you can easily backup and version your database using mysqldump or automatic backups if you choose to use Amazon’s RDS. Did you merge incorrectly or accidentally wipe your data? No problem, just roll back to a previous version. Fourth, data can be efficiently streamed into R using SQL queries. As you may know, R data is stored entirely in-memory. This means to analyze a 24 GB dataset, you’ll need at least 24GB of free RAM in your computer in addition to what’s needed to perform your analysis. Instead, if I know that I only need last month’s customers to perform an analysis, I can stream only that data into R using a query such as:

SELECT * FROM table WHERE date > '2015-07-01';

If I wanted, I could automatically filter out incomplete values before it gets into R:

SELECT * FROM table WHERE date > '2015-07-01' AND test_score IS NOT NULL;

Interested? In this tutorial, you will learn how to store and retrieve data from a MySQL database with R using the RMySQL package.

Installing MySQL

Since MySQL runs on a variety of platforms, I’ll demonstrate how to install it on Mac, Linux, and Windows, as well as how to provision it using Amazon RDS. These instructions are by no means flawless – feel free to email us feedback if you find problems.

Using Homebrew on OS X

Homebrew is a package manager for OS X that enables the installation of a variety of packages from source. For our purposes, we’re going to install MySQL and run it as a daemon each time your computer starts. To install Homebrew, open Terminal.app and run the following code:

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
Terminal.app
Terminal.app

Before doing anything, run brew doctor and make sure you fix any of its complaints before proceeding. After you have a working Homebrew installing, you can now install MySQL:

# Update Homebrew
brew update
# Install MySQL
brew install MySQL
# Tell MySQL to load at startup
ln -sfv /usr/local/opt/mysql/*.plist ~/Library/LaunchAgents
launchctl load ~/Library/LaunchAgents/homebrew.mxcl.mysql.plist

Congrats! Try logging in:

mysql -uroot

You can also try mycli, which offers a more modern CLI for MySQL with features such as syntax completion. If that command fails, check that the server is actually running:

ps -ax | grep mysqld

Installing MySQL on Ubuntu

Ubuntu

Chances are, you may have an existing server in your lab or on an EC2 instance that you want to use for this tutorial. If you don’t, I highly recommend you sign up for Amazon’s Free Tier and try it out. Luckily, Linux distributions such as Ubuntu and Fedora come with package managers that make installing MySQL easier for end-users. In this example, we’ll use the APT package manager to complete the installation. Remember, you’ll need sudo access to complete this installation. If you don’t have administrative access, contact your server administrator and ask them to install MySQL for you.

# Update APT
sudo apt-get update
# Install the server and client
sudo apt-get install mysql-server-5.6 mysql-client-5.6

If you’re also using R on the same system, you can install RMySQL using APT. The benefit is that you avoid troubleshooting any compilation errors. However, you may want to add the official CRAN repo to your APT sources.list file before doing this. In my case, I verified my Ubuntu version by running lsb_release -a and added the following line to my APT sources list at /etc/apt/sources.list: deb http://cran.rstudio.org/bin/linux/ubuntu trusty/. Once you have the CRAN Ubuntu repo installed and have updated APT, you can install many CRAN plugins from APT:

sudo apt-get install r-cran-rmysql

Setting up RDS on AWS

First, sign up for AWS using your Amazon account. Once you’re in, navigate to the RDS section under “Databases”:

rdsNext, click on “Get Started Now” to proceed to the DB type selection screen and select “MySQL” as the type.

DB SelectionAfter you’ve selected a MySQL database, select “No” for Multi-AZ Zone Deployment, depending on if you want to use the Free Tier for the first year of your RDS server:

multiazNext, select the size and indicate the master username and password for your new database.  Keep in mind that databases >= 20GB are still eligible for the RDS Free-Tier:

db credentialsWhen the provisioning is finished, you should see your Instance and the endpoint you’ll use for connecting using RMySQL!

Screen Shot 2015-08-28 at 12.10.37 PM

Installing RMySQL

Next, we’re going to install another package to extend R’s native capabilities and allow it to read and write from MySQL. First, try to install the binary package from CRAN:

install.packages("RMySQL")
# Load the library
library(RMySQL)

If you don’t get any errors, you’re all set! However, depending on your version of R and your platform, you may have to build from source. If that’s the case, don’t fret! As long as you’ve installed the MySQL headers (for example, using Homebrew) you’ll be able to specify that you want to install RMySQL from source, in which it will compile the package:

install.packages("RMySQL", type = "source">
library(RMySQL)
Loading required package: DBI

Transferring Data

Next, we’ll discover how to read to and write data from a table in your MySQL database. This tutorial won’t deal with the best ways to store data, but I highly recommend the reader take a peak at Wickham, H. (2014). Tidy Data. RMySQL contains many functions, but the three we’ll focus on are 1) dbListTables(), dbReadTables(), and dbWriteTable(). These, as you’ve no doubt already guessed, allow you to 1) list all the tables available in a given database, read a MySQL table into R as a data.frame, and write to a MySQL table in a way conceptually similar to how write.csv() operates. First, let’s create the connection to the database and explore how some of the functions operate.  Note that in my case, dbListTables() returns a character vector with each table name because I’m connecting to an existing database.  New databases should not return anything.

library(DBI)
con_sql <- dbConnect(RMySQL::MySQL(), group = "group-name")

# List tables in current database
dbListTables(con_sql)
[1] "year1" "year2" "year3" "year4"

# Write data.frame to MySQL
dbWriteTable(conn = con_sql, name = 'mtcars', value = mtcars)

# Read MySQL table to data.frame
mtcars <- dbReadTable(conn = con_sql, name = 'mtcars')

# Check out your data!
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Since the result of dbReadTable() is itself a data.frame, you can stream SQL directly into your plot.ly graphs and play around as usual. This functionality is fantastic for building persistent dashboards for your analysis.

library(plotly)
plot_ly(data = mtcars, color = as.factor(cyl), y = mpg, type = "box")

Screen Shot 2015-09-02 at 1.32.18 PM

Securing your Credentials in R

You’ll notice that in my example above I didn’t embed my credentials directly in my R script. Scripting usernames and passwords is a terrible idea in practice. For example, what if you upload your analysis to a public Github? Suddenly, the world knows that your password is ilovecats!. We can avoid such embarrassment by storing our credentials in a .my.cnf configuration file that RMySQL will look for and read. Values such as the database name, your username, and corresponding password can then be safely stored in your $HOME directory, away from your analysis. To do this, create a file called my.cnf in your home directory:

[group-name]
database=test
user=root
password=
host=127.0.0.1

Going Further

While this post is mainly concerned with 1) connecting R to MySQL databases and 2) reading and writing data, I encourage readers to check out the Khan Academy’s free course on SQL. Using the dbSendQuery() function, we can perform more complex queries that involve joining and subsetting data before they’re read into R:

query <- dbSendQuery(con_sql, "select * from mtcars where cyl = 4;")
data <- fetch(query, n=10)
dbClearResult(query)

To leave a comment for the author, please follow the link and comment on their blog: Modern Data » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)