# How to update and backup a MySQL database under version control and all within Rstudio

July 21, 2014
By

(This article was first published on Bartomeus lab » Rstats, and kindly contributed to R-bloggers)

I am trying to have better workflows to ensure data quality and two important things for me are first, scripting as much as posible the data manipulation process, and second, backing up the database we use under version control (e.g. Git*). I succeeded on that, but it was a 11 TAB problem**, so I though would be good post it here for others, and for my future self.

One goal of the task is to be able to do everything within the same program (Rstudio) for simplicity, so the task implies connecting R to MySQL server via RMySQL, update the database, make a backup copy from R and commit changes.

1) Running mysqldump from R to make an initial backup: RMySQL can’t (that I know) run the most commonly used backup mysqldump. But you can do it by calling the shell (aka Terminal) from R with the function system{base}.

#first we can see which is our path on the shell/terminal
system("pwd") #it should be the working directory.
#Second, we make a mysqldump (has to be in one non broken line,
#sorry beautiful 80 characters-max code lines)
system("/Applications/MAMP/Library/bin/mysqldump -u USER -h HOSP_IP -pPASWORD --port=8889 --skip-extended-insert DATABASE > DATABASE.sql")
#note that my mysqldump is located in applications folder because
#instaled MySQL using MAMP (highly recommended for an easy set up
#of a local MySQL environment in a Mac).
#Yours may be in /usr/local/mysq/bin/mysqldump
#The --skip-extended-insert makes a INSERT for each row,
#which makes version control lighter and more readable later on.

2) Put the backup under version control. Rstudio provide a nice hook to git, so you just need to ‘git init‘ the directory, ‘git add DATABASE.sql‘  and ‘git commit‘ the changes with a nice commit message.

3) Connecting R to MYSQL server to update it: There are good tutorials on this, so I’ll be brief. In this example I manipulated data within R (your.data.frame) and want to append the resulting dataframe to an existing table.

library(RMySQL)
#You can have an r file (added to .gitignore) with the line
#psw <- "mypassword", which you can call later from the code.
conn <- dbConnect(dbDriver("MySQL"), user = "USER",
port = 8889 , host= "HOST_IP")
#this set up the connection.
dbListTables(conn) # show tables
dbWriteTable(conn, "TABLE", your.data.frame, append = TRUE,
overwrite=FALSE, row.name=FALSE)
#adds the new rows at the bottom of the table.
dbDisconnect(conn) #always disconnect at the end.

4) Make another backup:

system("/Applications/MAMP/Library/bin/mysqldump -u USER -h HOSP_IP -pPASWORD --port=8889 --skip-extended-insert DATABASE > DATABASE.sql")

5) And finally you can make another commit reflecting the changes ‘git add DATABASE.sql‘  and ‘git commit’ with a nice commit message. You can compare the two versions now, and let anyone in the project play with it by pushing it to a central repo (e.g. Github).

I am sure there are other ways to do this, but I am pretty happy of how it works!

—-
*Git forces you to not only document changes, but explain why those changes are done in a nice story. So I prefer it to Audit Trail options in MySQL.
**Someone on the internet suggested to rate the difficulty of a problem in  the number of browser tabs you need to open in order to fix it. Sorry I forgot the source.