rsync as R package

[This article was first published on INWT-Blog-RBloggers, and kindly contributed to R-bloggers].

In this article we present our R package rsync, which serves as an interface between R and the popular Linux command line tool rsync.

rsync itself is an open source tool for efficiently synchronizing files. Published by Paul Mackerras and Andrew Tridgell under the GNU General Public License, it lets users of Unix systems synchronize files between two locations, local or remote. Beyond simply copying files, a big advantage of the rsync algorithm is that, where possible, it transfers only the differences between files. This speeds up transfers and avoids sending redundant data.
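To make the idea of transferring only differences more concrete, here is a deliberately simplified sketch in base R. It compares two byte sequences block by block at fixed offsets and reports which blocks would have to be sent. The helper changedBlocks() and its blockSize parameter are hypothetical names chosen for illustration. The real rsync algorithm is more sophisticated: it uses rolling checksums so that it can also recognize data that has shifted position.

```r
# Simplified illustration, NOT the real rsync algorithm:
# compare two raw vectors block by block at fixed offsets and
# return the indices of the blocks in `new` that differ from `old`.
# changedBlocks() is a hypothetical helper, not part of any package.
changedBlocks <- function(old, new, blockSize = 4L) {
  nBlocks <- ceiling(length(new) / blockSize)
  changed <- logical(nBlocks)
  for (i in seq_len(nBlocks)) {
    idx <- seq.int((i - 1L) * blockSize + 1L, min(i * blockSize, length(new)))
    # A block counts as unchanged only if `old` holds identical bytes there
    changed[i] <- max(idx) > length(old) || !identical(old[idx], new[idx])
  }
  which(changed)
}

old <- charToRaw("aaaabbbbccccdddd")
new <- charToRaw("aaaabbXbccccdddd")

# Only the second 4-byte block differs, so only that block would be sent
changedBlocks(old, new)
```

With larger files and few modifications, the share of blocks that has to be transferred becomes small, which is where the speed advantage comes from.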

Rsync enjoys great popularity, e.g. for creating backups and for transferring or mirroring data. The tool supports the synchronization of links, permissions, groups and devices.

Figure 1: Possible rsync scenarios covered by the rsync R package. Top: local to local. Bottom: local to remote.

Figure 1 visualizes the usage scenarios of rsync. We distinguish two cases. First, the case between two local directories (top half of the graph) and second, the case between a local directory and a remote rsync daemon (bottom half of the graph).

Rsync is only available for Unix systems and is controlled from the command line. Unfortunately, we have to disappoint Windows users here: they can only use it via a small detour, e.g. with Cygwin.

An example command to synchronize the x.csv file between the local directory and an rsync daemon might look like this:

<code>RSYNC_PASSWORD="1234" rsync -ltx /home/UserXY/Documents/x.csv rsync://[email protected]/someFolder</code>

Since we implement many processes in R, we thought that an R package could take some of this work off our hands. The R package handles the communication with the command line program rsync. The same command in R now looks like this:

<code>rsync::sendFile(con, fileName = 'x.csv')</code>
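How an R function can hand work over to the command line program can be sketched with base R alone. The following is a minimal illustration under stated assumptions, not the package's actual implementation: buildRsyncArgs() is a hypothetical helper, and the example copies a temporary file between two local folders instead of talking to a daemon.

```r
# Hypothetical helper: assemble the argument vector for one rsync call.
# The real package may construct its command differently.
buildRsyncArgs <- function(src, dest, flags = "-ltx") {
  c(flags, shQuote(src), shQuote(dest))
}

# Create a small source file and a destination folder in a temp directory
srcFile <- file.path(tempdir(), "x.csv")
write.csv(data.frame(a = 1:3), srcFile, row.names = FALSE)
destDir <- file.path(tempdir(), "destination")
dir.create(destDir, showWarnings = FALSE)

args <- buildRsyncArgs(srcFile, paste0(destDir, "/"))
print(args)

# Only call the command line tool if it is actually installed
if (nzchar(Sys.which("rsync"))) {
  status <- system2("rsync", args)  # an exit status of 0 means success
}
```

Quoting the paths with shQuote() keeps file names containing spaces intact when system2() passes them to the shell.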

The R interface to rsync has several advantages:

  • Intuitive operation for users with R experience
  • Interaction with rsync directly from within R
  • Typical actions can be performed without command line skills

Installation

For our R package to work, the underlying rsync tool must be installed first. On most systems it is already part of the installation.
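If you are unsure whether rsync is present, one way to check from within R is to look it up on the search path. This is a small sketch using only base R; the variable name rsyncPath is our own choice.

```r
# Look up the rsync executable on the PATH; an empty string means "not found"
rsyncPath <- Sys.which("rsync")

if (nzchar(rsyncPath)) {
  message("rsync found at: ", rsyncPath)
} else {
  message("rsync not found on the PATH; please install it first.")
}
```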

Then install our rsync package by running the following line in R:

<code># install.packages("devtools")
devtools::install_github("INWTlab/rsync")</code>

Functionality

At the beginning of an R session, a connection between two endpoints has to be established. To have a reproducible example, we demonstrate the features with two local folders. You can create an rsync configuration using:

<code>library("rsync")
dir.create("destination")
dir.create("source")
dest <- rsync(dest = "destination", src = "source")
dest</code>
<code class="output">## Rsync server:
##   dest: /home/userxy/projects/rsync/destination
##   src: /home/userxy/projects/rsync/source
##   password: NULL
## Directory in destination:
## [1] name         lastModified size
## <0 rows> (or 0-length row.names)</code>

In the case of an rsync daemon you can also supply a password. The mental model is that there is a destination folder with which we want to interact. All methods provided by this package operate on the destination and, in most cases, leave the source unchanged. One exception is sendObject, which also creates a file in the source.

<code>x <- 1
y <- 2
sendObject(dest, x)
sendObject(dest, y)</code>
<code class="output">## Rsync server:
##   dest: /home/userxy/projects/rsync/destination
##   src: /home/userxy/projects/rsync/source
##   password: NULL
## Directory in destination:
##      name        lastModified size
## 1 x.Rdata 2019-02-28 12:30:51   61
## 2 y.Rdata 2019-02-28 12:30:51   60</code>

We can see that we have added two new files. These two files now exist in the source directory and the destination. The following examples illustrate the core features of the package:

<code>removeAllFiles(dest) # will not change source</code>
<code class="output">## Rsync server:
##   dest: /home/userxy/projects/rsync/destination
##   src: /home/userxy/projects/rsync/source
##   password: NULL
## Directory in destination:
## [1] name         lastModified size
## <0 rows> (or 0-length row.names)</code>
<code>sendFile(dest, "x.Rdata") # so we can still send the files</code>
<code class="output">## Rsync server:
##   dest: /home/userxy/projects/rsync/destination
##   src: /home/userxy/projects/rsync/source
##   password: NULL
## Directory in destination:
##      name        lastModified size
## 1 x.Rdata 2019-02-28 12:30:51   61</code>

loadData() also retrieves files from con$from, except that the information is loaded directly into the R workspace. dataName is the equivalent of entryName of the previous case and allows loading files of the following formats: .Rdata, .csv, .json.

For more information, please check out the README and documentation of the rsync package.

Feedback

You are welcome to help improve this R package: just create an issue on GitHub.
