November 19, 2011

(This article was first published on Decision Science News » R, and kindly contributed to R-bloggers)

COSTLESS FILE SYNCHRONIZATION TECHNIQUES IN INCREASING ORDER OF COMPLEXITY

It is not uncommon to have two computers at work, four at home, and a server out on the wide open internet. How to keep all these files in sync? Here are some file synchronization tools that we use, listed in increasing order of complexity.

Dropbox dropbox.com
Setup: Easy.
OS: Windows, Linux, or Mac
We use this at Decision Science News, but only for some of our files. New users get 2GB of storage for free (or 2.25GB via a referral link). If used sparingly it can last a long time. We find this especially useful for often-updated files that one doesn’t want out of sync for even a minute. For files that only need to be synced every day or so, we use Unison, covered next.

Unison www.cis.upenn.edu/~bcpierce/unison
Setup: Moderate for USB drive use, hard for network use (requires installing server software)
OS: Windows, Linux, or Mac

We sync about 10GB of files with Unison. Unison works across the network or with a portable USB drive. Like the other solutions listed here, it only needs to sync the differences between files, which is much faster than moving whole files around. We have Unison run as a scheduled task to make sure files get synched at least daily.

The best Unison tip is to set up a “star” configuration. That is, you designate one server (or one USB drive) as the hub and all your other machines as spokes off of it. You sync each spoke with the hub, and never sync one spoke directly to another spoke.
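The star layout described above can be sketched in a few lines of Python. This is only an illustration of the pairing rule (every spoke syncs with the hub, never with another spoke); the machine names are hypothetical and the function is not part of Unison.

```python
def star_sync_plan(hub, spokes):
    """Return the (spoke, hub) pairs to sync in a star layout.

    Every spoke is paired with the hub only; no spoke-to-spoke pairs
    are ever produced.
    """
    return [(spoke, hub) for spoke in spokes if spoke != hub]


# Hypothetical machine names, for illustration only.
plan = star_sync_plan("server", ["work-pc", "home-pc", "laptop"])
for spoke, hub in plan:
    print(f"sync {spoke} <-> {hub}")
```

With n spokes this gives n sync pairs instead of the n(n-1)/2 pairs needed if every machine synced with every other one, and a change reaches any other spoke after at most two syncs.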

On a Windows 7 system, Unison will create a .unison folder in the C:\Users\YourUserName directory. You can put configuration files (with .prf extensions) there to tell Unison what to do. Here’s a sample config file to sync the directory C:\DG on your machine to a folder E:\DG on a USB drive.

==myconfig.prf == (assumes Unison 2.27.57 is installed on the server)
root = C:\DG
root = E:\DG
batch=true
fastcheck=true
log=true
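If you maintain several profiles, it can be handy to generate them rather than type them by hand. The following is a minimal sketch, assuming Python is available on the machine. The helper names `render_profile` and `write_profile` are our own inventions for illustration, and the only Unison preferences used are the ones from the sample profile above.

```python
from pathlib import Path


def render_profile(root_a, root_b, **prefs):
    """Build the text of a Unison profile: two roots, then preferences."""
    lines = [f"root = {root_a}", f"root = {root_b}"]
    for name, value in prefs.items():
        # Python booleans become the lowercase true/false used in the profile.
        lines.append(f"{name}={str(value).lower()}")
    return "\n".join(lines) + "\n"


def write_profile(name, text, unison_dir=None):
    """Write <name>.prf into the .unison folder (default: under the home directory)."""
    target = Path(unison_dir) if unison_dir else Path.home() / ".unison"
    target.mkdir(parents=True, exist_ok=True)
    path = target / f"{name}.prf"
    path.write_text(text)
    return path


# Same settings as the sample profile; nothing is written to disk here.
text = render_profile(r"C:\DG", r"E:\DG", batch=True, fastcheck=True, log=True)
print(text)
```

Calling `write_profile("myconfig", text)` would then place the file where Unison looks for profiles by default.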

We wrote a little batch file to start the sync process:
==sync.bat contents== (assumes Unison is installed under C:)
 "C:\Unison-2.27.57 Text.exe" myconfig

When you are getting started, the GUI version of Unison helps you get the knack of it. For everyday use, the text version (called from our batch file, above) is the way to go.
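The batch file can also be written as a small Python wrapper, which makes it easier to check whether a sync succeeded. This is a sketch only: the executable path is copied from the batch file above and will differ on other installations, and `build_command` and `run_sync` are hypothetical helper names.

```python
import subprocess

# Path taken from the batch file above; adjust it to your installation.
UNISON_EXE = r"C:\Unison-2.27.57 Text.exe"


def build_command(profile, exe=UNISON_EXE):
    """Return the argument list that runs the text version of Unison on a profile."""
    return [exe, profile]


def run_sync(profile, exe=UNISON_EXE):
    """Run Unison on the given profile and return the process exit code."""
    result = subprocess.run(build_command(profile, exe))
    return result.returncode


if __name__ == "__main__":
    # Only shows the command; call run_sync("myconfig") to actually sync.
    print(build_command("myconfig"))
```

Passing the command as a list avoids quoting problems with the space in the executable name.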

Want to sync to a server instead of a USB drive? Here is an example config file we use to sync a local directory (C:\DG) to a directory on a linux server (/home/dsn/DG). We sync all our computers (the spokes) with this same linux server directory (the hub), which keeps all our computers in sync.

==myconfig.prf == (assumes Unison 2.27.57 is installed on the server and that ssh is installed on the Windows machine)
root = C:\DG
root = ssh://[email protected]//home/dsn/DG
batch=true
fastcheck=true
log=true
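Note the double slash before home in the remote root. In a Unison ssh root, the path after the host is taken relative to the remote home directory unless it starts with an extra slash, which makes it absolute. A small sketch of how such a root is assembled follows; the user and host names are placeholders we made up, and `ssh_root` is a hypothetical helper.

```python
def ssh_root(user, host, path):
    """Assemble a Unison ssh root.

    One separator slash follows the host. An absolute path brings its own
    leading slash, which yields the double slash; a relative path does not.
    """
    return f"ssh://{user}@{host}/{path}"


# Placeholder user and host, for illustration only.
print(ssh_root("someuser", "example.org", "/home/dsn/DG"))  # absolute path
print(ssh_root("someuser", "example.org", "DG"))            # relative to home
```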

Subversion subversion.tigris.org
Setup: Hard (Need to know how to install and configure client and server software)
OS: Windows, Linux, Mac
Subversion was built as a version control system for programmers, but some people use it to keep all their files in sync. It is a programmer’s tool and not easy to learn, though if you read the free Subversion book and are handy with computers, you can learn it. You’ll want to have a server running on the network somewhere to make this a viable option.

We use subversion to keep our research projects (R source code, documentation, LaTeX writeups, images, PDFs of articles, small data sets) synched across many machines.

Lsyncd
Setup: Hard (Need to know how to install and compile server software)
OS: Linux only

Also not built for the purpose, Lsyncd can be used in conjunction with Unison to keep files in sync. Lsyncd (or “live synching daemon”) is a program that watches a set of files, waiting for any of them to change. Once a change occurs, it can trigger arbitrary actions, such as synching them. J D Long uses lsyncd to keep his R files (specifically, RStudio output) in sync with his local machine, and describes the setup in two posts. At DSN, we use lsyncd to create a magic folder on our server that pushes R plots generated on the server back to our PC automatically.
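To make the watch-then-act idea concrete, here is a rough Python sketch. It is not how lsyncd itself works: lsyncd relies on the Linux kernel's inotify notifications, whereas this sketch simply polls the folder and compares snapshots. The function names and the polling interval are our own assumptions.

```python
import os
import time


def snapshot(root):
    """Map each file path under root to its (modification time, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # the file vanished between listing and stat
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def changed_paths(before, after):
    """Return the sorted paths added, removed, or modified between two snapshots."""
    return sorted(p for p in before.keys() | after.keys()
                  if before.get(p) != after.get(p))


def watch(root, on_change, interval=2.0, max_polls=None):
    """Poll root every `interval` seconds and call on_change(paths) on changes."""
    previous = snapshot(root)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        current = snapshot(root)
        changes = changed_paths(previous, current)
        if changes:
            on_change(changes)
        previous = current
        polls += 1
```

A call such as `watch("/path/to/plots", print)` would print the changed paths every time something in the folder changes; in practice the callback would start a sync instead.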

Some other ideas have been coming in through the comments. I will list them here for posterity.

• Box.net
• DVCS-Autosync
• Rsync
• Sparkleshare
• Sugarsync
• Ubuntu one
• Wuala
