Scheduling R Tasks with Crontabs to Conserve Memory

September 3, 2013
By

(This article was first published on NERD PROJECT » R project posts, and kindly contributed to R-bloggers)

One of R’s biggest pitfalls is that eats up memory without letting it go.  This can be a huge problem if you are running really big jobs, have a lot of tasks  to run, or there are multiple users on your local computer or r server.  When I run huge jobs on my mac, I can pretty much forget doing anything else like watching a movie or ram intensive gaming.  For my work, Kwelia, I run a few servers with a couple dedicated solely to R jobs with multiple users, and I really don’t want to up the size of the server just for the few times that memory is exhausted by multiple large jobs or all users on at the same time.  To solve this problem, I borrowed a tool, crontab, from the linux (we use an ubuntu server but works on my mac as well) folks to schedule my Rscripts to run at off hours (between 2am-8am), and the result is that I can almost cut the size of the server in half.

Installing Crontabs is easy (I used this tutorial and this video) in a linux environment but should be similar for mac and windows. From the command line enter the following to install:

sudo apt-get install gnome-schedule

Then to create a new task for any user on the system enter if you are the root user or admin:

sudo crontab -e

or as a specific user:

crontab -u yourusername -e

You must then choose your preferred text editor. I chose nano, but the vim works just as well. This will create a file that looks like this:
Screen Shot 2013-09-03 at 5.01.19 PM

The cron job is laid out in this format:minute (0-59), hour (0-23, 0 = midnight), day (1-31), month (1-12), weekday (0-6, 0 = Sunday), command. To run an rscript in the command just put the “Rscript” and then the file path name. An example:

0 0 * * * Rscript Dropbox/rstudio/dbcode/loop/loop.R

This runs the loop.R file at midnight (zero minute of the zero hour) every day of every week of every month because the stars mean all.  I have run endless repeat loops before in previous posts, but R consumes the memory and never free it.  However, running  cron jobs is like opening and closing R every time so the memory is freed (probably not totally) after the job is done.

As an example, I ran the same job in a repeat every twelve hours on the left side of the black vertical line, and on the right is the same job being called at 8pm and 8am.  Here’s the memory usage as seen through munin:

Screen Shot 2013-09-03 at 5.10.41 PM Screen Shot 2013-09-03 at 5.11.09 PM

I don’t have to worry nearly as much about my server overloading now, and I could actually downsize the server.

QED


To leave a comment for the author, please follow the link and comment on his blog: NERD PROJECT » R project posts.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.