Organization and R

September 5, 2007

(This article was first published on "R-bloggers" via Tal Galili in Google Reader, and kindly contributed to R-bloggers)

Many R users seem to get themselves in a bit of a mess with R files and workspaces scattered across different directories. The R files themselves also get messy and hard to follow. So here is some advice on keeping organized with R:

  • Try to keep code strictly indented based on the code structure such as loops, if statements, etc. Every left brace { should be followed by an extra level of indentation which continues until the matching right brace }. You should be able to quickly identify what lines are part of a loop, or are conditioned by an if statement, simply by the levels of indentation.
  • Comment copiously. You need to be able to figure out what your code does in a year’s time.
  • Have a single directory for each project. Within that, keep an R workspace, an R file containing the functions you’ve written, and one or more R files containing the code to read in the data, apply the functions to the data, plot some graphs, etc.
  • Don’t have multiple versions of essentially the same code. If you are doing similar things to what you’ve done before, write a function to do it and call it when required.
  • Have a main.R file which does all the analysis for the paper, chapter or report. It may simply consist of source lines such as:

    source("functions.R")
    source("readdata.R")
    source("fitmodel.R")

    Then the whole project can be run by sourcing the main file. If you find an error in your data, or you get updated or revised data, it is then a simple matter of running main.R and all the graphs and analysis will be re-created.

  • Every graph and table to go into your written document should be created via code. Use savepdf() or saveeps() in the monash package for graphs, and xtable() in the xtable package for LaTeX tables. (For more complicated LaTeX tables, latex() in the Hmisc package is also useful.)
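As a rough illustration of the layout described above, here is a minimal sketch that builds a throwaway project in a temporary directory and runs it through main.R. The file contents, the simulated data, and the helper names fit_line() and save_plot() are all hypothetical, invented for this example. The graph is written with base R's pdf() device as a stand-in for savepdf(), so nothing beyond a standard R installation is assumed.

```r
# Sketch only: the file contents, data and helper names are hypothetical.
project <- file.path(tempdir(), "myproject")   # one directory per project
dir.create(project, showWarnings = FALSE)

# functions.R: reusable functions, written once and called where needed
writeLines(c(
  "fit_line <- function(d) lm(y ~ x, data = d)",
  "save_plot <- function(file, plot_fun) {",
  "  pdf(file, width = 6, height = 4)  # base R device, standing in for savepdf()",
  "  on.exit(dev.off())                # always close the device",
  "  plot_fun()",
  "}"
), file.path(project, "functions.R"))

# readdata.R: here the data are simulated; a real project would read a file
writeLines(c(
  "set.seed(1)",
  "dat <- data.frame(x = 1:20)",
  "dat$y <- 2 * dat$x + rnorm(20)"
), file.path(project, "readdata.R"))

# fitmodel.R: apply the functions to the data and create the graph via code
writeLines(c(
  "fit <- fit_line(dat)",
  "save_plot('fit.pdf', function() {",
  "  plot(dat$x, dat$y, xlab = 'x', ylab = 'y')",
  "  abline(fit)",
  "})"
), file.path(project, "fitmodel.R"))

# main.R: nothing but source lines, in the order they must run
writeLines(c(
  'source("functions.R")',
  'source("readdata.R")',
  'source("fitmodel.R")'
), file.path(project, "main.R"))

# Run the whole project; chdir = TRUE makes the relative paths in main.R
# resolve against the project directory while it is being sourced.
source(file.path(project, "main.R"), chdir = TRUE)
```

If the data change, only readdata.R needs editing; sourcing main.R again regenerates fit.pdf along with the fitted model.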
