A workflow for R

October 22, 2010

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Writing an R script is one thing. Organizing your process (where to put the data, how to refer to files in scripts, how to run the scripts, and how to produce, collect, and report the results) is quite another. Every R user has their own workflow for doing data analysis with R, but the best workflows achieve the following goals:

  • Transparency: A good workflow organizes the elements of the project logically and clearly, to make it easy for an observer (including yourself) to understand how the pieces come together.
  • Maintainability: A good workflow makes it easy to modify and adapt the project. Standardized script names and good commenting practices (in the code, as well as things like README files) are key here.
  • Modularity: A good workflow encapsulates discrete tasks into separate components (e.g. scripts), so that it's always clear where modifications need to be made (and only made in one place), and components are re-usable for other projects.
  • Portability: A good workflow makes it easy to move the project to another system, or hand it over to another person to work on, in such a way that it can still easily be run elsewhere. (Using relative rather than absolute pathnames, and remote access to shared data, are two examples.)
  • Reproducibility: A good workflow makes it easy for you, or others, to reproduce your results.
  • Efficiency: Here I'm referring to the efficiency of you, the programmer, not computational efficiency. A good workflow saves you time, by making it easier to work on the project, and by automating as much of the process as possible.
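
As a rough illustration of the modularity, portability, and reproducibility goals above, here is a minimal sketch in R. It assumes a hypothetical project layout (a data/ folder, an R/ folder of numbered scripts, and an output/ folder); the folder names and the helpers project_path() and run_all() are made up for this example and are not any kind of standard.

```r
# Hypothetical project layout (illustrative names only):
#   data/    raw input files
#   R/       one script per task, numbered in run order (01_load.R, ...)
#   output/  generated results

# Build a path relative to the project root, instead of hard-coding an
# absolute location that only exists on one machine.
project_path <- function(..., root = ".") {
  file.path(root, ...)
}

# Run every script in R/ in name order, so the whole analysis can be
# reproduced with a single call.
run_all <- function(root = ".") {
  scripts <- sort(list.files(project_path("R", root = root),
                             pattern = "\\.R$", full.names = TRUE))
  env <- new.env()   # shared workspace for the scripts
  env$root <- root   # scripts can refer to `root` when building paths
  for (s in scripts) {
    message("Running ", basename(s))
    source(s, local = env)
  }
  invisible(basename(scripts))
}
```

With this in place, a script such as R/01_load.R could read its input with read.csv(project_path("data", "raw.csv", root = root)), and calling run_all() from the project folder would rerun each step in order. Because nothing depends on an absolute path, the project folder can be copied to another machine and run the same way.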

Other than the package system (which is great, but can be overkill for many projects), R doesn't have any formal standards for designing a workflow. But here are a couple of suggestions from the R community:

If you have other suggestions for organizing an R workflow, let us know in the comments.
