ggplot2 graphics in a loop

April 29, 2013
By Kevin Davenport

(This article was first published on Kevin Davenport » R, and kindly contributed to R-bloggers)

A client performs a specific audit quarterly across 200 of their manufacturing plants. The audit has 8 distinct sections examining different areas of the plant (shipping, receiving, storage, packaging, etc.). Instead of one cumulative final score, the audit reports a final score for each section. I wanted to examine the distribution of each section’s scores before and after removing outliers and extreme values. I usually use truehist() from the MASS package for quick looks at data, but since I’m writing a detailed loop I will use ggplot2 for finer aesthetic control. The historical audit results were imported into a data frame containing the 8 score columns as well as other columns identifying each audit instance. Basically, I don’t want to waste time writing out “ggplot(df, aes(x = x)) + geom_histogram()” or “qplot(x, data = df)” for each section’s column in df.
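The post does not include the audit data, so to follow along you need something shaped like it. The sketch below builds a hypothetical stand-in with simulated values: a character plant identifier, a quarter factor, and numeric score columns for the four sections the post names (the real audit has 8). The column names, the 0 to 100 score range, and the few missing values are all assumptions for illustration.

```r
# Hypothetical stand-in for the audit data; every value below is simulated.
set.seed(1)
n_plants <- 200
sections <- c("shipping", "receiving", "storage", "packaging")  # 4 of the 8 sections

df <- data.frame(
  plant_id = sprintf("P%03d", seq_len(n_plants)),  # identifying column (character)
  quarter  = factor(sample(c("Q1", "Q2", "Q3", "Q4"), n_plants, replace = TRUE)),
  stringsAsFactors = FALSE
)

# One numeric score column per section, clamped to the range 0-100
for (s in sections) {
  df[[s]] <- round(pmin(pmax(rnorm(n_plants, mean = 80, sd = 10), 0), 100))
}

# A few missing scores, so that NA handling matters later on
df$storage[c(5, 17, 123)] <- NA

str(df)
```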

Below is the function in its entirety:

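plotHistFunc <- function(x, na.rm = TRUE, ...) {
  nm <- names(Filter(is.numeric, x))  # keep only the numeric (score) columns
  for (i in seq_along(nm)) {
    plots <- ggplot(x, aes_string(x = nm[i])) + geom_histogram(alpha = .5, fill = "dodgerblue", na.rm = na.rm, ...)
    ggsave(filename = paste("myplot", nm[i], ".png", sep = ""), plot = plots)
  }
}

library(ggplot2)  ## load ggplot2 before running the function
plotHistFunc(df)  ## execute function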
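To check that a loop like this really writes one file per score column, here is a minimal sketch that runs the same idea against a small made-up data frame and saves into a temporary directory instead of the working directory. The function name plotHistToDir, its out_dir argument, and the 5 x 4 inch size are assumptions; it also uses aes(x = .data[[col]]) (ggplot2 3.0.0 or later) rather than aes_string(), and returns the written file paths so they can be checked.

```r
library(ggplot2)

# Hypothetical variant of plotHistFunc(): same loop, but it writes into
# out_dir and returns the paths of the files it created.
plotHistToDir <- function(x, out_dir = tempdir(), na.rm = TRUE, ...) {
  nm <- names(Filter(is.numeric, x))          # score columns only
  files <- character(length(nm))
  for (i in seq_along(nm)) {
    p <- ggplot(x, aes(x = .data[[nm[i]]])) +
      geom_histogram(alpha = .5, fill = "dodgerblue", na.rm = na.rm, ...)
    files[i] <- file.path(out_dir, paste("myplot", nm[i], ".png", sep = ""))
    ggsave(filename = files[i], plot = p, width = 5, height = 4)
  }
  invisible(files)
}

# Made-up example data: one identifier column plus two score columns
audit <- data.frame(
  plant_id = c("P001", "P002", "P003", "P004", "P005"),
  shipping = c(70, 82, 91, 88, 64),
  storage  = c(77, NA, 85, 90, 72)
)
files <- plotHistToDir(audit, bins = 5)
basename(files)   # "myplotshipping.png" "myplotstorage.png"
```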

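Line 1: As usual, we define the function and give it a name, in this case “plotHistFunc”. The na.rm argument defaults to TRUE so that NA entries are ignored; it only takes effect if it is passed on to geom_histogram() as na.rm = na.rm (without that, ggplot2 still drops the missing values but warns about removed rows containing non-finite values). Here x is the data frame supplied when the function is called, e.g. plotHistFunc(df); pass the data frame object itself, not its name in quotes.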
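An argument such as na.rm only does something if the function body passes it on. The base-R sketch below loops over columns the same way and forwards both na.rm and ... to the inner call; the function colMeansLoop is hypothetical, and mean() stands in for geom_histogram() so the effect is easy to see in the console.

```r
# Hypothetical helper: per-column means computed with the same loop pattern.
# na.rm and ... are forwarded to mean(), just as plotHistFunc() has to
# forward them to geom_histogram() for them to take effect.
colMeansLoop <- function(x, na.rm = TRUE, ...) {
  nm <- names(x)
  out <- numeric(length(nm))
  for (i in seq_along(nm)) {
    out[i] <- mean(x[[nm[i]]], na.rm = na.rm, ...)
  }
  setNames(out, nm)
}

scores <- data.frame(shipping = c(70, 80, 90), storage = c(60, NA, 100))
colMeansLoop(scores)                 # shipping 80, storage 80
colMeansLoop(scores, na.rm = FALSE)  # storage becomes NA
colMeansLoop(scores, trim = 0.5)     # trim reaches mean() through ...
```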

Line 2: names() retrieves the column names of the data frame x, which the loop uses in the next step.
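Because the audit data frame also holds identifying columns, passing every name from names(x) to the loop would ask geom_histogram() to bin non-numeric columns, which current ggplot2 versions reject with an error. One way to keep only the score columns, shown here on a made-up three-column data frame, is to filter on is.numeric before taking the names:

```r
# Made-up example: one identifier column and two numeric score columns
df <- data.frame(
  plant_id = c("P001", "P002"),
  shipping = c(90, 75),
  storage  = c(NA, 88)
)

names(df)                       # "plant_id" "shipping" "storage"
names(Filter(is.numeric, df))   # "shipping" "storage"
```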

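Line 3: This is where it gets slightly tricky. Here we set up the “i in …” part of the for loop, but instead of looping over x (our data frame) directly, we use seq_along(), a function I only recently learned about. seq_along(nm) generates the regular sequence of integers from 1 to length(nm), so on each pass i is an index into nm and nm[i] is the corresponding column name. It is also safer than 1:length(nm), which returns c(1, 0) instead of an empty sequence when nm is empty.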
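A quick illustration of what seq_along() hands to the loop, plus the edge case that makes it preferable to 1:length(nm). The column names here are just examples.

```r
nm <- c("shipping", "receiving", "storage")

seq_along(nm)                 # 1 2 3
for (i in seq_along(nm)) {
  cat(i, nm[i], "\n")         # the index i picks out the i-th column name
}

# Why seq_along() rather than 1:length(): with zero columns the loop
# should not run at all.
empty <- character(0)
seq_along(empty)              # integer(0): the loop body is skipped
1:length(empty)               # 1 0: the loop body would run twice
```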

Line 4: The code to be repeated on each pass of the loop sits inside the inner set of braces {}. Here the ggplot object is assigned to “plots”. It uses aes_string(), which is useful when writing functions that create plots because you can define the aesthetic mappings with strings (such as the column name nm[i]) rather than having to mess around with expressions. Newer versions of ggplot2 have deprecated aes_string() in favour of aes(x = .data[[nm[i]]]), which does the same job.
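For comparison, here is a string-held column name mapped the way newer ggplot2 versions recommend, using the .data pronoun inside aes() instead of aes_string(). This is a sketch on made-up numbers, assuming ggplot2 3.0.0 or later; layer_data() is used only to confirm that the histogram was built from the column named in the string.

```r
library(ggplot2)

scores <- data.frame(shipping = c(70, 82, 91, 88, 64))
col <- "shipping"   # column name held in a string, like nm[i] in the loop

# .data[[col]] looks the column up by name, as aes_string(x = col) did
p <- ggplot(scores, aes(x = .data[[col]])) +
  geom_histogram(bins = 5, alpha = .5, fill = "dodgerblue")

# Every one of the 5 observations lands in some bin
sum(layer_data(p)$count)   # 5
```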

Line 5: ggsave() saves a plot to a file, and combined with paste() it generates a unique file name for each plot. For example, filename = paste("myplot", nm[i], ".png", sep = "") generates a file named myplotshipping.png, with “shipping” coming from the name of the column.
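paste() is vectorised, so the naming scheme can be checked for all columns at once before any plotting happens. The sketch below also shows paste0(), which removes the need for sep = "", and file.path() for writing into a separate folder. The folder name audit_plots is a made-up example, and the folder is created first because ggsave() may not create missing directories on its own.

```r
nm <- c("shipping", "receiving")

# The post's pattern: paste() with sep = "" glues the pieces together
paste("myplot", nm, ".png", sep = "")    # "myplotshipping.png" "myplotreceiving.png"

# paste0() is shorthand for paste(..., sep = "")
paste0("myplot", nm, ".png")             # same result

# file.path() places the files in a subfolder (hypothetical folder name)
out_dir <- file.path(tempdir(), "audit_plots")
dir.create(out_dir, showWarnings = FALSE)
file.path(out_dir, paste0("myplot", nm, ".png"))
```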

Here is a version of the function for when you only want to view the plots in your R session rather than save them to files:

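plotHistFunc <- function(x, na.rm = TRUE, ...) {
  nm <- names(Filter(is.numeric, x))  # keep only the numeric (score) columns
  for (i in seq_along(nm)) {
    # print() is required: inside a loop, ggplot objects are not drawn automatically
    print(ggplot(x, aes_string(x = nm[i])) + geom_histogram(alpha = .5, fill = "mediumseagreen", na.rm = na.rm, ...))
  }
}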

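It is important to note that the ggplot call needs to be wrapped in print() for the plot to display: inside a for loop (or any function body), R does not auto-print values the way it does at the top level of the console.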
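An alternative to printing inside the loop, sketched below with a hypothetical plotHistList() and made-up data, is to build the plots with lapply(), return them as a named list, and decide afterwards whether to print or save them. It assumes ggplot2 3.0.0 or later for the .data pronoun.

```r
library(ggplot2)

# Hypothetical alternative: return the plots instead of printing them
plotHistList <- function(x, na.rm = TRUE, ...) {
  nm <- names(Filter(is.numeric, x))   # score columns only
  plots <- lapply(nm, function(col) {
    ggplot(x, aes(x = .data[[col]])) +
      geom_histogram(alpha = .5, fill = "mediumseagreen", na.rm = na.rm, ...)
  })
  setNames(plots, nm)
}

audit <- data.frame(
  plant_id = c("P001", "P002", "P003", "P004"),
  shipping = c(70, 82, 91, 88),
  storage  = c(77, 80, 85, 90)
)
plots <- plotHistList(audit, bins = 3)
names(plots)              # "shipping" "storage"

# Inside a loop, print() is still required to draw each plot
for (p in plots) print(p)
```

Returning the list keeps plotting separate from output, so the same objects can later be passed to ggsave() or arranged together.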
