ggplot2 graphics in a loop

[This article was first published on Kevin Davenport » R, and kindly contributed to R-bloggers].

A client has a specific audit they perform quarterly across 200 of their manufacturing plants. The audit has 8 distinct sections examining different areas of the plant (shipping, receiving, storage, packaging, etc.). Instead of one cumulative final score, the audit reports a final score for each section. I wanted to examine the distribution of each section's scores before and after removing outliers and extreme values. I usually use the MASS package's truehist() for quick looks at data, but since I'm writing a detailed loop I will use ggplot2 for fine aesthetic control. The historical audit results were imported into a data frame containing the 8 score columns as well as other columns identifying each audit instance. Basically, I don't want to waste time writing out "ggplot(df, aes(x = x)) + geom_histogram()" or "qplot(x, data = df)" for each section's column in the df.
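The client's data isn't available here, so below is a hypothetical stand-in for trying out the functions in this post. Everything in it is invented for illustration: the plant_id column, the 0 to 100 score range, the random scores, and the choice of only four section columns (named after the examples above) instead of all eight. A couple of NA values are added so the na.rm argument has something to do.

```r
# Hypothetical stand-in for the audit data; all names and values are invented.
set.seed(42)  # make the random scores reproducible
n_plants <- 200

df <- data.frame(
  plant_id  = sprintf("plant_%03d", seq_len(n_plants)),  # identifying column (character)
  shipping  = round(runif(n_plants, 0, 100)),
  receiving = round(runif(n_plants, 0, 100)),
  storage   = round(runif(n_plants, 0, 100)),
  packaging = round(runif(n_plants, 0, 100)),
  stringsAsFactors = FALSE  # already the default since R 4.0
)

# Introduce two missing scores to exercise na.rm
df$storage[c(3, 17)] <- NA

str(df)
```

Because plant_id is a character column, pass only the score columns (for example df[-1]) to the histogram functions.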

Below is the function in its entirety:

plotHistFunc <- function(x, na.rm = TRUE, ...) {
  nm <- names(x)
  for (i in seq_along(nm)) {
    plots <- ggplot(x, aes_string(x = nm[i])) + geom_histogram(alpha = .5, fill = "dodgerblue", na.rm = na.rm, ...)
    ggsave(filename = paste("myplot", nm[i], ".png", sep = ""), plot = plots)
  }
}

library(ggplot2) ## load ggplot2 before calling the function
plotHistFunc(df) ## execute function

Line 1: The usual process of defining the function and assigning it a name, in this case "plotHistFunc". The na.rm argument defaults to TRUE so that NA entries are ignored; this only takes effect if it is passed on to geom_histogram(), otherwise ggplot2 drops the missing values itself with a warning. Here "x" is the data frame supplied when the function is called, e.g. plotHistFunc(df). Pass the data frame object itself, not its name in quotes.

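Line 2: names() retrieves the column names of the data frame "x" for use in the next step. Every column gets a histogram, so the data frame passed in should contain only the numeric score columns; geom_histogram() fails on character identifier columns.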
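If the data frame also carries identifying columns, one way to keep only the numeric ones before looping is base R's Filter() with is.numeric(). A minimal sketch; the small data frame "audit" and its columns are hypothetical.

```r
# Hypothetical audit extract: one identifying column plus two score columns
audit <- data.frame(
  plant_id  = c("A", "B", "C"),
  shipping  = c(88, 92, 75),
  receiving = c(70, NA, 81)
)

# names() returns every column name, identifiers included
all_cols <- names(audit)             # c("plant_id", "shipping", "receiving")

# Filter() keeps only the columns for which is.numeric() returns TRUE
score_cols <- names(Filter(is.numeric, audit))  # c("shipping", "receiving")
```

Calling plotHistFunc(audit[score_cols]) would then produce one histogram per score column and skip plant_id.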

Line 3: This is where it gets tricky. Here we define the usual "i in x" part of a for loop, but instead of looping over "x" (our data frame) directly we use seq_along(), a function I only recently came across. seq_along(nm) generates the sequence 1, 2, ..., length(nm), so i indexes each column name in turn, and it correctly produces an empty sequence when nm has no elements.
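The practical advantage of seq_along() over the common 1:length(nm) shows up in the empty case. A quick base-R check, using example column names:

```r
nm <- c("shipping", "receiving", "storage")
seq_along(nm)      # 1 2 3
1:length(nm)       # 1 2 3, identical when nm is non-empty

empty <- character(0)
seq_along(empty)   # integer(0): a for loop over this runs zero times
1:length(empty)    # 1 0: a for loop over this would run twice with bad indices

# Count iterations to confirm the loop body never runs
n_iter <- 0
for (i in seq_along(empty)) n_iter <- n_iter + 1
n_iter             # 0
```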

Line 4: The code to be iterated in the loop sits inside the inner pair of braces {}. Here the ggplot object is assigned to "plots". The mapping uses aes_string(), which is useful when writing functions that create plots because you can use strings to define the aesthetic mappings rather than having to mess around with expressions. (aes_string() has since been deprecated in ggplot2; newer code writes aes(x = .data[[nm[i]]]) instead.)
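A sketch of the same file-saving loop written with the .data pronoun, which current ggplot2 versions recommend over the deprecated aes_string(). This is my own variation, not the original function: the name plotHistFunc2, the out_dir argument (defaulting to a temporary directory), the fixed 5 x 4 inch size, and the small "scores" data frame are all assumptions for illustration. na.rm and any extra arguments are forwarded to geom_histogram().

```r
library(ggplot2)

# Hypothetical variant of plotHistFunc() using .data[[...]] instead of aes_string()
plotHistFunc2 <- function(x, na.rm = TRUE, out_dir = tempdir(), ...) {
  nm <- names(x)
  files <- character(0)
  for (i in seq_along(nm)) {
    # .data[[nm[i]]] maps the column whose name is the string nm[i]
    p <- ggplot(x, aes(x = .data[[nm[i]]])) +
      geom_histogram(alpha = .5, fill = "dodgerblue", na.rm = na.rm, ...)
    f <- file.path(out_dir, paste0("myplot", nm[i], ".png"))
    ggsave(filename = f, plot = p, width = 5, height = 4)
    files <- c(files, f)
  }
  invisible(files)  # return the paths of the saved files
}

scores <- data.frame(shipping = c(88, 92, 75, NA), receiving = c(70, 65, 81, 90))
saved <- plotHistFunc2(scores, bins = 10)  # bins is passed through ... to geom_histogram()
```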

Line 5: ggsave() saves a plot to a file, and together with paste() it generates a unique file name for each plot. For example, filename = paste("myplot", nm[i], ".png", sep = "") produces a file named myplotshipping.png in the current working directory, with "shipping" coming from the column name.
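paste0() is shorthand for paste(..., sep = ""), and because both are vectorised, the whole set of file names can be built in one call. The column names and the "audit_plots" folder below are hypothetical examples.

```r
nm <- c("shipping", "receiving")

# These two calls build the same file names
a <- paste("myplot", nm, ".png", sep = "")
b <- paste0("myplot", nm, ".png")
a  # "myplotshipping.png" "myplotreceiving.png"

# Prefixing a subfolder; the folder must already exist (dir.create() can make it)
paths <- file.path("audit_plots", b)
paths  # "audit_plots/myplotshipping.png" "audit_plots/myplotreceiving.png"
```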

Here is a version of the function that displays the plots in your R session instead of saving them to files:

plotHistFunc <- function(x, na.rm = TRUE, ...) {
  nm <- names(x)
  for (i in seq_along(nm)) {
    # print() is needed: ggplot objects are not auto-printed inside a loop
    print(ggplot(x, aes_string(x = nm[i])) + geom_histogram(alpha = .5, fill = "mediumseagreen", na.rm = na.rm, ...))
  }
}

It is important to note that the ggplot call must be wrapped in print() for the plot to display: inside a for loop (or any function body), R does not auto-print values the way it does at the console.
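A different pattern, sketched here as my own variation rather than the author's code, builds all the plots into a named list with lapply() first, so they can be inspected individually or written to a single multi-page PDF. The "scores" data frame, the bin count, and the PDF file name are assumptions for illustration.

```r
library(ggplot2)

# Hypothetical score columns for illustration
scores <- data.frame(shipping = c(88, 92, 75, 60), packaging = c(70, 65, 81, 90))

# Build one histogram per column and keep them in a named list
plot_list <- lapply(names(scores), function(col) {
  ggplot(scores, aes(x = .data[[col]])) +
    geom_histogram(bins = 5, alpha = .5, fill = "mediumseagreen")
})
names(plot_list) <- names(scores)

# Open a PDF device; each print() inside the loop draws one page
pdf_file <- file.path(tempdir(), "audit_histograms.pdf")
pdf(pdf_file, width = 6, height = 4)
for (p in plot_list) print(p)
invisible(dev.off())
```

At the console, typing plot_list$shipping displays that single histogram, since top-level evaluation auto-prints.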
