Operating on datasets inside a function

October 9, 2011

(This article was first published on Quantum Forest » rblogs, and kindly contributed to R-bloggers)

There are times when we need to write a function that modifies a generic data frame passed as an argument. Let’s say, for example, that we want to write a function that converts to a factor every variable whose name starts with a capital letter. There are a few issues involved in this problem, including:

  • Obtaining a text version of the name of the dataset (using the substitute() function).
  • Looping over the variable names and checking if they start with a capital letter (comparing with the LETTERS vector of constants).
  • Generating the plain text version of the factor conversion, gluing together the dataset and variable names (using paste()).
  • Parsing the plain text version of the code to R code (using parse()) and evaluating it (using eval()). This evaluation has to be done in the parent environment or we will lose any transformation when we leave the function, which is the reason for the envir = parent.frame() argument.
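Before putting everything together, it may help to see what each step produces on its own. The following is only a minimal sketch: show_steps() and the toy data frame are hypothetical names made up for this illustration, and deparse() is used to turn the substituted symbol into a character string.

```r
# Hypothetical helper, for illustration only: builds, parses and evaluates
# the conversion code for a single variable, then returns the generated text.
show_steps = function(dataset, var.name) {
  # Text version of the name of the object passed as 'dataset'
  data.name = deparse(substitute(dataset))

  # Plain text version of the assignment, e.g. "toy$A = factor(toy$A)"
  left = paste(data.name, '$', var.name, sep = '')
  code = paste(left, ' = factor(', left, ')', sep = '')

  # Parse the text and evaluate it where the function was called from,
  # so that the caller's data frame is the one that gets modified
  eval(parse(text = code), envir = parent.frame())

  code
}

toy = data.frame(A = c(1, 2, 1))
generated = show_steps(toy, 'A')
print(generated)          # the text that was parsed and evaluated
print(is.factor(toy$A))   # TRUE: the change survived leaving the function
```

Dropping the envir argument would evaluate the assignment inside the function, where no object called toy exists to be modified, and the caller's data frame would stay unchanged.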
CapitalFactors = function(dataset) {
  # Gets text version of the name of the dataset
  data.name = deparse(substitute(dataset))

  # Loops over variable names of dataset
  # and picks the ones that start with uppercase
  for(var.name in names(dataset)){
    if(substr(var.name, 1, 1) %in% LETTERS) {
      # Builds text such as "example$Fert = factor(example$Fert)"
      left = paste(data.name, '$', var.name, sep = '')
      right = paste('factor(', left, ')', sep = '')
      code = paste(left, '=', right)
      # Evaluates the parsed text, using the parent environment
      # so as to actually update the original data set
      eval(parse(text = code), envir = parent.frame())
    }
  }
}

# Create example dataset and display structure
example = data.frame(Fert = rep(1:2, each = 4),
                     yield = c(12, 9, 11, 13, 14, 13, 15, 14))
str(example)

'data.frame':	8 obs. of  2 variables:
 $ Fert : int  1 1 1 1 2 2 2 2
 $ yield: num  12 9 11 13 14 13 15 14

# Use function on dataset and display structure
CapitalFactors(example)
str(example)

'data.frame':	8 obs. of  2 variables:
 $ Fert : Factor w/ 2 levels "1","2": 1 1 1 1 2 2 2 2
 $ yield: num  12 9 11 13 14 13 15 14

And that’s all. Now the Fert integer variable has been converted to a factor. This example function could be useful for someone out there.
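
As a point of comparison, here is a sketch of a different design that is not the approach described in this post: the function works on its own copy of the data frame and returns it, so the caller reassigns the result instead of relying on eval() in the parent environment. The names capital_factors and example2 are hypothetical, chosen only for this illustration.

```r
# Hypothetical alternative, for comparison only: converts the matching
# columns on a copy of the data frame and returns the modified copy.
capital_factors = function(dataset) {
  # TRUE for every variable whose name starts with an uppercase letter
  starts.upper = substr(names(dataset), 1, 1) %in% LETTERS

  # Convert only when at least one variable matches
  if(any(starts.upper)) {
    dataset[starts.upper] = lapply(dataset[starts.upper], factor)
  }

  dataset
}

example2 = data.frame(Fert = rep(1:2, each = 4),
                      yield = c(12, 9, 11, 13, 14, 13, 15, 14))
example2 = capital_factors(example2)
str(example2)
```

The trade-off is that the caller has to write the reassignment explicitly, whereas CapitalFactors() updates the original object in place.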
