A case for the assign() function

[This article was first published on R on Thomas' adventuRe, and kindly contributed to R-bloggers.]

In R, assign() is one of those functions that common wisdom says you shouldn’t be using. My aim in this blog post is to convince you that assign() can be very handy.

The pharmaceutical industry, which I work in, is still SAS-dominated, so my primary data sources at work are .sas7bdat files. Thus, whenever I use R, the first thing I have to do is read in those files.

Since the files have standard names, e.g. ADAE (Adverse Events Analysis Dataset), I want to read them into the global environment under exactly these names.

For a single file that’s easy. Just give the variable the same name as the file.

adae <- haven::read_sas("data/adae.sas7bdat")

If you have a directory with multiple files in it, though, this becomes tedious. Let’s simulate this by creating a couple of .csv files with random numbers in them.

dir <- tempdir()
datasets <- c("adsl.csv", "adae.csv", "adrs.csv", "adtte.csv")
for (dataset in datasets) {
  data <- matrix(rnorm(100), nrow = 10)
  write.csv(data, file = file.path(dir, dataset))
}

With the data ready, the first step is to get a list of all files. Note that I purposefully set full.names = FALSE, because the bare file names are what the variable names will be derived from.

(files <- list.files(dir, pattern = "\\.csv$", full.names = FALSE))
## [1] "adae.csv"  "adrs.csv"  "adsl.csv"  "adtte.csv"

Next, to read in all those files, I loop over them and for each file

  • remove the extension from the file name
  • construct the full path to the file with file.path()
  • read in the .csv file and assign it to its name.

for (file in files) {
  file_name <- tools::file_path_sans_ext(file)
  full_path_to_file <- file.path(dir, file)
  assign(file_name, read.csv(full_path_to_file), envir = .GlobalEnv)
}

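Note that envir = .GlobalEnv is redundant here, because the loop runs at the top level and assign() writes into the environment it is called from by default. I like to be explicit, though.
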
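The default matters as soon as the loop is moved into a function. The following is a minimal sketch of that behaviour, not part of the original workflow: the function names and the target environment are made up for illustration, and a fresh environment stands in for .GlobalEnv so the example leaves the workspace untouched.

```r
# Without envir, assign() writes into the environment it is called from,
# which inside a function is the function's own temporary environment.
assign_local <- function() {
  assign("local_value", 1)
  exists("local_value", inherits = FALSE)  # visible here, gone after return
}

# With envir given explicitly, the variable lands where you ask for it.
assign_into <- function(envir) {
  assign("explicit_value", 2, envir = envir)
  invisible(NULL)
}

target <- new.env()  # hypothetical stand-in for .GlobalEnv
created_locally <- assign_local()
assign_into(target)

exists("local_value", envir = target, inherits = FALSE)
exists("explicit_value", envir = target, inherits = FALSE)
```

So if the reading loop ever ends up inside a helper function, the explicit envir argument stops being redundant.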
Let’s make sure that this actually worked as expected.

ls()
##  [1] "adae"              "adrs"              "adsl"             
##  [4] "adtte"             "data"              "dataset"          
##  [7] "datasets"          "dir"               "file"             
## [10] "file_name"         "files"             "full_path_to_file"

Indeed, there are now four new variables in the global environment (adae, adrs, adsl and adtte), named after the files created earlier.
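ls() only shows that the names exist. A slightly stricter check could fetch the variables by name with mget() and look at what they contain. This is a self-contained sketch under a few assumptions: it recreates the example files in a sub-directory called "assign_demo" (a name invented here) and assigns into a separate environment instead of the global one.

```r
# Recreate the example files in a fresh temporary sub-directory
demo_dir <- file.path(tempdir(), "assign_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (name in c("adsl", "adae", "adrs", "adtte")) {
  write.csv(matrix(rnorm(100), nrow = 10),
            file = file.path(demo_dir, paste0(name, ".csv")))
}

# Read every file into its own variable in a dedicated environment
demo_env <- new.env()
demo_files <- list.files(demo_dir, pattern = "\\.csv$")
for (file in demo_files) {
  assign(tools::file_path_sans_ext(file),
         read.csv(file.path(demo_dir, file)),
         envir = demo_env)
}

# mget() looks up several variables by name and returns a named list
loaded <- mget(tools::file_path_sans_ext(demo_files), envir = demo_env)
sapply(loaded, is.data.frame)
sapply(loaded, nrow)
```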

Without using assign() you’d end up putting all datasets in a list.

data <- lapply(files, function(file) {
  read.csv(file.path(dir, file))
})

That may not be so bad, but the resulting list has no names, so each dataset has to be looked up by position unless you add the names yourself.

names(data)
## NULL
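For comparison, here is a sketch of the extra step the list approach needs. The directory name and the tiny example files are invented for this illustration; only base R functions are used.

```r
# Set-up: two small example files in a fresh temporary sub-directory
list_dir <- file.path(tempdir(), "list_demo")
dir.create(list_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:3), file.path(list_dir, "adsl.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:2), file.path(list_dir, "adae.csv"), row.names = FALSE)

csv_files <- list.files(list_dir, pattern = "\\.csv$")

# lapply() does not name its result when given an unnamed character vector,
# so the names have to be added explicitly afterwards.
named_data <- lapply(csv_files, function(file) {
  read.csv(file.path(list_dir, file))
})
names(named_data) <- tools::file_path_sans_ext(csv_files)

names(named_data)
nrow(named_data$adsl)
```

Whether that extra line is worth it is a matter of taste; the assign() loop gives you named variables directly.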

I hope this convinced you that assign() is a useful function.

Have you ever used assign()? I’d love to hear about it in the comments.
