Only Load Data If Not Already Open in R

September 12, 2013
By

(This article was first published on Mollie's Research Blog, and kindly contributed to R-bloggers)

I often find it useful to check, at the top of a script, whether a dataset is already loaded into R. This is especially helpful when I'm working with a large file that I don't want to load repeatedly, when I use the same dataset in several R scripts, or when I re-run the same script while making changes to the code.

To check whether an object with that name is already loaded, we can use the exists function from the base package. We can then wrap our read.csv call in an if statement so that the file is loaded only when no object with that name exists yet.


# Read the file only if no object called largeData exists yet
if (!exists("largeData")) {
  largeData <- read.csv("huge-file.csv",
                        header = TRUE)
}
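If you use this pattern in many scripts, it can be wrapped in a small reusable function. The sketch below is not from the original post: the name loadIfMissing is hypothetical, and it assumes you want the data to land in the global environment. It also passes inherits = FALSE to exists, which restricts the check to that one environment instead of the whole search path (so an object with the same name inside an attached package would not count as "already loaded"). A tiny temporary file stands in for huge-file.csv.

```r
# Hypothetical helper (not from the original post): read a CSV into `envir`
# under `name`, but only if no object with that name already exists there.
loadIfMissing <- function(name, file, ..., envir = globalenv()) {
  # inherits = FALSE checks `envir` only, not enclosing environments or
  # attached packages
  if (!exists(name, envir = envir, inherits = FALSE)) {
    # Extra arguments (header, colClasses, ...) are passed on to read.csv
    assign(name, read.csv(file, ...), envir = envir)
  }
  # Return the (possibly pre-existing) object without printing it
  invisible(get(name, envir = envir, inherits = FALSE))
}

# Demonstration: a small temporary file stands in for "huge-file.csv"
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(10, 20, 30)), tmp, row.names = FALSE)

loadIfMissing("demoData", tmp)  # first call reads the file
unlink(tmp)                     # delete the file...
loadIfMissing("demoData", tmp)  # ...second call still succeeds: nothing is re-read
```

In a real script the call would look like loadIfMissing("largeData", "huge-file.csv", header = TRUE). Because the object already exists, the second call above never touches the (now deleted) file, which is exactly the behavior the if/exists wrapper provides.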

It is also worth using the colClasses argument of read.csv or read.table. Declaring each column's type up front means R does not have to guess the types from the data, which helps a large file load faster and makes sure your data are read in the right format. For example:


# Read the file only if needed, declaring the type of each of the six columns
if (!exists("largeData")) {
  largeData <- read.csv("huge-file.csv",
                        header = TRUE,
                        colClasses = c("factor", "integer", "character", "integer",
                                       "integer", "character"))
}
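To see what colClasses changes, here is a small self-contained sketch. The three-column file, its column names, and its values are made up for illustration; they do not describe the post's huge-file.csv. It reads the same file twice, once letting read.csv guess the types and once with the types declared.

```r
# Illustrative three-column CSV standing in for a much larger file
tmp <- tempfile(fileext = ".csv")
writeLines(c("site,count,label",
             "A,1,x",
             "B,2,y",
             "A,3,z"), tmp)

# Without colClasses, read.csv inspects the data to guess each column's type
guessed <- read.csv(tmp, header = TRUE)

# With colClasses, one class per column (in column order) is fixed up front;
# "site" becomes a factor whatever R's stringsAsFactors default is
typed <- read.csv(tmp, header = TRUE,
                  colClasses = c("factor", "integer", "character"))

sapply(guessed, class)
sapply(typed, class)
unlink(tmp)
```

The guessed classes can differ between R versions (for example, text columns were read as factors by default before R 4.0.0), while the declared classes stay the same, which is one reason to spell them out for data you load often.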


--
This post is one part of my series on dealing with large datasets.
