
I often find it useful to check, at the beginning of a script, whether a dataset is already loaded into R. This is particularly helpful when I’m working with a large file that I don’t want to load repeatedly, and when I might be using the same dataset in multiple R scripts or re-running the same script while making changes to the code.

To check whether an object with a given name already exists, we can use the exists() function from the base package. We can then wrap our read.csv() call in an if statement so that the file is loaded only when no object with that name exists yet.

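if (!exists("largeData")) {
    # "largeData.csv" is a placeholder path; substitute your own file
    largeData <- read.csv("largeData.csv", header = TRUE)
}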
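One caveat that may be worth knowing: by default exists() looks for an object of any type in the current environment and all of its enclosing environments, which at the top level includes every attached package. A dataset named data, for example, would look "already loaded" because exists("data") finds the data() function from the utils package. Below is a minimal sketch of a reusable helper that restricts the check to the global environment with inherits = FALSE. The name loadOnce, its arguments, and the temporary demo file are my own assumptions for illustration, not part of base R.

```r
# loadOnce() is a hypothetical helper, not a base R function.
# It reads `path` with read.csv() and stores the result under `name` in the
# global environment, but only if no object called `name` is already there.
loadOnce <- function(name, path, ...) {
  # inherits = FALSE limits the lookup to the global environment itself,
  # so functions on the search path (such as utils::data) are ignored
  if (!exists(name, envir = globalenv(), inherits = FALSE)) {
    assign(name, read.csv(path, ...), envir = globalenv())
  }
  # Return the (possibly pre-existing) object invisibly
  invisible(get(name, envir = globalenv()))
}

# Demo: a small temporary CSV stands in for a large file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(10, 20, 30)), tmp, row.names = FALSE)

exists("data")                    # TRUE: finds the utils::data() function
exists("data", inherits = FALSE)  # FALSE unless you created `data` yourself

loadOnce("largeData", tmp)        # reads the file
loadOnce("largeData", tmp)        # skipped: largeData already exists
nrow(largeData)                   # 3
```

The second call returns without touching the file, which is the whole point when the real file takes minutes to read. To force a reload after the file changes, run rm(largeData) first.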


You will probably also find it useful to set the colClasses argument of read.csv() or read.table(), which helps the file load faster and ensures that each column is read in the right format. For example:

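if (!exists("largeData")) {
    # "largeData.csv" is a placeholder path; give one class per column, in order
    largeData <- read.csv("largeData.csv", header = TRUE,
        colClasses = c("factor", "integer", "character", "integer",
                       "integer", "character"))
}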
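To see what colClasses actually changes, here is a small sketch that reads the same inline CSV twice: once letting R guess the column types, and once with the six classes from the example above. The column names and values are made up for illustration; the text argument is passed through to read.table(), so no file is needed.

```r
# Made-up six-column CSV, matching the six classes used above
csvText <- "site,count,label,year,month,note
A,10,first,2020,1,ok
B,20,second,2021,2,check
A,30,third,2022,3,ok"

# Default: R inspects the values to guess each column's type
guessed <- read.csv(text = csvText)

# Explicit: one class per column, in column order
typed <- read.csv(text = csvText,
                  colClasses = c("factor", "integer", "character", "integer",
                                 "integer", "character"))

sapply(guessed, class)  # guessed types (depend on your R version's defaults)
sapply(typed, class)    # exactly the classes requested in colClasses
```

With guessing, R has to examine the values before it can settle on each column's type (and, since R 4.0.0, it leaves text columns as character rather than factor), whereas colClasses fixes the types up front. The read.table() documentation also notes that a class of "NULL" skips a column entirely, which can save memory on very wide files.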


--
This post is one part of my series on dealing with large datasets.