R Functions for Reproducible Data Frames

Posted on August 10, 2018 by George Mount in R bloggers

[This article was first published on George J. Mount, and kindly contributed to R-bloggers.]

While there are many great resources for getting help in R, sometimes you just need a second opinion. This is where the many Internet help boards come in handy, most notably Stack Overflow.

Start posting on Stack Overflow and you will soon learn the importance of the minimal reproducible example (MRE). Without one, you may well be refused “service.”

So, what is an MRE? The name is fairly self-descriptive: it is the smallest possible example that contains all the information necessary (in this case, for someone to help you with your code). Here’s a great walkthrough on the topic written specifically for R coding (fittingly posted to Stack Overflow).

In this post we focus on setting up a minimal reproducible data set, in our case a data frame. The above walkthrough suggests using R’s built-in data frames to build an MRE, which is a great idea. In fact, it negates the need for what we are going to do, which is sampling from these built-in data frames.

Regardless, I want to point out a cool alternative for building a minimal reproducible data frame in R. We will do this using four R functions: dput and dget, then dump and source.

Dput and Dget

Let’s take the first five rows of the iris dataset. Using dput, we will write the data frame iris5 to an ASCII text representation. You could then paste this output (the code that starts with structure()) into a help forum, and your responder can in turn assign it to an object (I assigned mine to irismre).

# for example, get the first 5 rows of the iris dataset
iris5 <- head(iris, 5)

# write to an ASCII text representation
dput(iris5)

# paste it back and assign to a new object
irismre <- structure(list(
  Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5),
  Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6),
  Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4),
  Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2),
  Species = structure(c(1L, 1L, 1L, 1L, 1L),
    .Label = c("setosa", "versicolor", "virginica"), class = "factor")),
  .Names = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"),
  row.names = c(NA, 5L), class = "data.frame")


irismre
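If you want to check the round trip before posting, one option is to capture the dput output as text and evaluate it, which mimics pasting it at the prompt. This is only a sketch: the name dput_text is made up for illustration, and all.equal() is used as a tolerant comparison.

```r
# First five rows of the built-in iris data set
iris5 <- head(iris, 5)

# capture.output() collects the lines dput() would print to the console
dput_text <- paste(capture.output(dput(iris5)), collapse = "\n")

# Parsing and evaluating that text stands in for pasting it back by hand
irismre <- eval(parse(text = dput_text))

# Compare the rebuilt data frame with the original
all.equal(iris5, irismre)
```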

If your dataset is big, your dput output will be big too. Try to keep your minimal reproducible dataset small. After all, that is the whole point of an MRE!
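One small way to trim the output, sketched here as a suggestion rather than part of the original workflow, is to drop factor levels that no longer appear in the subset. The name iris5_small is hypothetical.

```r
iris5 <- head(iris, 5)

# The subset still carries all three species levels
levels(iris5$Species)

# droplevels() removes factor levels that are not used in the data
iris5_small <- droplevels(iris5)
levels(iris5_small$Species)

# The dput output now lists only the level that is actually present
dput(iris5_small)
```

Only do this if the unused levels are irrelevant to your question; if the problem involves the factor levels themselves, keep them.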

Rather than printing the ASCII text representation to the console, you could save it to a file with the “file =” argument in dput. Then read it back with dget:

#or you can write to a file

dput(iris5, file = "C:/RFiles/iris5.R")


#and read it back
irismre <- dget("C:/RFiles/iris5.R")
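The hard-coded path above only works if that folder exists on your machine. A variation that should run anywhere, assuming a temporary file is acceptable for your purposes, swaps in tempfile(); the name tmp is made up for this sketch.

```r
iris5 <- head(iris, 5)

# tempfile() returns a path in the session's temporary directory
tmp <- tempfile(fileext = ".R")

# write the text representation to the file, then read it back
dput(iris5, file = tmp)
irismre <- dget(tmp)

# compare the restored data frame with the original
all.equal(iris5, irismre)

# remove the temporary file
unlink(tmp)
```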

Dump and Source

In the above example we re-assigned the data frame to an object name of our own choosing. With dump and source, R saves and reloads objects under their original names. So, in our example, dump writes the object to the file under the name “iris5.” Even after we remove iris5 from our environment with rm(), loading the file back with source and listing the objects in our environment with ls() shows iris5 again.

# or use dump and source to keep the object's original name

dump("iris5", file = "C:/RFiles/data.R")
rm(iris5)

source("C:/RFiles/data.R")
ls()
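Because source re-creates objects under their original names, it can overwrite something in the workspace of whoever runs it. A cautious variation, sketched here with the hypothetical names tmp and restored, loads the file into a separate environment instead:

```r
iris5 <- head(iris, 5)

# dump() takes object names as a character vector
tmp <- tempfile(fileext = ".R")
dump("iris5", file = tmp)
rm(iris5)

# source() accepts an environment for its local argument,
# so the restored object lands there instead of the workspace
restored <- new.env()
source(tmp, local = restored)

# list what was restored and pull the data frame out by name
ls(restored)
iris5_again <- get("iris5", envir = restored)

unlink(tmp)
```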
