R Functions for Reproducible Data Frames

August 10, 2018

(This article was first published on George J. Mount, and kindly contributed to R-bloggers)

 

While there are many great resources for getting help in R, sometimes you just need a second opinion. This is where the many Internet help boards come in handy, most notably Stack Overflow.

Start posting on Stack Overflow and you will soon learn the importance of the minimal reproducible example (MRE). Without one, you will likely even be refused “service.”

So, what is an MRE? It is fairly self-descriptive: the smallest possible example that contains all the information necessary (in this case, for someone to help you with your code). Here’s a great walkthrough on the topic written specifically for R coding (fittingly posted to Stack Overflow).

In this example we focus on setting up a minimal reproducible data set, in our case a data frame. The above post suggests using R’s built-in data frames to build an MRE, which is a great idea; in fact, it negates the need for what we are going to do, which is sampling from these built-in data frames.

Regardless, I want to point out a cool alternative for building a minimal reproducible data frame in R. We will do this using four R functions: dput and dget, then dump and source.

 

Dput and Dget

Let’s take the first five rows of the iris dataset. Using dput we will write the data frame iris5 to an ASCII text representation. You could then paste this code (which starts with structure()) into a help forum, and your responder can in turn assign this output to an object (I assigned mine to irismre).

#for example - get first 5 rows of iris dataset

iris5 <- head(iris, 5)

#write to an ASCII text representation 

dput(iris5)

#paste it back and assign to new object

irismre <- structure(list(
  Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5),
  Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6),
  Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4),
  Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2),
  Species = structure(c(1L, 1L, 1L, 1L, 1L),
    .Label = c("setosa", "versicolor", "virginica"),
    class = "factor")),
  .Names = c("Sepal.Length", "Sepal.Width", "Petal.Length",
             "Petal.Width", "Species"),
  row.names = c(NA, 5L), class = "data.frame")


irismre

If your dataset is big, your dput output will be big too. Try to keep your minimal reproducible dataset small; that is the whole point of an MRE!
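One way to keep the pasted text short is to trim the data frame before calling dput. The sketch below is only an illustration: the names small, txt and rebuilt are made up for this example, and using droplevels to discard unused factor levels is a suggestion rather than something the original walkthrough requires.

```r
# Sketch: shrink a data frame before calling dput().
# `small`, `txt` and `rebuilt` are illustrative names.
small <- droplevels(head(iris, 5))   # keep 5 rows, drop unused factor levels

# Capture the text that dput() would print to the console
txt <- capture.output(dput(small))

# Rebuild the object from that text to check that it still round-trips
rebuilt <- eval(parse(text = txt))
all.equal(rebuilt, small)
```

With only setosa rows left, the Species factor carries a single level, so the versicolor and virginica labels no longer appear in the text you paste.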

Rather than printing the ASCII text representation to the console, you could write it to a file with the “file =” argument in dput, then read it back with dget:

#or you can write to a file

dput(iris5, file = "C:/RFiles/iris5.R")


#and read it back
irismre <- dget("C:/RFiles/iris5.R")
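The hard-coded C:/RFiles path only works on a machine that has that folder. A minimal sketch of the same round trip, assuming a temporary file is acceptable for the demonstration (path and restored are illustrative names):

```r
# Sketch: round-trip a data frame through a file with dput()/dget().
# tempfile() stands in for the C:/RFiles path so the code runs anywhere.
iris5 <- head(iris, 5)
path <- tempfile(fileext = ".R")

dput(iris5, file = path)     # write the text representation to disk
restored <- dget(path)       # parse the file back into an R object

all.equal(restored, iris5)   # compare the restored copy with the original
unlink(path)                 # remove the temporary file
```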

Dump and Source

In the above example we re-assigned the data frame to an object name of our own choosing. With dump and source, R saves and restores objects under their original names. In our example we dump the object named “iris5” to a file. After removing it from our environment with rm(), we load it back with source, and listing the objects in our environment with ls() shows iris5 again.

#or use dump and source to keep the same object name

dump("iris5", file = "C:/RFiles/data.R")
rm(iris5)

source("C:/RFiles/data.R")
ls()
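To see why the name survives, it helps to look at what dump writes: an assignment statement, not just the structure() call. The sketch below assumes a temporary file and a scratch environment (path, first_line and env are illustrative names) so that it does not touch your own workspace.

```r
# Sketch: dump() writes an assignment, so source() restores the name.
iris5 <- head(iris, 5)
path <- tempfile(fileext = ".R")

dump("iris5", file = path)              # note: the object name is passed as a string
first_line <- readLines(path, n = 1)    # the file opens with the assignment to iris5
rm(iris5)

env <- new.env()                        # scratch environment for the demo
source(path, local = env)               # evaluate the dumped file inside `env`
ls(env)                                 # list what source() created
```

Sourcing into a separate environment is a design choice for this demonstration only; calling source(path) without the local argument recreates iris5 in the global environment, as in the article's example.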
