Statistics Sunday: Reading and Creating a Data Frame with Multiple Text Files

November 18, 2018

(This article was first published on Deeply Trivial, and kindly contributed to R-bloggers)

First Statistics Sunday in far too long! It’s going to be a short one, but it describes a great trick I learned recently while completing a time study for our exams at work.

To give a bit of background, this time study involves analyzing the time examinees spent on their exam and whether they were able to complete all items. We’ve done time studies in the past to set the time allowed for each exam, but we revisit them on a regular cycle to make sure the time allowed is still ample. All of our exams are computer-administered, and we receive daily downloads from our exam provider with data on all exams administered that day.

What that means is that, to study a year’s worth of exam data, I need to read in and analyze 365-ish text files (test centers are generally closed on holidays). Fortunately, I found code that reads all files in a particular folder and binds them into a single data frame. First, I’ll set the working directory to the location of those files and create a list of all files in that directory:

setwd("Q:/ExamData/2018")
filelist <- list.files()

For the next part, I’ll need the data.table package, which you’ll want to install (install.packages("data.table")) if you don’t already have it:

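library(data.table)
# Read each file into a list named by file name, then stack the pieces;
# idcol = "FileName" adds a column recording each row's source file
Exams2018 <- rbindlist(sapply(filelist, fread, simplify = FALSE), use.names = TRUE, idcol = "FileName")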

Now I have a single data table (data.table’s extension of a data frame) with all exam data from 2018, plus a FileName column that identifies which file each case came from.
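For readers who want to try the pattern without the exam files, here is a minimal, self-contained sketch. It assumes data.table is installed and writes three made-up daily files to a temporary folder; the folder, file names, and the ExamineeID/Minutes columns are all hypothetical. It also swaps setwd() for list.files(full.names = TRUE), naming each path by its bare file name so the FileName column stays short.

```r
library(data.table)

# Hypothetical stand-in for the daily downloads: three tiny files in a
# temporary folder (folder, file names, and columns are all made up)
exam_dir <- file.path(tempdir(), "exam_demo")
dir.create(exam_dir, showWarnings = FALSE)
for (day in c("2018-01-02", "2018-01-03", "2018-01-04")) {
  fwrite(data.table(ExamineeID = 1:2, Minutes = c(95, 110)),
         file.path(exam_dir, paste0("exams_", day, ".txt")))
}

# Full paths avoid setwd(); naming them by basename keeps FileName short
filelist <- list.files(exam_dir, full.names = TRUE)
names(filelist) <- basename(filelist)

# Read every file into a named list, then stack with a source column
Exams <- rbindlist(sapply(filelist, fread, simplify = FALSE),
                   use.names = TRUE, idcol = "FileName")

# Rows contributed by each file
Exams[, .N, by = FileName]
```

A per-file row count like the last line could be one quick way to spot a daily file that arrived empty or truncated.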

What if your working directory contains more files than you want to read? You can still use this code with a small change. For instance, if you want only the text files in the working directory, you can pass a regular expression to list.files() so that it returns only files ending in “.txt”:

filelist <- list.files(pattern = "\\.txt$")
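To see what the pattern keeps and drops, here is a small sketch using a temporary folder and hypothetical file names. The regular expression "\\.txt$" matches a literal dot followed by "txt" at the very end of the name, so a file such as backup.txt.bak is skipped along with non-text files.

```r
# Hypothetical mixed folder: two exam files plus files we want to skip
mixed_dir <- file.path(tempdir(), "pattern_demo")
dir.create(mixed_dir, showWarnings = FALSE)
invisible(file.create(file.path(mixed_dir,
  c("exams_day1.txt", "exams_day2.txt", "notes.docx", "backup.txt.bak"))))

# "\\.txt$" = a literal dot, then "txt", anchored at the end of the name
txt_files <- list.files(mixed_dir, pattern = "\\.txt$")
print(txt_files)
```

Matching is case-sensitive by default; list.files() also accepts ignore.case = TRUE if some downloads arrive with a .TXT extension.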

If you’re only working with a handful of files, you can also create the list manually and pass it to the same rbindlist() call, like this:

filelist <- c("file1.txt", "file2.txt", "file3.txt")
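A hand-built list slots into the same rbindlist() call. The sketch below assumes data.table is installed and creates the three placeholder files in a temporary folder so it can run anywhere; the folder and the Score column are made up. Because the files live outside the working directory here, the names are turned into full paths with setNames() so FileName still shows the short names, and a file.exists() check catches typos before anything is read.

```r
library(data.table)

# Hypothetical folder holding the three placeholder files, for illustration
manual_dir <- file.path(tempdir(), "manual_demo")
dir.create(manual_dir, showWarnings = FALSE)
for (f in c("file1.txt", "file2.txt", "file3.txt")) {
  fwrite(data.table(Score = 1:3), file.path(manual_dir, f))
}

# Hand-built list, as in the text, turned into named full paths
filelist <- c("file1.txt", "file2.txt", "file3.txt")
paths <- setNames(file.path(manual_dir, filelist), filelist)

# Stop early with an error if any listed file is missing or misspelled
stopifnot(all(file.exists(paths)))

combined <- rbindlist(sapply(paths, fread, simplify = FALSE),
                      use.names = TRUE, idcol = "FileName")
```

If you have already set the working directory to the folder, the plain filelist vector can be passed to sapply() directly, exactly as in the original call.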

That’s all for now! Hope everyone has a Happy Thanksgiving!
