Read a lot of datasets at once with R

July 25, 2016
By

(This article was first published on Econometrics and Free Software, and kindly contributed to R-bloggers)

I often have to read a lot of datasets at once using R. So I’ve wrote the following function to solve this issue:

read_list <- function(list_of_datasets, read_func){

        read_and_assign <- function(dataset, read_func){
                dataset_name <- as.name(dataset)
                dataset_name <- read_func(dataset)
        }

        # invisible is used to suppress the unneeded output
        output <- invisible(
                sapply(list_of_datasets,
                           read_and_assign, read_func = read_func, simplify = FALSE, USE.NAMES = TRUE))

        # Remove the extension at the end of the data set names
        names_of_datasets <- c(unlist(strsplit(list_of_datasets, "[.]"))[c(T, F)])
        names(output) <- names_of_datasets
        return(output)
}

You need to supply a list of datasets as well as the function to read the datasets to read_list. So for example to read in .csv files, you could use read.csv() (or read_csv() from the readr package, which I prefer to use), or read_dta() from the package haven for STATA files, and so on.

Now imagine you have some data in your working directory. First start by saving the name of the datasets in a variable:

data_files <- list.files(pattern = ".csv")

print(data_files)
## [1] "data_1.csv" "data_2.csv" "data_3.csv"

Now you can read all the data sets and save them in a list with read_list():

library("readr")
library("tibble")

list_of_data_sets <- read_list(data_files, read_csv)


glimpse(list_of_data_sets)
## List of 3
##  $ data_1:Classes 'tbl_df', 'tbl' and 'data.frame':  19 obs. of  3 variables:
##   ..$ col1: chr [1:19] "0,018930679" "0,8748013128" "0,1025635934" "0,6246140983" ...
##   ..$ col2: chr [1:19] "0,0377725807" "0,5959457638" "0,4429121533" "0,558387159" ...
##   ..$ col3: chr [1:19] "0,6241767189" "0,031324594" "0,2238059868" "0,2773350732" ...
##  $ data_2:Classes 'tbl_df', 'tbl' and 'data.frame':  19 obs. of  3 variables:
##   ..$ col1: chr [1:19] "0,9098418493" "0,1127788509" "0,5818891392" "0,1011773532" ...
##   ..$ col2: chr [1:19] "0,7455905887" "0,4015039612" "0,6625796605" "0,029955339" ...
##   ..$ col3: chr [1:19] "0,327232932" "0,2784035673" "0,8092386735" "0,1216045306" ...
##  $ data_3:Classes 'tbl_df', 'tbl' and 'data.frame':  19 obs. of  3 variables:
##   ..$ col1: chr [1:19] "0,9236124896" "0,6303271761" "0,6413583054" "0,5573887416" ...
##   ..$ col2: chr [1:19] "0,2114708388" "0,6984538266" "0,0469865249" "0,9271510226" ...
##   ..$ col3: chr [1:19] "0,4941919971" "0,7391538511" "0,3876723797" "0,2815014394" ...

If you prefer not to have the datasets in a list, but rather import them into the global environment, you can change the above function like so:

read_list <- function(list_of_datasets, read_func){

        read_and_assign <- function(dataset, read_func){
                assign(dataset, read_func(dataset), envir = .GlobalEnv)
        }

        # invisible is used to suppress the unneeded output
        output <- invisible(
                sapply(list_of_datasets,
                           read_and_assign, read_func = read_func, simplify = FALSE, USE.NAMES = TRUE))

}

But I personnally don’t like this second option, but I put it here for completeness.

To leave a comment for the author, please follow the link and comment on their blog: Econometrics and Free Software.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)