Random sampling of files

[This article was first published on R Code – Geekcologist , and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

random_files <- function(path, percent_number, pattern = "wav$|WAV$"){
  ####################################################################
  # path = path to folder with files to select                                 
  #                                                                            
  # percent_number = percentage or number of recordings to select. If value is 
  #   between 0 and 1 percentage of files is assumed, if value greater than 1, 
  #   number of files is assumed                                               
  #                                                                            
  # pattern = file extension to select. By default it selects wav files. For   
  #   other type of files replace wav and WAV by the desired extension         
  ####################################################################
  
  # Get file list with full path and file names
  files <- list.files(path, full.names = TRUE, pattern = pattern)
  file_names <- list.files(path, pattern = pattern)
  
  # Select the desired % or number of file by simple random sampling 
  randomize <- sample(seq(files))
  files2analyse <- files[randomize]
  names2analyse <- file_names[randomize]
  if(percent_number <= 1){
    size <- floor(percent_number * length(files))
  }else{
    size <- percent_number
  }
  files2analyse <- files2analyse[(1:size)]
  names2analyse <- names2analyse[(1:size)]

  # Create folder to output
  results_folder <- paste0(path, '/selected')
  dir.create(results_folder, recursive=TRUE)
  
  # Write csv with file names
  write.table(names2analyse, file = paste0(results_folder, "/selected_files.csv"),
              col.names = "Files", row.names = FALSE)
  
  # Move files
  for(i in seq(files2analyse)){
    file.rename(from = files2analyse[i], to = paste0(results_folder, "/", names2analyse[i]) )
  }
}

 

I normally use this function inside a little script for some extra functionalities:

  1. first I set up the environment by sourcing the required functions and loading the packages,
  2. as I always do when using functions that use randomness, I set a seed to be able to reproduce my results in a later time,
  3. as the function uses a folder path, I’ve included a little search window with tcltk to select the folder instead of having to write the full path by hand.
# Load packages and functions
require(tcltk)
source("random_files.R")

# Set seed to reproduce results
set.seed(1001)

# Select folder with recordings
path <- tcltk::tk_choose.dir()

# Percentage or number of recordings
percent_number <- 0.2 # using percentage

# Random sampling of files
random_files(path, percent_number, pattern = "wav$|WAV$")

This function was written for a specific purpose but with some tweaks you can probably adapt it for other purposes other than the one I use it for.

You can find this and other R scripts at: https://github.com/bmsasilva/Rscripts

 

To leave a comment for the author, please follow the link and comment on their blog: R Code – Geekcologist .

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)