Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
As data scientists, we often have to deal with tedious tasks. One of them is interacting with the file system on our own computer or on a remote machine we’re working with. Thankfully, the {fs} package has a bunch of convenience functions that make our lives a whole lot easier.
Let’s check out a few examples. And if videos are more your thing, you can also watch the video version of this blog post on YouTube.
Assemble paths
Check out this data set.
library(tidyverse)
library(fs)
original_tib <- tibble(
  dir = c('some/path/blub', 'bla/here/', 'direct/'),
  file_names = c('file_a.csv', 'file_b.csv', 'file_c.txt')
)
original_tib
## # A tibble: 3 × 2
##   dir            file_names
##   <chr>          <chr>
## 1 some/path/blub file_a.csv
## 2 bla/here/      file_b.csv
## 3 direct/        file_c.txt
Here, assembling a path in the form directory/file_name.ext can be tricky. Some directories have trailing / and some don’t. So, working with paste0() or glue::glue() would be challenging. Thankfully, the path() function from the {fs} package doesn’t care whether trailing / are there or not.
original_tib |>
  mutate(path = path(dir, file_names))
## # A tibble: 3 × 3
##   dir            file_names path
##   <chr>          <chr>      <fs::path>
## 1 some/path/blub file_a.csv some/path/blub/file_a.csv
## 2 bla/here/      file_b.csv bla/here/file_b.csv
## 3 direct/        file_c.txt direct/file_c.txt
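To see what path() saves us from, here is a small side-by-side sketch with the same directories and file names as above. The object names dirs, files, pasted and tidy are only placeholders for this illustration.

```r
library(fs)

# Same directories and file names as in original_tib
dirs <- c('some/path/blub', 'bla/here/', 'direct/')
files <- c('file_a.csv', 'file_b.csv', 'file_c.txt')

# Naive approach: always insert a separator.
# Directories that already end in "/" end up with a double slash.
pasted <- paste0(dirs, '/', files)
pasted

# path() tidies the separators, so all three results have the same form
tidy <- path(dirs, files)
tidy
```

Leaving out the separator in paste0() would not help either, because then the first directory would be glued directly onto its file name.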
Remove and set extensions
We can even modify file extensions really easily. That’s convenient when we want to take input from CSV files and then turn the data into images using the same file names.
original_tib |>
  mutate(
    path = path(dir, file_names),
    out_path = path_ext_set(path, 'png')
  )
## # A tibble: 3 × 4
##   dir            file_names path                      out_path
##   <chr>          <chr>      <fs::path>                <fs::path>
## 1 some/path/blub file_a.csv some/path/blub/file_a.csv some/path/blub/file_a.png
## 2 bla/here/      file_b.csv bla/here/file_b.csv       bla/here/file_b.png
## 3 direct/        file_c.txt direct/file_c.txt         direct/file_c.png
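Removing an extension works much the same way. The following is a minimal sketch with a single path from the table above. path_ext(), path_ext_remove() and path_file() are also part of {fs}, while the object names (p, ext, stem, file, png) are just placeholders.

```r
library(fs)

p <- path('some/path/blub', 'file_a.csv')

ext <- path_ext(p)               # the extension, without the dot
stem <- path_ext_remove(p)       # the path without its extension
file <- path_file(p)             # the file name without its directory
png <- path_ext_set(stem, 'png') # attach a new extension to the bare path

ext
stem
file
png
```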
Get directory info
You can get information on a directory as a tree in the console. Here, I’m using a directory called raw-input inside my working directory to demonstrate that.
dir_tree('raw-input')
## raw-input
## ├── a
## │   └── dat.csv
## ├── b
## │   └── dat.csv
## └── c
##     └── dat.csv
You can also get lots of information on these files.
dir_info('raw-input')
## # A tibble: 3 × 18
##   path        type    size permissions modification_time   user  group device_id
##   <fs::path>  <fct>  <fs:> <fs::perms> <dttm>              <chr> <chr>     <dbl>
## 1 raw-input/a direc…    4K rwxrwxr-x   2025-03-29 09:02:24 albe… albe…     66307
## 2 raw-input/b direc…    4K rwxrwxr-x   2025-03-29 09:04:33 albe… albe…     66307
## 3 raw-input/c direc…    4K rwxrwxr-x   2025-03-29 09:04:35 albe… albe…     66307
## # ℹ 10 more variables: hard_links <dbl>, special_device_id <dbl>, inode <dbl>,
## # block_size <dbl>, blocks <dbl>, flags <int>, generation <dbl>,
## # access_time <dttm>, change_time <dttm>, birth_time <dttm>
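Since dir_info() returns a regular tibble, you can filter and subset it like any other data frame. Here is a rough sketch of that idea. It builds a throwaway copy of the directory structure in a temporary folder, so the name tmp and the empty dat.csv files are made up for the example.

```r
library(fs)

# Throwaway stand-in for the raw-input directory (assumption for this sketch)
tmp <- dir_create(file_temp('raw-input-'))
dir_create(path(tmp, c('a', 'b', 'c')))
file_create(path(tmp, c('a', 'b', 'c'), 'dat.csv'))

# Collect info on everything below tmp, including nested files
info <- dir_info(tmp, recurse = TRUE)

# Keep only the files (not the directories) and a handful of columns
files_only <- info[info$type == 'file', c('path', 'size', 'modification_time')]
files_only
```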
But in a lot of cases, it will probably suffice to just get the file paths.
dir_ls('raw-input')
## raw-input/a raw-input/b raw-input/c
By default, dir_ls() only lists the top level. To go into nested structures, you’ll need to set recurse = TRUE.
dir_ls('raw-input', recurse = TRUE)
## raw-input/a raw-input/a/dat.csv raw-input/b raw-input/b/dat.csv
## raw-input/c raw-input/c/dat.csv
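dir_ls() can also do some of the filtering for you. In this sketch, type = 'file' drops the directories from the listing and glob = '*.csv' keeps only the matching paths. The directory is a temporary stand-in for raw-input, and the extra notes.txt file is invented purely to have something to filter out.

```r
library(fs)

# Temporary stand-in for the raw-input directory (assumption for this sketch)
tmp <- dir_create(file_temp('raw-input-'))
dir_create(path(tmp, c('a', 'b', 'c')))
file_create(path(tmp, c('a', 'b', 'c'), 'dat.csv'))
file_create(path(tmp, 'a', 'notes.txt'))

# All files, without the directories themselves
all_files <- dir_ls(tmp, recurse = TRUE, type = 'file')

# Only the paths that match a glob pattern
csv_only <- dir_ls(tmp, recurse = TRUE, glob = '*.csv')

# Show the matches relative to the top-level directory
path_rel(csv_only, start = tmp)
```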
Iterate over file paths
Usually, finding the desired paths is only the first step, and you want to iterate over them next. For this, you can save the output of dir_ls() into a vector and iterate through it using the map() or walk() function. Here, the function I use inside of walk() will
- load the data using the specified path,
- create a ggplot from it, and
- save the image.
The tricky part is that I want to save the images in an output directory that has the same structure as the raw-input directory. That’s why the function also needs to build the output paths and create the necessary directories.
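# Collect all CSV files below raw-input, including nested directories
csv_files <- dir_ls(
  'raw-input',
  recurse = TRUE,
  regexp = '\\.csv$'
)
csv_files |>
  walk(
    \(file_path) {
      # Load the data and build the plot
      plt <- read_csv(file_path) |>
        ggplot(aes(col_a, col_b)) +
        geom_point(size = 10, col = 'dodgerblue4')
      # Mirror the input structure: swap the extension and the top-level directory
      out_path <- file_path |>
        path_ext_set('.png') |>
        str_replace('^raw-input', 'output')
      # Create the target directory if it does not exist yet
      dir_create(path_dir(out_path))
      # Pass the plot explicitly instead of relying on last_plot()
      ggsave(filename = out_path, plot = plt)
    }
  )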
## Rows: 3 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): col_a, col_b, col_c
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Saving 6 x 4 in image
## Rows: 3 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): col_a, col_b, col_c
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Saving 6 x 4 in image
## Rows: 3 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): col_a, col_b, col_c
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Saving 6 x 4 in image
Splendid. This should have worked and you can now see the output directory and the plots in the file tree.
dir_tree()
## .
## ├── index.qmd
## ├── index.rmarkdown
## ├── output
## │   ├── a
## │   │   └── dat.png
## │   ├── b
## │   │   └── dat.png
## │   └── c
## │       └── dat.png
## └── raw-input
##     ├── a
##     │   └── dat.csv
##     ├── b
##     │   └── dat.csv
##     └── c
##         └── dat.csv
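If you need the same path rewriting in more than one place, it can be pulled out into a small helper. The function below is only a sketch and not part of {fs}: the name to_output_path() and its arguments are made up here. It uses path_rel() to strip the input directory instead of the string replacement from the walk() call.

```r
library(fs)

# Hypothetical helper: map an input file to its mirrored output location
to_output_path <- function(file_path, in_dir = 'raw-input', out_dir = 'output', ext = 'png') {
  # Part of the path below the input directory, e.g. "a/dat.csv"
  rel <- path_rel(file_path, start = in_dir)
  # Swap the extension and prepend the output directory
  path(out_dir, path_ext_set(rel, ext))
}

out_paths <- to_output_path(c('raw-input/a/dat.csv', 'raw-input/b/dat.csv'))
out_paths
```

Inside the walk() call, you could then create the directory with dir_create(path_dir(out_path)) exactly as before.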
