Linking R to iRODS’ new HTTP API 🥳

[This article was first published on iRODS4R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Note

This article is about the release of rirods 0.2.0 (see the documentation). The iRODS C++ HTTP API 0.2.0 is a requirement for this version of the R package.

As of November 2023, iRODS has released the iRODS C++ HTTP API as a replacement for the old iRODS C++ REST API. This new interface to the iRODS server comes with a host of improvements, including more flexible management of all types of operations on users, collections, and data objects, as well as support for authentication with OpenID Connect. The HTTP API is steadily maturing, so it was about time to leverage this functionality in rirods.

Besides opening up a lot of potential for future rirods development in terms of iRODS operations, this new release (rirods 0.2.0) brings some clear improvements to the same core functionality as in previous releases of rirods. In this blog post we showcase one of those improvements: the speed of uploading and downloading files to and from iRODS. The new HTTP API can be configured to receive much larger payloads. An example of how the HTTP API can be configured with a JSON file can be seen here. On top of that, write requests can now be made in parallel, and the number of allowed threads is configurable as well.

To benchmark this improvement we show upload and download times as well as memory usage for iput() and iget(), respectively. We use the Docker-based iRODS demo as our server, which we can easily stand up as follows:

library(rirods)
use_irods_demo()
Do the following to connect with the iRODS demo server: 
create_irods("http://localhost:9001/irods-http-api/0.2.0") 
iauth("rods", "rods")

For more information on use_irods_demo() see the demo vignette (vignette("demo", package = "rirods")) or this page. Now that the demo server is running we can log in to the system by following the instructions as printed above.

create_irods("http://localhost:9001/irods-http-api/0.2.0") 
iauth("rods", "rods")

This freshly started server does not yet contain any collections or data objects, as can be seen with ils().

ils()
This collection does not contain any objects or collections.

This demo server has been configured to allow receiving request bodies of up to 8.4 Mb and we can use a maximum of 3 threads simultaneously during a write operation.
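To make these two limits concrete, here is a minimal base R sketch of how a file could be split into request-sized chunks and spread over the available threads. The helper plan_chunks() and its arguments are hypothetical and only illustrate the arithmetic; this is not necessarily how rirods or the HTTP API handle it internally. Sizes are in bytes, and taking 1 Mb as 1024^2 bytes is an assumption.

```r
# Hypothetical helper (not part of rirods): plan how a file of `file_size`
# bytes could be split into chunks of at most `max_body` bytes, and assign
# the chunks to `threads` workers in round-robin fashion.
plan_chunks <- function(file_size, max_body, threads) {
  stopifnot(file_size >= 0, max_body > 0, threads >= 1)
  n_chunks <- max(1L, as.integer(ceiling(file_size / max_body)))
  offsets <- (seq_len(n_chunks) - 1) * max_body
  counts <- pmin(max_body, file_size - offsets)
  data.frame(
    chunk = seq_len(n_chunks),
    offset = offsets,   # byte position where the chunk starts
    count = counts,     # number of bytes in the chunk
    thread = (seq_len(n_chunks) - 1L) %% threads + 1L
  )
}

mb <- 1024^2  # assumption: 1 Mb = 1024^2 bytes

# A 6.52 Mb file stays below the 8.4 Mb limit and fits in a single request
plan_small <- plan_chunks(6.52 * mb, 8.4 * mb, threads = 3)

# A 9.69 Mb file exceeds the limit and has to be split
plan_large <- plan_chunks(9.69 * mb, 8.4 * mb, threads = 3)

print(plan_small)
print(plan_large)
```

With these settings only the largest of the benchmark files (9.69 Mb) would need more than one request.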

The files transferred to the iRODS demo server were created with the GitHub API and the R package gh. Repository and commit data for the GitHub groups git, irods, nodejs, and tidyverse were downloaded (7th March 2024) and saved as csv files. The files generated in this way range in size from 2 Kb to 10 Mb (Figure 1). The script and files can be found here.

Figure 1: Number of repositories and commits on GitHub for the groups git, irods, nodejs, and tidyverse.

These files are then transferred to the demo server with iput() and subsequently retrieved with iget(), comparing rirods 0.1.2 with 0.2.0. Each transfer was repeated 100 times and performance was monitored with the R package bench. The newer version shows a drastic improvement for file transfers in both directions (Figure 2). This was to be expected, as the older rirods version needed to chop up the file and transfer it chunk-by-chunk to the server, even at relatively small file sizes (Kb range). As this all took place in R, it was necessarily slow.
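The measurement set-up can be sketched roughly as follows. The results in this post were obtained with bench against a running iRODS server; the sketch below uses only base R and a local file copy as a stand-in for the transfer. The helper median_time() and the function fake_transfer() are assumptions made for illustration and are not the script used for the reported results.

```r
# Hypothetical helper: call `f()` `times` times and return the median
# elapsed wall-clock time in seconds.
median_time <- function(f, times = 100) {
  elapsed <- vapply(
    seq_len(times),
    function(i) system.time(f())[["elapsed"]],
    numeric(1)
  )
  median(elapsed)
}

# Stand-in for a transfer: copy a small csv file locally. With a running
# server one would time the real upload or download call here instead.
src <- tempfile(fileext = ".csv")
dst <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:1000, y = rnorm(1000)), src, row.names = FALSE)

fake_transfer <- function() file.copy(src, dst, overwrite = TRUE)

result <- median_time(fake_transfer, times = 100)
print(result)
```

Taking the median rather than the mean keeps a single slow run, for example the first one, from dominating the result.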

Figure 2: Median time of file transfer for iget() and iput() based on bench.

The same can be said for the memory allocated when uploading and downloading files, which is lower in the newer version of rirods (Figure 3). File transfers above the threshold configured server-side (about 8 Mb) put more strain on the system, as we then also need to chop the file into pieces.
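Why chopping a file into pieces costs extra work in R can be illustrated with a minimal base R sketch that reads a file piece by piece. The function read_in_chunks() and the chunk sizes are hypothetical; the actual transfer logic in rirods may differ.

```r
# Hypothetical helper: read a file in pieces of at most `chunk_size` bytes
# and return the pieces as a list of raw vectors.
read_in_chunks <- function(path, chunk_size) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  chunks <- list()
  repeat {
    piece <- readBin(con, what = "raw", n = chunk_size)
    if (length(piece) == 0) break  # end of file reached
    chunks[[length(chunks) + 1L]] <- piece
  }
  chunks
}

# Example file of 10,000 random bytes
path <- tempfile()
writeBin(as.raw(sample(0:255, 10000, replace = TRUE)), path)

small_chunks <- read_in_chunks(path, chunk_size = 100)   # many small pieces
large_chunks <- read_in_chunks(path, chunk_size = 8000)  # few large pieces

print(length(small_chunks))
print(length(large_chunks))

# Reassembling the pieces gives back the original bytes
original <- readBin(path, what = "raw", n = file.size(path))
print(identical(do.call(c, small_chunks), original))
```

The smaller the chunks, the more loop iterations (and, for a real transfer, requests) are handled in R.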

Figure 3: Memory allocated for file transfer for iget() and iput() based on bench.

Faster write and read operations are an obvious improvement when using rirods 0.2.0, and the iRODS HTTP API has a lot of potential to add even more functionality to rirods in the future. Read more about the changes to some of the function interfaces in the changelog. Contributions to the package are warmly welcomed. Please report issues and make pull requests in the GitHub repository of rirods.

The following table lists all of the benchmark results.

Benchmark results for iget() and iput().
source                                               fun name  file name          file size (Mb)  memory allocated (Mb)  median time (sec)
https://cran.r-project.org/src/contrib/rirods_0.1.2  iget      git_repos          0.00            0.13                   0.11
https://cran.r-project.org/src/contrib/rirods_0.1.2  iget      tidyverse_repos    0.01            1.20                   0.31
https://cran.r-project.org/src/contrib/rirods_0.1.2  iget      irods_repos        0.05            5.01                   1.58
https://cran.r-project.org/src/contrib/rirods_0.1.2  iget      nodejs_repos       0.05            5.88                   1.67
https://cran.r-project.org/src/contrib/rirods_0.1.2  iget      tidy_commits       2.88            307.80                 80.21
https://cran.r-project.org/src/contrib/rirods_0.1.2  iget      git_commits        4.92            526.89                 131.62
https://cran.r-project.org/src/contrib/rirods_0.1.2  iget      irods_commits      6.52            695.48                 180.09
https://cran.r-project.org/src/contrib/rirods_0.1.2  iget      nodejs_commits     9.69            1010.00                268.12
https://cran.r-project.org/src/contrib/rirods_0.1.2  iput      git_repos          0.00            0.12                   0.16
https://cran.r-project.org/src/contrib/rirods_0.1.2  iput      tidyverse_repos    0.01            1.18                   0.45
https://cran.r-project.org/src/contrib/rirods_0.1.2  iput      irods_repos        0.05            5.00                   1.96
https://cran.r-project.org/src/contrib/rirods_0.1.2  iput      nodejs_repos       0.05            5.87                   2.13
https://cran.r-project.org/src/contrib/rirods_0.1.2  iput      tidy_commits       2.88            308.06                 107.14
https://cran.r-project.org/src/contrib/rirods_0.1.2  iput      git_commits        4.92            526.34                 176.25
https://cran.r-project.org/src/contrib/rirods_0.1.2  iput      irods_commits      6.52            696.10                 244.76
https://cran.r-project.org/src/contrib/rirods_0.1.2  iput      nodejs_commits     9.69            1010.00                364.56
https://cran.r-project.org/src/contrib/rirods_0.2.0  iget      git_repos          0.00            1.10                   0.05
https://cran.r-project.org/src/contrib/rirods_0.2.0  iget      tidyverse_repos    0.01            1.10                   0.11
https://cran.r-project.org/src/contrib/rirods_0.2.0  iget      irods_repos        0.05            1.10                   0.12
https://cran.r-project.org/src/contrib/rirods_0.2.0  iget      nodejs_repos       0.05            1.10                   0.11
https://cran.r-project.org/src/contrib/rirods_0.2.0  iget      tidyverse_commits  2.88            1.10                   0.17
https://cran.r-project.org/src/contrib/rirods_0.2.0  iget      git_commits        4.92            1.12                   0.24
https://cran.r-project.org/src/contrib/rirods_0.2.0  iget      irods_commits      6.52            1.10                   0.29
https://cran.r-project.org/src/contrib/rirods_0.2.0  iget      nodejs_commits     9.69            1.10                   0.42
https://cran.r-project.org/src/contrib/rirods_0.2.0  iput      git_repos          0.00            2.17                   0.09
https://cran.r-project.org/src/contrib/rirods_0.2.0  iput      tidyverse_repos    0.01            2.18                   0.10
https://cran.r-project.org/src/contrib/rirods_0.2.0  iput      irods_repos        0.05            2.21                   0.11
https://cran.r-project.org/src/contrib/rirods_0.2.0  iput      nodejs_repos       0.05            2.22                   0.09
https://cran.r-project.org/src/contrib/rirods_0.2.0  iput      tidyverse_commits  2.88            4.92                   0.16
https://cran.r-project.org/src/contrib/rirods_0.2.0  iput      git_commits        4.92            7.22                   0.17
https://cran.r-project.org/src/contrib/rirods_0.2.0  iput      irods_commits      6.52            8.38                   0.23
https://cran.r-project.org/src/contrib/rirods_0.2.0  iput      nodejs_commits     9.69            752.86                 2.92
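
As a rough summary, the median times in the table above can be turned into speed-up factors. The sketch below copies a few rows by hand (the smallest and the largest file for each function); the data frame and its column names are made up for illustration.

```r
# Median times (sec) copied by hand from the benchmark table
bench_times <- data.frame(
  fun  = c("iget", "iget", "iput", "iput"),
  file = c("git_repos", "nodejs_commits", "git_repos", "nodejs_commits"),
  old  = c(0.11, 268.12, 0.16, 364.56),  # rirods 0.1.2
  new  = c(0.05, 0.42, 0.09, 2.92)       # rirods 0.2.0
)

# Speed-up factor: how many times faster the newer version is
bench_times$speedup <- round(bench_times$old / bench_times$new, 1)
print(bench_times)
```

For the smallest file the gain is roughly a factor of two, whereas for the largest file the download is several hundred times faster and the upload more than a hundred times faster.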
To leave a comment for the author, please follow the link and comment on their blog: iRODS4R.
