minio.s3: A MinIO connector package for R

[This article was first published on R – Hi! I am Nagdev, and kindly contributed to R-bloggers.]

MinIO is a high-performance, distributed object storage system. It is software-defined, runs on industry-standard hardware, and is 100% open source under the Apache V2 license[1]. Today, MinIO is deployed globally, with 272.5M+ Docker pulls and 18K+ git commits. MinIO is written in Go, so you can expect fast response times. You can read more about this here.

MinIO and Data Science

MinIO has played a pivotal role in data science deployments. Today, we deal with data in various formats such as images, videos, audio clips, and other proprietary object formats. Storing this kind of data in traditional databases is quite challenging, and such databases do not respond quickly enough for high-frequency applications.

The other application of MinIO in data science is storing trained models. Deep learning models usually have larger file sizes than their classical machine learning counterparts, which typically range from a few KB to a few MB.

MinIO officially supports integration with Python, Go, and other languages. For a heavy R user, it can be quite challenging to use MinIO from R. The first solution that came to my mind was to use reticulate to access MinIO through Python. This is fine for testing, but it is not feasible for deployment into production.

Installing MinIO

MinIO can be installed in a few lines of code, and the process is well documented here. MinIO can be deployed on Linux, macOS, Windows, and Kubernetes. For prototyping, I would recommend running a stateful Docker container. Instructions for running on Docker can be found here.

R Package minio.s3

github

MinIO is compatible with the Amazon S3 cloud storage service, so we can technically use S3-compatible APIs to access MinIO storage. You might be wondering: don't we already have a package for accessing Amazon Web Services (AWS) through R? You are right. R does have a package called aws.s3, developed by the cloudyR team, that we could use to access AWS. I tried using that package, but it was quite clunky for accessing MinIO, and not all functions were compatible.

So, my solution was to take their package and tweak it quite a lot so that it could be used for accessing MinIO. The end product is the minio.s3 package.

I would like to thank the cloudyR team for their initial contributions to this package.

Installation

This package is not yet on CRAN. To install the latest development version, you can install it from GitHub:

library(devtools)
install_github("nagdevAmruthnath/minio.s3")

Usage

By default, all packages for AWS/MinIO services allow the use of credentials specified in a number of ways, beginning with:

  1. User-supplied values passed directly to functions.
  2. Environment variables, which can be set on the command line prior to starting R or via an Renviron.site or .Renviron file, which R reads to set environment variables during startup (see ?Startup). Alternatively, they can be set within R:
    Sys.setenv("AWS_ACCESS_KEY_ID" = "test", # enter your credentials
           "AWS_SECRET_ACCESS_KEY" = "test123", # enter your credentials
           "AWS_DEFAULT_REGION" = "us-east-1",
           "AWS_S3_ENDPOINT" = "192.168.1.1:8085")    # change it to your specific IP and port
    

For more information on AWS usage, refer to the aws.s3 package.
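
Before calling any package function, it can help to confirm that the four variables above are actually set in the current session. The following is a minimal sketch using base R only. The helper name missing_minio_env() is hypothetical and not part of minio.s3, and the credential values are the placeholders from the example above.

```r
# Hypothetical helper (not part of minio.s3): return the names of the
# expected environment variables that are unset or empty.
missing_minio_env <- function(vars = c("AWS_ACCESS_KEY_ID",
                                       "AWS_SECRET_ACCESS_KEY",
                                       "AWS_DEFAULT_REGION",
                                       "AWS_S3_ENDPOINT")) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

# Placeholder values copied from the example above; replace with your own.
Sys.setenv("AWS_ACCESS_KEY_ID" = "test",
           "AWS_SECRET_ACCESS_KEY" = "test123",
           "AWS_DEFAULT_REGION" = "us-east-1",
           "AWS_S3_ENDPOINT" = "192.168.1.1:8085")

# Stop early with a readable message if anything is missing.
unset_vars <- missing_minio_env()
if (length(unset_vars) > 0) {
  stop("Unset environment variables: ", paste(unset_vars, collapse = ", "))
}
```

Failing early like this tends to give a clearer message than a failed request to the storage server.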

Code Examples

The package can be used to examine publicly accessible S3 buckets and publicly accessible S3 objects.

library("minio.s3")
bucketlist(add_region = FALSE)

If your credentials are incorrect, this function will return an error. Otherwise, it will return a list of information about the buckets you have access to.

Buckets

Create a bucket

To create a new bucket, simply call

put_bucket('my-bucket', acl = "public-read-write", use_https = FALSE)

If successful, it should return TRUE.

List Bucket Contents

To get a listing of all objects in a public bucket, simply call

get_bucket(bucket = 'my-bucket', use_https = FALSE)

Delete Bucket

To delete a bucket, simply call

delete_bucket(bucket = 'my-bucket', use_https = FALSE)

Objects

There are eight main functions that will be useful for working with objects in S3:

  1. s3read_using() provides a generic interface for reading from S3 objects using a user-defined function
  2. s3write_using() provides a generic interface for writing to S3 objects using a user-defined function
  3. get_object() returns a raw vector representation of an S3 object. This might then be parsed in a number of ways, such as rawToChar(), xml2::read_xml(), jsonlite::fromJSON(), and so forth, depending on the file format of the object
  4. save_object() saves an S3 object to a specified local file
  5. put_object() stores a local file into an S3 bucket
  6. s3save() saves one or more in-memory R objects to an .Rdata file in S3 (analogously to save()). s3saveRDS() is an analogue for saveRDS()
  7. s3load() loads one or more objects into memory from an .Rdata file stored in S3 (analogously to load()). s3readRDS() is an analogue for readRDS()
  8. s3source() sources an R script directly from S3
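
As item 3 notes, get_object() returns a raw vector that still has to be parsed. The sketch below shows that parsing step with base R only. It builds the raw vector locally from a temporary CSV file as a stand-in for what get_object() would return, so it is an assumption-laden illustration that contacts no MinIO server.

```r
# Stand-in for the bytes of a CSV object stored in a bucket:
# write a few rows of mtcars to a temporary file and read the bytes back.
csv_path <- tempfile(fileext = ".csv")
write.csv(head(mtcars, 3), csv_path)
raw_obj <- readBin(csv_path, what = "raw", n = file.size(csv_path))

# raw_obj now plays the role of the raw vector returned by get_object().
# Convert the bytes to one string, then parse that string as CSV.
csv_text <- rawToChar(raw_obj)
parsed <- read.csv(text = csv_text, row.names = 1)

print(parsed)
```

The same pattern applies to other text formats: convert with rawToChar() first, then hand the string to the parser that matches the object's format.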

These functions behave as you would probably expect:

# save an in-memory R object into S3
s3save(mtcars, bucket = "my-bucket", object = "mtcars.Rdata", use_https = FALSE)

# `load()` R objects from the file
s3load("mtcars.Rdata", bucket = "my-bucket", use_https = FALSE)

# get file as raw vector
get_object("mtcars.Rdata", bucket = "my-bucket", use_https = FALSE)
# alternative 'S3 URI' syntax:
get_object("s3://my-bucket/mtcars.Rdata", use_https = FALSE)

# save file locally
save_object("mtcars.Rdata", file = "mtcars.Rdata", bucket = "my-bucket", use_https = FALSE)

# put local file into S3
put_object(file = "mtcars.Rdata", object = "mtcars2.Rdata", bucket = "my-bucket", use_https = FALSE)
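
Since s3save()/s3load() are described as analogues of save()/load(), and s3saveRDS()/s3readRDS() as analogues of saveRDS()/readRDS(), the difference between the two pairs can be seen with the base R functions alone. This is a local sketch with temporary files. The variable names are made up for illustration, and no bucket is involved.

```r
rdata_file <- tempfile(fileext = ".Rdata")
rds_file <- tempfile(fileext = ".rds")

cars_subset <- head(mtcars, 5)

# save() stores the object together with its name;
# saveRDS() stores only the value.
save(cars_subset, file = rdata_file)
saveRDS(cars_subset, file = rds_file)

rm(cars_subset)

# load() recreates `cars_subset` under its original name and
# returns the names of the objects it restored.
loaded_names <- load(rdata_file)

# readRDS() returns the value, which can be assigned to any name.
restored <- readRDS(rds_file)
```

In practice, this means the .Rdata route suits restoring several named objects at once, while the RDS route suits a single object such as a trained model that the caller wants to name.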

 

Please feel free to email me if you have any questions or comments. If you run into any issues with the package, please create an issue on GitHub. Also, check out my GitHub page for other R packages, tutorials, and other projects.
