AzureStor: an R package for working with Azure storage

by Hong Ooi, senior data scientist, Microsoft Azure

A few weeks ago, I introduced the AzureR family of packages for working with Azure in R. Since then, I’ve also written articles on how to use AzureRMR to interact with Azure Resource Manager, how to use AzureVM to manage virtual machines, and how to use AzureContainers to deploy R functions with Azure Kubernetes Service. This article is the next in the series, and covers AzureStor: an interface to Azure storage.

The Resource Manager interface: creating and deleting storage accounts

AzureStor implements an interface to Azure Resource Manager, which you can use to manage storage accounts: creating them, retrieving them, deleting them, and so forth. This is done via the appropriate methods of the az_resource_group class. For example, the following code shows how you might create a new storage account from scratch.

library(AzureStor)

# get the resource group for the storage account
rg <- AzureRMR::az_rm$
    new(tenant="{tenant_id}", app="{app_id}", password="{password}")$
    get_subscription("{subscription_id}")$
    get_resource_group("myresourcegroup")

# create the storage account
# by default, this will be in the resource group's region
rg$create_storage_account("mynewstorage")

Without any options, this will create a storage account with the following parameters:

  • General purpose account (all storage types supported)
  • Locally redundant storage (LRS) replication
  • Hot access tier (for blob storage)
  • HTTPS connection required for access

You can change these by setting the arguments to create_storage_account(). For example, to create an account with geo-redundant storage replication and the default blob access tier set to “cool”:

rg$create_storage_account("myotherstorage",
    replication="Standard_GRS",
    access_tier="cool")

To retrieve an existing storage account, use the get_storage_account() method. Only the storage account name is required.

# retrieve one of the accounts created above
stor2 <- rg$get_storage_account("myotherstorage")

Finally, to delete a storage account, you simply call its delete() method. Alternatively, you can call the delete_storage_account() method of the az_resource_group class, which will do the same thing. In both cases, AzureStor will prompt you for confirmation that you really want to delete the storage account.

rg$delete_storage_account("mynewstorage")
stor2$delete() # if you have the storage account object
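
If you want to skip the confirmation prompt, say in a non-interactive script, the delete methods take a confirm argument. The call below is a minimal sketch, assuming confirm defaults to TRUE:

# assumption: passing confirm=FALSE suppresses the interactive prompt
rg$delete_storage_account("mynewstorage", confirm=FALSE)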

The client interface: working with storage

Storage endpoints

For most users, the more relevant part of AzureStor is probably its client interface to storage. With this, you can upload and download files and blobs, create containers and shares, list files, and so on. Unlike the Resource Manager (ARM) interface, the client interface uses S3 classes. This is for a couple of reasons: S3 is more familiar to most R users, and it is consistent with most other data manipulation packages in R, in particular the tidyverse.

The starting point for client access is the storage_endpoint object, which stores information about the endpoint of a storage account: the URL that you use to access storage, along with any authentication information needed. The easiest way to obtain an endpoint object is via the storage account resource object’s get_blob_endpoint() and get_file_endpoint() methods:

# get the storage account object
stor <- AzureRMR::az_rm$
    new(tenant="{tenant_id}", app="{app_id}", password="{password}")$
    get_subscription("{subscription_id}")$
    get_resource_group("myresourcegroup")$
    get_storage_account("mynewstorage")

stor$get_blob_endpoint()
# Azure blob storage endpoint
# URL: https://mynewstorage.blob.core.windows.net/
# Access key: <hidden>
# Account shared access signature: <none supplied>
# Storage API version: 2018-03-28

stor$get_file_endpoint()
# Azure file storage endpoint
# URL: https://mynewstorage.file.core.windows.net/
# Access key: <hidden>
# Account shared access signature: <none supplied>
# Storage API version: 2018-03-28

This shows that the base URL to access blob storage is https://mynewstorage.blob.core.windows.net/, while that for file storage is https://mynewstorage.file.core.windows.net/. While it’s not displayed, the endpoint objects also include the access key necessary for authenticated access to storage; this is obtained directly from the storage account resource.

More practically, you will usually want to work with a storage endpoint without having to authenticate with ARM first; often, you may not have any ARM credentials to begin with. In this case, you can create the endpoint object directly with blob_endpoint() and file_endpoint():

# same as above
blob_endp <- blob_endpoint(
    "https://mynewstorage.blob.core.windows.net/",
    key="mystorageaccesskey")
file_endp <- file_endpoint(
    "https://mynewstorage.file.core.windows.net/",
    key="mystorageaccesskey")

Notice that when creating the endpoint this way, you have to provide the access key explicitly.
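
If you do have the storage account resource object, one way to avoid typing the key in by hand is to retrieve it with the account's list_keys() method (which is also used to generate a SAS below). A minimal sketch, assuming list_keys() returns the account keys with the primary key first:

# sketch: pull the primary access key from the resource object
# and use it to create the endpoint directly
key <- stor$list_keys()[1]
blob_endp <- blob_endpoint(
    "https://mynewstorage.blob.core.windows.net/",
    key=key)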

Instead of an access key, you can provide a shared access signature (SAS) to gain authenticated access. The main difference between using a key and a SAS is that the former unlocks access to the entire storage account. A user who has a key can access all containers and files, and can read, modify and delete data without restriction. A user with a SAS, on the other hand, can be restricted to specific files or containers, to read-only access, to a given span of time, and so on. This is usually much better from a security standpoint.

Usually, the SAS will be given to you by your system administrator. However, if you have the storage account resource object, you can generate and use a SAS as follows. Note that generating a SAS requires the storage account’s access key.

# shared access signature: read/write access, container+object access, valid for 12 hours
now <- Sys.time()
sas <- stor$get_account_sas(permissions="rw",
    resource_types="co",
    start=now,
    end=now + 12 * 60 * 60,
    key=stor$list_keys()[1])

# create an endpoint object with a SAS, but without an access key
blob_endp <- stor$get_blob_endpoint(sas=sas)
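
If you only have the SAS itself, say from your administrator, and not the resource object, you can pass it directly when creating the endpoint. A minimal sketch, assuming blob_endpoint() accepts a sas argument in place of a key:

# create the endpoint from the URL and the SAS alone (no access key)
blob_endp <- blob_endpoint(
    "https://mynewstorage.blob.core.windows.net/",
    sas=sas)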

If you don’t have a key or a SAS, you will only have access to unauthenticated (public) containers and file shares.
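
For example, here is a sketch of anonymous access, where myopendata and opencontainer are hypothetical names for a storage account and container that permit public access:

# no key or SAS supplied: only public containers are accessible
anon_endp <- blob_endpoint("https://myopendata.blob.core.windows.net/")
opencont <- blob_container(anon_endp, "opencontainer")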

Container and object access: blob containers, file shares, blobs, files

Given an endpoint object, AzureStor provides the following methods for working with containers:

  • blob_container: get an existing blob container
  • create_blob_container: create a new blob container
  • delete_blob_container: delete a blob container
  • list_blob_containers: return a list of blob container objects
  • file_share: get an existing file share
  • create_file_share: create a new file share
  • delete_file_share: delete a file share
  • list_file_shares: return a list of file share objects

Here is some example blob container code showing these functions in use. The file share code is similar; a short sketch of it follows the blob example below.

# an existing container
cont <- blob_container(blob_endp, "mycontainer")

# create a new container and allow
# unauthenticated (public) access to blobs
newcont <- create_blob_container(blob_endp, "mynewcontainer",
    public_access="blob")

# delete the container
delete_blob_container(newcont)

# piping also works
library(magrittr)
blob_endp %>% 
    blob_container("mycontainer")
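
The file share equivalents work the same way. Here is a short sketch, reusing the file_endp endpoint from earlier; the share names are placeholders:

# an existing file share
share <- file_share(file_endp, "myshare")

# create a new share, then delete it
newshare <- create_file_share(file_endp, "mynewshare")
delete_file_share(newshare)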

As a convenience, instead of providing an endpoint object and a container name, you can also provide the full URL to the container. If you do this, you’ll also have to supply any necessary authentication details such as the access key or SAS.

cont <- blob_container(
    "https://mynewstorage.blob.core.windows.net/mycontainer",
    key="mystorageaccountkey")

share <- file_share(
    "https://mynewstorage.file.core.windows.net/myshare",
    key="mystorageaccountkey")

Given a blob container or file share object, use the list_blobs() and list_azure_files() functions to list the storage objects they contain. The “azure” in list_azure_files is to avoid any confusion with R’s regular list.files function.

# list blobs inside a blob container
list_blobs(cont)
#      Name       Last-Modified Content-Length
# 1  fs.txt 2018-10-13 11:34:30            132
# 2 fs2.txt 2018-10-13 11:04:36         731930

# if you want only the filenames
list_blobs(cont, info="name")
# [1] "fs.txt"  "fs2.txt"

# and for files inside a file share
list_azure_files(share, "/")
#       name type   size
# 1 100k.txt File 100000
# 2   fs.txt File    132

To transfer files and blobs, use the following functions:

  • upload_blob/download_blob: transfer a file to or from a blob container.
  • upload_azure_file/download_azure_file: transfer a file to or from a file share (a sketch of these appears after the blob example below).
  • upload_to_url: upload a file to a destination given by a URL. This dispatches to either upload_blob or upload_azure_file as appropriate.
  • download_from_url: download a file from a source given by a URL, the opposite of upload_to_url. This is analogous to base R’s download.file, but with authentication built in.

# upload a file to a blob container
blob_endp <- blob_endpoint(
    "https://mynewstorage.blob.core.windows.net/",
    key="mystorageaccesskey")
cont <- blob_container(blob_endp, "mycontainer")
upload_blob(cont, src="myfile", dest="myblob")

# again, piping works
blob_endpoint(
    "https://mynewstorage.blob.core.windows.net/",
    key="mystorageaccesskey") %>%
    blob_container("mycontainer") %>% 
    upload_blob("myfile", "myblob")

# download a blob, overwriting any existing destination file
download_blob(cont, "myblob", "myfile", overwrite=TRUE)

# as a convenience, you can download directly from an Azure URL
download_from_url(
    "https://mynewstorage.blob.core.windows.net/mycontainer/myblob",
    "myfile",
    key="mystorageaccesskey",
    overwrite=TRUE)
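
The file share and URL-based transfers follow the same pattern. Here is a short sketch, reusing the share object from above and the same placeholder names:

# upload to and download from a file share
upload_azure_file(share, src="myfile", dest="myfile")
download_azure_file(share, "myfile", "myfile_copy", overwrite=TRUE)

# upload_to_url dispatches on the destination URL,
# here to upload_blob
upload_to_url("myfile",
    "https://mynewstorage.blob.core.windows.net/mycontainer/myblob",
    key="mystorageaccesskey")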

File shares have the additional feature of supporting directories. To create and delete directories, use create_azure_dir() and delete_azure_dir():

list_azure_files(share, "/")
#       name type   size
# 1 100k.txt File 100000
# 2   fs.txt File    132

# create a directory under the root of the file share
create_azure_dir(share, "newdir")

# confirm that the directory has been created
list_azure_files(share, "/")
#       name      type   size
# 1 100k.txt      File 100000
# 2   fs.txt      File    132
# 3   newdir Directory     NA

# delete the directory
delete_azure_dir(share, "newdir")

The AzureStor package is available now on GitHub.
