Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Use {r2lambda} to download Tidytuesday dataset
In this exercise, we’ll create an AWS Lambda function that downloads the tidytuesday data set for the most recent Tuesday (or most recent Tuesday from a date of interest).
Required packages
library(r2lambda)
library(jsonlite)
library(magrittr)
Runtime function
The first step is to write the runtime function. This is the function that will be
executed when we invoke the Lambda function after it has been deployed. To download
the Tidytuesday data set, we will use the {tidytuesdayR} package. In the runtime
script, we define a function called tidytuesday_lambda that takes one optional
argument, date. If date is omitted, the function returns the data set(s) for the most
recent Tuesday; otherwise, it looks up the most recent Tuesday relative to the date of
interest and returns the corresponding data set(s).
library(tidytuesdayR)

tidytuesday_lambda <- function(date = NULL) {
  if (is.null(date))
    date <- Sys.Date()

  most_recent_tuesday <- tidytuesdayR::last_tuesday(date = date)
  tt_data <- tidytuesdayR::tt_load(x = most_recent_tuesday)
  data_names <- names(tt_data)
  data_list <- lapply(data_names, function(x) tt_data[[x]])
  return(data_list)
}

tidytuesday_lambda("2022-02-02")
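To make the date handling concrete, here is a minimal base-R sketch of looking up the most recent Tuesday on or before a given date. The helper name most_recent_tuesday_sketch is hypothetical, and this is not the implementation of tidytuesdayR::last_tuesday, which may treat time zones and edge cases differently; it only illustrates the idea.

```r
# Hypothetical helper, not part of {tidytuesdayR}: find the most recent
# Tuesday on or before a given date using base R only.
most_recent_tuesday_sketch <- function(date = Sys.Date()) {
  date <- as.Date(date)
  # as.POSIXlt()$wday counts days since Sunday (0 = Sunday, 2 = Tuesday)
  weekday <- as.POSIXlt(date)$wday
  # Days elapsed since the last Tuesday, wrapping around the week
  days_since_tuesday <- (weekday - 2) %% 7
  date - days_since_tuesday
}

# 2022-02-02 was a Wednesday, so the lookup steps back one day
most_recent_tuesday_sketch("2022-02-02")
```

The modulo keeps the offset between 0 and 6, so a date that is already a Tuesday is returned unchanged.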
R script to build the lambda
To build the lambda image, we need an R script that sources any required code,
loads any needed libraries, defines a runtime function, and ends with a call to
lambdr::start_lambda(). The runtime function does not have to be defined in this
file. We could, for example, source another script, or load a package and set a
loaded function as the runtime function in the subsequent call to r2lambda::build_lambda
(see below). We save this script to a file and record the path:
r_code <- "
library(tidytuesdayR)
tidytuesday_lambda <- function(date = NULL) {
  if (is.null(date))
    date <- Sys.Date()

  most_recent_tuesday <- tidytuesdayR::last_tuesday(date = date)
  tt_data <- tidytuesdayR::tt_load(x = most_recent_tuesday)
  data_names <- names(tt_data)
  data_list <- lapply(data_names, function(x) tt_data[[x]])
  return(data_list)
}
lambdr::start_lambda()
"
tmpfile <- tempfile(pattern = "ttlambda_", fileext = ".R")
write(x = r_code, file = tmpfile)
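Because the script is written from a string, a syntax slip would only surface once the image is built. One optional sanity check, sketched here under the assumption that a syntax-only check is enough, is to parse the file without running it and confirm that it defines the runtime function and ends with the lambdr::start_lambda() call. The script text below is a shortened stand-in, and the variable names are made up for this example.

```r
# A shortened stand-in for the runtime script (function body simplified)
script_text <- '
library(tidytuesdayR)
tidytuesday_lambda <- function(date = NULL) {
  if (is.null(date)) date <- Sys.Date()
  date
}
lambdr::start_lambda()
'
script_path <- tempfile(pattern = "ttlambda_check_", fileext = ".R")
writeLines(script_text, script_path)

# parse() reads the file without running it and errors on invalid syntax
exprs <- as.list(parse(file = script_path))

# Collect the names assigned at the top level with `<-`
assigned <- unlist(lapply(exprs, function(e) {
  if (is.call(e) && identical(e[[1]], as.name("<-")) && is.name(e[[2]])) {
    as.character(e[[2]])
  }
}))

# Does the script define the runtime function and end with start_lambda()?
defines_runtime <- "tidytuesday_lambda" %in% assigned
last_call <- paste(deparse(exprs[[length(exprs)]]), collapse = "")
ends_with_start <- identical(last_call, "lambdr::start_lambda()")
```

Since nothing is evaluated, the check works even on a machine where {tidytuesdayR} and {lambdr} are not installed.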
Build, test, and deploy the lambda function
1. Build
- We set the runtime_function argument to the name of the function we wish the docker container to run when invoked. In this case, this is tidytuesday_lambda. This adds a CMD instruction to the Dockerfile.
- We set the runtime_path argument to the path where we stored the script defining our runtime function.
- We set the dependencies argument to c("tidytuesdayR") because we need to have the tidytuesdayR package installed within the docker container if we are to download the dataset. This step adds a RUN instruction to the Dockerfile that calls install.packages to install {tidytuesdayR} from CRAN.
- Finally, the tag argument sets the name of our Lambda function, which we’ll use later to test and invoke the function. The tag argument also becomes the name of the folder that {r2lambda} will create to build the image. This folder will have two files, Dockerfile and runtime.R. runtime.R is our script from runtime_path, renamed before it is copied into the docker image with a COPY instruction.
runtime_function <- "tidytuesday_lambda"
runtime_path <- tmpfile
dependencies <- "tidytuesdayR"

r2lambda::build_lambda(
  tag = "tidytuesday3",
  runtime_function = runtime_function,
  runtime_path = runtime_path,
  dependencies = dependencies
)
2. Test
To make sure our Lambda docker container works as intended, we start it locally,
and invoke it to test the response:
response <- r2lambda::test_lambda(tag = "tidytuesday3", payload = list(date = Sys.Date()))
The response is a list of three elements:
- status, which should be 0 if the test worked,
- stdout, the standard output stream of the invocation, and
- stderr, the standard error stream of the invocation.
stdout and stderr are raw vectors that we need to parse, for example:
rawToChar(response$stdout)
If the stdout slot of the response returns the correct output of our function,
we are good to deploy to AWS.
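As a rough illustration of handling the three elements together, the sketch below checks status first and only then decodes stdout. The response object is a simulated stand-in with made-up values, and read_test_response is a hypothetical helper, not a function from {r2lambda}.

```r
# Simulated stand-in for the list returned by the local test; the values
# are made up for illustration.
fake_response <- list(
  status = 0L,
  stdout = charToRaw('[{"week":1}]'),
  stderr = raw(0)
)

# Hypothetical helper: return stdout as text, or stop and report stderr
# when the status is not 0.
read_test_response <- function(response) {
  if (!isTRUE(response$status == 0)) {
    stop("Local test failed: ", rawToChar(response$stderr))
  }
  rawToChar(response$stdout)
}

read_test_response(fake_response)
```

Surfacing stderr in the error message keeps the failure reason visible without a separate decoding step.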
3. Deploy
The deployment step is simple, in that all we need to do is specify the name (tag) of
the Lambda function we wish to push to AWS ECR. The deploy_lambda function also
accepts ..., which are named arguments ultimately passed on to
paws.compute:::lambda_create_function. This is the function that calls the Lambda
API. To see all available arguments run ?paws.compute:::lambda_create_function.
The most important arguments are probably Timeout and MemorySize, which set
the time our function will be allowed to run and the amount of memory it will have
available. In many cases it will make sense to increase the defaults of 3 seconds
and 128 MB.
r2lambda::deploy_lambda(tag = "tidytuesday3", Timeout = 30)
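If you plan to raise both settings, it may help to check the values before deploying. The sketch below is one way to do that, assuming the AWS Lambda limits at the time of writing (a timeout of 1 to 900 seconds and 128 to 10240 MB of memory). lambda_settings is a hypothetical helper, not part of {r2lambda} or {paws}, and the deploy call is left commented out because it needs docker and AWS credentials.

```r
# Hypothetical helper: validate Timeout (seconds) and MemorySize (MB)
# against the assumed Lambda limits and return them as a named list.
lambda_settings <- function(timeout = 3, memory_size = 128) {
  stopifnot(
    is.numeric(timeout), timeout >= 1, timeout <= 900,
    is.numeric(memory_size), memory_size >= 128, memory_size <= 10240
  )
  list(Timeout = as.integer(timeout), MemorySize = as.integer(memory_size))
}

settings <- lambda_settings(timeout = 30, memory_size = 512)

# The named list can then be spliced into the deploy call, for example:
# do.call(r2lambda::deploy_lambda, c(list(tag = "tidytuesday3"), settings))
```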
4. Invoke
If all goes well, our function should now be available on the cloud awaiting requests.
We can invoke it from R using invoke_lambda. The arguments are:
- function_name – the name of the function
- invocation_type – typically RequestResponse
- include_logs – whether to print the logs of the run on the console
- payload – a named list with arguments sent to the runtime_function. In this case, the runtime function, tidytuesday_lambda, has a single argument, date, so the corresponding list is list(date = Sys.Date()). As our function can be called without any argument, we can also send an empty list as the payload.
response <- r2lambda::invoke_lambda(
  function_name = "tidytuesday3",
  invocation_type = "RequestResponse",
  payload = list(),
  include_logs = TRUE
)
Just like in the local test, the response payload comes as a raw vector that needs to be parsed into a data.frame:
tidytuesday_dataset <- response$Payload %>%
  rawToChar() %>%
  jsonlite::fromJSON(simplifyDataFrame = TRUE)

tidytuesday_dataset[[1]][1:5, 1:5]
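The same parsing step can be tried offline on a simulated payload. The sketch below assumes the payload is a JSON array in which each data set is itself an array of records; the two-row table is made up for illustration and is not real Tidytuesday data.

```r
library(jsonlite)

# Simulated payload: one data set, encoded as an array of records and
# wrapped in an outer array (made-up values)
fake_payload <- charToRaw('[[{"id":1,"name":"a"},{"id":2,"name":"b"}]]')

# Decode the raw vector to text, then let jsonlite simplify the records
datasets <- fromJSON(rawToChar(fake_payload), simplifyDataFrame = TRUE)

# Under the assumption above, the first element is a data.frame
first_dataset <- datasets[[1]]
str(first_dataset)
```

Because the runtime function returns an unnamed list, the data sets are addressed by position rather than by name.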
Summary
In this post, we went over some details about:
- how to prepare an R script before deploying it as a Lambda function,
- what the roles of several of the key arguments are,
- how to request a longer timeout or more memory for a Lambda function, and
- how to parse the response payload returned by the Lambda function.
Stay tuned for a follow-up post where we set this Lambda function to run on a weekly schedule!
