Cloudy with a chance of Caffeinated Query Orchestration – New rJava Wrappers for AWS Athena SDK for Java

February 22, 2019
By

(This article was first published on R – rud.is, and kindly contributed to R-bloggers)

There are two fledgling rJava-based R packages that enable working with the AWS SDK for Athena:

They’re both needed to conform with the way CRAN like rJava-based packages submitted that also have large JAR dependencies. The goal is to eventually have wrappers for anything R folks need under the AWS Java SDK menu.

All package pairs will eventually cohabitate under the Cloudy R Project once each gets to 90% API coverage, passes CRAN checks and has passing Travis checks.

One thing I did get working right up front was the asynchronous dplyr chain query execution collect_async(), so if you need that and would rather not use reticulated wrappers, now’s your chance.

You would be correct in assuming this is an offshoot of the recent work on updating metis. My primary impetus for this is to remove the reticulate dependency from our Dockerized production setups but I also have discovered I like the Java libraries more than the boto3-based ones (not really a shocker there if you know my views on Python). As a result I should be able to quickly wrap most any library you may need (see below).

FIN

The next major wrapper coming is S3 (there are bits of it implemented in awsathena now but that’s temporary) and — for now — you can toss a comment here or file an issue in any of the social coding sites you like for priority wrapping of other AWS Java SDK libraries. Also, if you want some experience working with rJava packages in a judgement-free zone, drop a note into these or any new AWS rJava-based package repos and I’ll gladly walk you through your first PR.

To leave a comment for the author, please follow the link and comment on their blog: R – rud.is.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)