R⁶ — Using R With Amazon Athena & AWS Temporary Security Credentials

May 16, 2017
By

(This article was first published on R – rud.is, and kindly contributed to R-bloggers)

Most of the examples of working with most of the AWS services show basic username & password authentication. That’s all well-and-good, but many shops use the AWS Security Token Service to provide temporary credentials and session tokens to limit exposure and provide more uniform multi-factor authentication. At my workplace, Frank Mitchell created a nice electron app to make it super easy to create and re-up these credentials. The downside of this is that all AWS service usage for work requires using these credentials and I was having the darndest time trying to get Athena’s JDBC driver working with it (but I wasn’t spending alot of time on it as I tend to mirror research data to a local, beefy Apache Drill server).

I finally noticed the

com.amazonaws.athena.jdbc.shaded.com.amazonaws.auth.EnvironmentVariableCredentialsProvider

class and decided to give the following a go (you will need to point fil to wherever you have the Athena jar file):

library(RJDBC)
library(tidyverse)

fil <- "~/Drivers/AthenaJDBC41-1.0.1.jar"
drv <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", fil, identifier.quote="'")

aws <- ini::read.ini("~/.aws/credentials")

Sys.setenv(AWS_ACCESS_KEY_ID = aws[Sys.getenv("AWS_PROFILE")][[1]]$aws_access_key_id)
Sys.setenv(AWS_SECRET_ACCESS_KEY = aws[Sys.getenv("AWS_PROFILE")][[1]]$aws_secret_access_key)
Sys.setenv(AWS_SESSION_TOKEN = aws[Sys.getenv("AWS_PROFILE")][[1]]$aws_session_token)

provider <- "com.amazonaws.athena.jdbc.shaded.com.amazonaws.auth.EnvironmentVariableCredentialsProvider"

con <- dbConnect(drv, 'jdbc:awsathena://athena.us-east-1.amazonaws.com:443/',
                 s3_staging_dir=Sys.getenv("AWS_S3_STAGING_DIR"),
                 schema_name="DEFAULT_DB_SCHEMA_NAME",
                 aws_credentials_provider_class=provider)

dbListTables(con)

dbListFields(con, "SOME_TABLE_IN_THE_DEFAULT_DB")

dbGetQuery(con, "SELECT * FROM DEFAULT_DB_SCHEMA_NAME.SOME_TABLE_IN_THE_DEFAULT_DB limit 10;")

YMMV on Windows (comments about what does and does not work on Windows are welcome).

The provider line was the key element I was missing prior to last night.

The Awsaml utility monitors/maintains entries under it’s purview credentials file and keeps consistent profile ids, so I keep that AWS_PROFILE setting in my ~/.Renviron.

I also keep the default S3 Athena data staging bucket in an environment variable as well.

If you provide a default schema_name then you can list tables and fields but queries need fully qualified database (Amazon calls them “schemas”) dot table name.

Initial attempts to have this setup “just work” with dplyr 0.6.0 (the forthcoming EPIC release) were unsuccessful but I’ll poke at all this when I get time and likely write a small Athena package to help smooth over rougher areas.

Y’all likely figured all this out way before I did, but in the event someone else is looking for the information, it should be google-able now.

To leave a comment for the author, please follow the link and comment on their blog: R – rud.is.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)