Update: The slides are available; click here.
Stream processing with R in AWS
by Gergely Daróczi
Abstract: R is rarely mentioned among big data tools, although it scales fairly well for most data science problems and ETL tasks. This talk presents an open-source R package that interacts with Amazon Kinesis via the MultiLangDaemon bundled with the Amazon KCL, starting multiple R sessions on a single machine or a cluster of nodes to process data from theoretically any number of Kinesis shards. Besides the technical background and a quick introduction to how Kinesis works, the talk will feature some stream processing use cases at CARD.com, and will provide an overview and hands-on demos of the related data infrastructure built on top of Docker, Amazon ECS, ECR, KMS, Redshift and a bunch of third-party APIs.
Bio: Gergely Daróczi is an enthusiastic R user and package developer, founder of an R-based web application at rapporter.net, Ph.D. candidate in Sociology, and Director of Analytics at CARD.com, with a strong interest in designing a scalable data platform built on top of R, AWS and a dozen APIs. He maintains some CRAN packages mainly dealing with reporting and API integrations, has co-authored a number of journal articles in the social and medical sciences, and recently published the book “Mastering Data Analysis with R” with Packt.
– 6:30pm arrival, food/drinks and networking
– 7:30pm talks
– 9:00pm more networking
You must have a confirmed RSVP, and please arrive by 7:25pm at the latest. Please RSVP here on Eventbrite.
Venue: Edmunds, 2401 Colorado Avenue (this is Edmunds’ NEW LOCATION, don’t go to the old one)
Park underneath the building (Colorado Center); Edmunds will validate.