useR 2015: Computational

Posted on July 1, 2015 by csgillespie in R bloggers

[This article was first published on Why? » R, and kindly contributed to R-bloggers].

These are my initial notes from useR 2015. I will/may revise when I have time.

Computational Performance; Chair: Dirk Eddelbuettel

Running R+Hadoop using Docker Containers (E. James Harner)

Introduction

Big data architectures:
- HDFS/Hadoop: software framework for distributed storage and distributed processing
- Tachyon/Spark: uses in-memory storage and processing

Rc2 server (R cloud computing)

Has an editor and output panel, with interactive collaboration (demo)
Highly scalable
4-tier architecture: client, app server, compute cloud (JSON over BSD sockets for R), databases (PostgreSQL and CouchDB)

Rc2 client

Shareable projects and workspaces
Graphs are written to files and moved to the database as blobs
Security: a three-value token is used for auto-logins

Summary

Rc2 is an accessible IDE for students and data scientists that allows real-time collaboration. It also acts as a front end to Hadoop and Spark clusters.

Algorithmic Differentiation for Extremum Estimation: An Introduction Using RcppEigen (Matt P. Dziubinski)

Why

Parametric model: we want to estimate a parameter by maximizing an objective function
No closed-form expression is available, so we need to optimize numerically

Algorithms

Derivative-free: does not rely on derivatives of the objective function
Gradient-based: needs the gradient of the objective function
- Steepest ascent, Newton
- Often exhibit superior convergence rates
- But getting the gradient can be tricky, e.g. finite-difference methods only approximate it
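
To make the gradient-based idea concrete, here is a minimal sketch (in Python rather than R, purely for illustration) of steepest ascent driven by a central finite-difference gradient. The objective, step size and iteration count are hypothetical choices of mine, not taken from the talk.

```python
def objective(theta):
    # Hypothetical concave objective with its maximum at theta = 2.
    return -(theta - 2.0) ** 2

def central_difference(f, x, h=1e-5):
    # Approximate f'(x) from two evaluations of f; the result carries
    # truncation error (from h) and rounding error (from the subtraction).
    return (f(x + h) - f(x - h)) / (2.0 * h)

def steepest_ascent(f, x0, step=0.1, iterations=200):
    # Repeatedly move in the direction of the (approximate) gradient.
    x = x0
    for _ in range(iterations):
        x += step * central_difference(f, x)
    return x

estimate = steepest_ascent(objective, x0=0.0)
print(estimate)  # close to 2.0
```

Each gradient evaluation costs two calls to the objective per parameter, and the choice of `h` trades truncation error against rounding error, which is part of why obtaining gradients this way can be tricky.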

Algorithmic differentiation

Essentially, apply the chain rule to each elementary operation in the objective function
Need to recode the objective function in C++ using Rcpp
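
The chain-rule idea can be sketched with forward-mode "dual numbers": every value carries its derivative along with it, and each operation updates both. This is a toy Python illustration with names of my own choosing (`Dual`, `dual_exp`, `derivative`); it is not the RcppEigen approach from the talk.

```python
import math

class Dual:
    """A value together with its derivative with respect to the input."""
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)
    __rmul__ = __mul__

def _lift(x):
    # Treat plain numbers as constants (derivative zero).
    return x if isinstance(x, Dual) else Dual(float(x))

def dual_exp(x):
    # Chain rule: exp(u)' = exp(u) * u'
    e = math.exp(x.value)
    return Dual(e, e * x.deriv)

def f(x):
    # Hypothetical objective: f(x) = x * exp(x) + 3x
    return x * dual_exp(x) + 3 * x

def derivative(func, x0):
    # Seed the input with derivative 1 and read the derivative off the result.
    return func(Dual(x0, 1.0)).deriv

print(derivative(f, 1.0))       # derivative from the dual numbers
print(2 * math.exp(1.0) + 3)    # analytic derivative exp(x) + x*exp(x) + 3 at x = 1
```

Unlike finite differences, the derivative here is exact up to floating-point rounding, because no step size is involved.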

Improving computational performance with algorithm engineering (Kirill Müller)

Application: activity-based microsimulation models

Weighted sampling without replacement

Random sample: sample.int
Common framework: Subdivide an interval according to probabilities
- If sampling without replacement, remove sub-interval
R uses a trivial algorithm with an O(n) update
- Heap-like data structure
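
As a rough illustration of the interval-subdivision framework with a linear-time update, here is a Python sketch. It is my own reconstruction of the idea, not R's actual `sample.int` source, and the function name is hypothetical.

```python
import random

def sample_without_replacement(weights, k, rng=random):
    # Assumes k does not exceed the number of items with positive weight.
    if k > len(weights):
        raise ValueError("k exceeds the number of items")
    items = list(range(len(weights)))
    w = list(weights)
    chosen = []
    for _ in range(k):
        # Subdivide [0, total) into sub-intervals proportional to the weights
        # and find the sub-interval that a uniform draw falls into.
        u = rng.random() * sum(w)
        cumulative = 0.0
        for pos, wi in enumerate(w):
            cumulative += wi
            if u < cumulative:
                break
        # Removing the sub-interval is the O(n) update per draw.
        chosen.append(items.pop(pos))
        w.pop(pos)
    return chosen

print(sample_without_replacement([0.1, 0.2, 0.3, 0.4], 2))
```

Each draw scans and edits a list of length n, so drawing k items costs O(nk) overall; a tree- or heap-like structure over the weights is the usual way to bring the per-draw cost down.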
Alternative approaches:
- Rejection sampling
- One-pass sampling (Efraimidis and Spirakis, 2006)
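
The one-pass approach can be sketched as follows: give each item the key u^(1/w), with u uniform on (0, 1) and w the item's weight, and keep the k items with the largest keys. This Python version is a minimal sketch under that description; the function name is my own, and skipping zero-weight items is an assumption on my part.

```python
import heapq
import random

def one_pass_weighted_sample(weights, k, rng=random):
    # One random key per item; larger weights push the key towards 1.
    keyed = ((rng.random() ** (1.0 / w), i)
             for i, w in enumerate(weights) if w > 0)
    # nlargest consumes the generator once, holding only k candidates at a time.
    return [i for _, i in heapq.nlargest(k, keyed)]

print(one_pass_weighted_sample([0.1, 0.2, 0.3, 0.4], 2))
```

Because each item is visited once and only k candidates are held at a time, the same idea also works on streams whose length is not known in advance.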

Statistical matching (data fusion)

Use Gower's distance to compare distributions
- works with interval, ordinal and nominal variables
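
For reference, a minimal Python sketch of Gower's distance between two records with mixed variable types. Treating ordinal variables as integer ranks scaled by their range is one common convention and an assumption here; the variable names and example values are hypothetical.

```python
def gower_distance(a, b, kinds, ranges):
    # a, b: two records; kinds: variable type per column;
    # ranges: observed range per numeric/ordinal column (unused for nominal).
    total = 0.0
    for x, y, kind, r in zip(a, b, kinds, ranges):
        if kind == "nominal":
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if x == y else 1.0
        else:
            # Interval or rank-coded ordinal: absolute difference scaled to [0, 1].
            total += abs(x - y) / r if r else 0.0
    return total / len(kinds)

kinds = ("interval", "ordinal", "nominal")
ranges = (80.0, 4.0, None)        # e.g. age range, rank range, not applicable
person_a = (30.0, 2, "car")
person_b = (50.0, 3, "bike")
print(gower_distance(person_a, person_b, kinds, ranges))  # (0.25 + 0.25 + 1) / 3 = 0.5
```

Every per-variable contribution lies in [0, 1], so the averaged distance does too, which is what lets interval, ordinal and nominal variables be combined in one measure.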

Please note that the notes/talks section of this post is merely my notes on the
presentation. I may have made mistakes: these notes are not guaranteed to be
correct. Unless explicitly stated, they represent neither my opinions nor the
opinions of my employers. Any errors you can assume to be mine and not the
speaker’s. I’m happy to correct any errors you may spot – just let me know!
