Written by Ben Young, an EARL Boston attendee. http://completelyabsorbed.com/
A week ago I flew to Boston, Massachusetts for EARL 2015. This was my first business trip, and as such I was very excited. The conference, speakers, and attendees did not disappoint. Mango Solutions put on a great schedule of workshops and talks. I would recommend attending future EARLs to anyone using R professionally.
Below is a summary of my time at EARL 2015.
Introduction to Rcpp
Dirk gave a great introduction to Rcpp from the ground up. I am a novice C++ developer, and I was able to follow along from start to finish. I feel inspired to start working with Rcpp myself. Two ideas are (a) a package that connects R to MIDI devices, to enable producing music from R, and (b) a fast poker evaluator based on a fast C++ evaluator.
Interactive Reporting with RMarkdown and Shiny
I taught myself Shiny about a year ago. Several months ago I started experimenting with RMarkdown. Garrett gave an enlightening, in-depth workshop on how to combine the two, with a result that is highly efficient and consistent.
On Monday morning I walked to the conference from my bed and (not) breakfast, about a mile from the NERD center. Cambridge is beautiful, and I highly recommend exploring it.
R in Market Research – Handling ‘Wide’ (not big) Data
Shad Thomas – Glass Box Research
A good approach to answering the question “how do you smartly reduce the amount of data without losing meaning?”
Heuristic Methods for Real World Optimization
Brandon Bass – Altenex LLC
Brandon’s talk has me interested in learning more about (a) Particle Swarm Optimization and (b) Evolutionary Algorithms. There was a funny, very meta moment when Brandon binged and googled ‘What is optimization?’. Being in a Microsoft building, it seemed fitting that Bing gave an excellent answer, and Google’s was off base.
How to do Survival Analysis of Health Data in R
Monika Wahi – DethWench Professional Services
I particularly appreciated Monika’s talk, as I do survival analysis as part of my work from time to time. My favorite quote of hers: “I prefer logistic regression, someone’s either dead or alive, and that’s pretty clear, linear regression is kind of waffley.” Her talk highlighted three approaches to survival analysis: parametric, semi-parametric, and non-parametric. In Monika’s line of work, semi-parametric models are frequently used, specifically the Cox model. The non-parametric Kaplan-Meier estimator is also frequently used.
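For readers who have not met Kaplan-Meier before, here is a minimal sketch of the estimator in plain Python. It is not code from Monika’s talk (which used R); the function name `kaplan_meier` and the toy follow-up data are made up for illustration. At each observed event time, the running survival probability is multiplied by the fraction of at-risk subjects who did not have the event.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: observed follow-up time for each subject
    events:    1 if the event (e.g. death) was observed, 0 if censored
    """
    # Number of observed events at each distinct time.
    deaths = Counter(t for t, e in zip(durations, events) if e)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data: follow-up in months, 0 marks a censored subject.
durations = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 1, 0, 0]
for time, prob in kaplan_meier(durations, events):
    print(f"t={time:>2}  S(t)={prob:.3f}")
```

Censored subjects never trigger a drop in the curve, but they still count in the at-risk set until their follow-up ends, which is what lets the method use incomplete observations.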
Predictive Models for Neglected Disease Drug Discovery
Paul Kowalczyk – Syngenta Biotechnology
Paul’s background is in drug design. He led an engaged discussion using Shiny. Paul illustrated techniques in drug development using machine learning methods such as random forest, SVM, and KNN. I especially appreciated Paul’s focus on literate programming: being sure someone can run your code without you in the room, so that big things don’t need to be explained.
Jared Lander – Lander Analytics
Jared generated some beautiful, enlightening graphics. My favorite was the visualization of elastic nets with coefficient paths.
Visualization and Sensitivity Analysis of PK/PD Models in R
Yan Li – Celgene
Yan Li gave a compelling talk advocating for a paradigm shift to model-based drug development. The methods usually followed now cost billions of dollars to take a drug from start to finish. Yan broke down the processes of drug development, addressing issues, innovative solutions, and more.
Sharing Data between R and non R users
Aimee Gott – Mango Solutions
Aimee’s talk was my favorite of the conference. She talked at length about a solution Mango developed for a client who wanted their R users to be able to collaborate seamlessly with their Excel users. She also gave a visual tour of how this solution manifested. Overall a very impressive application.
Customizing R Machine Learning to Your Problem with Caret
Marcos Pereira – Millward Brown
Marcos covered the caret package, customizing the summary function, and customizing the caret models. A great exploration of the package.
Creating Rich Analytic Presentations with the RCloud Framework
Doug Ashton – Mango Solutions
Doug demoed RCloud, a product that was compared to IPython Notebook. To me it looks like the perfect toolbox for implementing finely tuned scripts. I’m currently trying to get RCloud running on a remote machine, though admittedly the process is quite difficult.
Opening Keynote 1
Richard Pugh – Mango Solutions
Richard gave an inspiring keynote. My favorite point was that going out and hiring “unicorns” is not reasonable. Richard showed tools that had been made to assist in scoring employees and prospective hires, and how he used these tools to “build a unicorn.”
Opening Keynote 2
Garrett Grolemund – RStudio
Garrett’s keynote was very interesting. He talked about how his career began as a psychologist, then moved on to how the brain processes information and the idea that everything we perceive is inherently flawed. He made a lot of well-placed references to The Matrix. My biggest takeaway was that a data scientist’s job is to determine what the truth about reality is.
Measuring Brand Ad Effectiveness
Tim Hesterberg – Google
Tim gave a history of consumer surveys, as well as how Google collects, filters, and fits data from their surveys today. Tim kept his talk fresh and interesting by giving a narrative from the side of the sales department, as well as the side of the survey taker.
Performance Attribution for Equity Portfolios
Yang Lu – Hutchin Hill Capital
I spoke with Yang before his talk. He told me he’s been using R since college, that his workplace is very R friendly, and that his bosses love R. His talk addressed the question “how do we measure portfolio performance?” Yang’s answer utilized a Brinson model, as well as a regression-based approach.
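To give a feel for what a Brinson model does, here is a minimal sketch in Python. It assumes the classic Brinson-Hood-Beebower decomposition into allocation, selection, and interaction effects; I do not know which variant Yang used, and the sector names, weights, and returns below are invented for illustration.

```python
def brinson_attribution(sectors):
    """Split active return per sector into three effects.

    sectors maps a sector name to a tuple
    (portfolio_weight, benchmark_weight, portfolio_return, benchmark_return).
    Returns a dict: name -> (allocation, selection, interaction).
    """
    result = {}
    for name, (w_p, w_b, r_p, r_b) in sectors.items():
        allocation = (w_p - w_b) * r_b           # over/underweighting the sector
        selection = w_b * (r_p - r_b)            # picking better assets within it
        interaction = (w_p - w_b) * (r_p - r_b)  # combined effect of both choices
        result[name] = (allocation, selection, interaction)
    return result


# Hypothetical two-sector portfolio and benchmark.
sectors = {
    "Tech": (0.60, 0.50, 0.08, 0.06),
    "Energy": (0.40, 0.50, 0.01, 0.02),
}
effects = brinson_attribution(sectors)
active_return = sum(sum(parts) for parts in effects.values())
for name, (alloc, sel, inter) in effects.items():
    print(f"{name}: allocation={alloc:.4f} selection={sel:.4f} interaction={inter:.4f}")
print(f"active return: {active_return:.4f}")
```

The three effects summed over all sectors equal the portfolio return minus the benchmark return, which is what makes the decomposition useful for explaining where performance came from.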
Using R and Bioconductor in Cancer Genetics and Precision Medicine
Aedin Culhane – Dana-Farber Cancer Institute and Harvard TH Chan School of Public Health
Aedin opened her talk by recalling the Horse Manure Crisis of 1894 as an example of shortsighted modeling. Many were panicking about the growing amounts of horse manure, with no end in sight, until the advent of the automobile made the problem go away. From there, Aedin’s talk explored personalized medicine and genome sequencing (in which R plays a large role, through the Bioconductor library).
Quantitative Portfolio Management with High Frequency Data
Jerzy Pawlowski – NYU Polytechnic School of Engineering
Jerzy showed some methods of implementing portfolio management, and spent some time discussing Garman-Klass, as well as Rogers-Satchell estimators. A bit of this talk was beyond my current level of understanding.
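For anyone else who found the estimators unfamiliar, here is a minimal sketch in Python of the standard textbook forms of the Garman-Klass and Rogers-Satchell variance estimators, which use each bar’s open, high, low, and close prices rather than closes alone. This is not Jerzy’s code, and the function names and price bars are made up for illustration.

```python
import math


def garman_klass_variance(bars):
    """Average per-bar Garman-Klass variance from (open, high, low, close) bars."""
    total = 0.0
    for o, h, l, c in bars:
        total += (0.5 * math.log(h / l) ** 2
                  - (2 * math.log(2) - 1) * math.log(c / o) ** 2)
    return total / len(bars)


def rogers_satchell_variance(bars):
    """Average per-bar Rogers-Satchell variance from (open, high, low, close) bars."""
    total = 0.0
    for o, h, l, c in bars:
        total += (math.log(h / c) * math.log(h / o)
                  + math.log(l / c) * math.log(l / o))
    return total / len(bars)


# Hypothetical OHLC bars.
bars = [
    (100.0, 102.0, 99.0, 101.0),
    (101.0, 103.5, 100.5, 103.0),
    (103.0, 104.0, 101.0, 101.5),
]
print("Garman-Klass volatility:   ", math.sqrt(garman_klass_variance(bars)))
print("Rogers-Satchell volatility:", math.sqrt(rogers_satchell_variance(bars)))
```

One difference worth knowing: Rogers-Satchell returns zero for a bar that trends straight from its low (the open) to its high (the close), so a steady drift is not counted as volatility, whereas Garman-Klass assumes no drift.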
Garbage In, Garbage Out – Automating Data Quality
Rob described how EarlyWarning, which is wholly owned by 5 major banks, was created to detect fraud in banks.
A declarative DSL for the plotly graphing library in R
Jack Parmer – Plotly
Deploying predictive models as APIs
Sean Lorenz – Domino Data Labs
Sean demoed his company’s product, which allows predictive models to be called via API. This product is not only cloud-based, but also available as an on-premise release.
Predicting Student Success at Scale: APIs and DSLs for Building and Integrating Many Models
Harlan Harris – Educational Advisory Board
Harlan talked about his company, and some principles that ring true: “the data science team does the data science”, and “use tools that you know to build tools that you’ll use”.
Scaling R for Real-world Business Analytics
Roger Fried – Teradata
Roger gave a demo of Teradata’s AsterR, highlighting its ability to easily perform glm-like operations on billions of rows.
EARL 2015 : Boston was a fantastic time, and I learned a lot. I’m motivated to put this new knowledge to work, and plan to publish more interesting posts very soon.
Thank you, Ben, for your blog post. To see some of Ben’s pictures, head over to http://completelyabsorbed.com/