Last Friday the Cologne R user group came together for the 15th time. Since its inception over three years ago the group has evolved from a small gathering in a pub into an active data science community, covering topics well beyond R. Still, R is the link and glue between the different interests. Last Friday’s agenda was a good example of this, with three talks touching on workflow management, web development and risk analysis.
R in a big data pipeline
Yuki showed how to embed R via luigi into a heterogeneous workflow of different applications. This is especially useful when R needs to be integrated with Hadoop/HDFS-based technologies, such as Spark and Hive. Luigi is not unlike Make, which Kirill presented at our last meeting in June. In a configuration file Yuki specified the various workflow steps and the dependencies between the jobs.
Kicking off the luigi script starts the workflow, and the luigid server allows Yuki to monitor the various parts of the dependency graph visually. Thus, he can see the progress of his workflow in real time and quickly identify when and where a subprocess fails. As Yuki pointed out, this becomes critical in production systems, where failures need to be known and fixed quickly, unlike when one carries out an exploratory analysis in a development/research environment. See also Yuki’s blog post for further details.
Shiny + Shinyjs
Paul showed us an example of a Shiny app that plots a different graph depending on the user. Behind the scenes his script either hides or shows those plots, conditional on the user. With only a few lines of R it allowed him to develop a user-specific application. To achieve this he created a login screen that checks the user name and password. In his example he had hard-coded the login credentials instead of using a secure connection via a professional Shiny Server instance. However, this was sufficient for his purpose: testing how students react to different economic scenarios in a lab environment at university.
Experience vs. Data
I presented some Bayesian ideas for analysing risks with little data. I used the wonderful “Hit and run accident” example from Daniel Kahneman’s book Thinking, Fast and Slow to explain Bayes’ formula, introduced Bayesian belief networks for a claims analysis and discussed the challenge of predicting events that haven’t happened yet (also in Stan). Along the way I mentioned a few ideas on communicating risk, which I learned from David Spiegelhalter earlier this year.
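For readers who don’t have the book at hand, the hit-and-run example can be worked through in a few lines. The numbers below are those of the well-known version of the problem in Kahneman’s book (85% of the cabs are Green, 15% are Blue, and the witness identifies the colour correctly 80% of the time); they are not taken from the slides, and the function name is just an illustrative choice.

```python
def posterior(prior, hit_rate, false_alarm_rate):
    """Bayes' formula for a binary hypothesis.

    prior            -- P(H), probability of the hypothesis before the evidence
    hit_rate         -- P(E | H), probability of the evidence if H is true
    false_alarm_rate -- P(E | not H), probability of the evidence if H is false
    """
    evidence = prior * hit_rate + (1 - prior) * false_alarm_rate
    return prior * hit_rate / evidence


# H: the cab involved was Blue; E: the witness says it was Blue.
p_blue = 0.15             # base rate of Blue cabs in the city
p_witness_correct = 0.80  # reliability of the witness

p_blue_given_says_blue = posterior(
    prior=p_blue,
    hit_rate=p_witness_correct,
    false_alarm_rate=1 - p_witness_correct,
)

print(round(p_blue_given_says_blue, 3))  # 0.414
```

Despite the fairly reliable witness, the probability that the cab was Blue is only about 41%, because the low base rate of Blue cabs pulls the answer down.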
Next Kölner R meeting
The next meeting will be scheduled in December. Details will be published on our Meetup site. Thanks again to Revolution Analytics/Microsoft for their sponsorship.
Please get in touch if you would like to present at the next meeting.