This year’s conference was a great opportunity to catch up with friends and meet new people who are eager to talk about their work with R. We enjoyed the sessions we attended and thought that the presenters were entertaining as well as interesting.
On the second day, stream 1 was packed because of two very interesting speakers. Jerome Durussel from Catapult showed us how difficult it is to detect dives by a goalkeeper. His random forests did the job in the end, and the results were presented with some cool animations. We wonder how well his algorithms would work on strikers.
Jerome was followed by a much-anticipated presentation from Tim Paulden. Tim’s talk explored several methods for predicting the result of a football match via mathematical modelling. He began by using a very simple model to predict the result of a match between the London rivals Arsenal and Fulham from several seasons ago. The model was based on the mean number of goals scored by each team over the (then) current season. Under this model, Arsenal’s mean of 2.1 goals far exceeded Fulham’s meagre 1.1, making Arsenal the hot favourites to win the match. The model was clearly oversimplified, so Tim ploughed on and introduced several other, more complex models that were much more accurate. We found Tim’s talk to be a real insight into how data science can be applied to the sporting world; we can’t wait to hear him again next year.
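To give a feel for how a mean-goals model can turn two averages into match odds, here is a minimal sketch. It assumes each team’s goal count follows an independent Poisson distribution with the season means quoted above (2.1 and 1.1). That distributional choice is our assumption for illustration, not necessarily the model Tim presented, and the function names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals under a Poisson distribution."""
    return exp(-mean) * mean ** k / factorial(k)


def match_outcome_probabilities(mean_a, mean_b, max_goals=15):
    """Return (P(team A wins), P(draw), P(team B wins)).

    Assumes the two teams' goal counts are independent Poisson
    variables (an illustrative assumption). Scorelines above
    max_goals are ignored; their probability is negligible here.
    """
    win_a = draw = win_b = 0.0
    for goals_a in range(max_goals + 1):
        for goals_b in range(max_goals + 1):
            p = poisson_pmf(goals_a, mean_a) * poisson_pmf(goals_b, mean_b)
            if goals_a > goals_b:
                win_a += p
            elif goals_a == goals_b:
                draw += p
            else:
                win_b += p
    return win_a, draw, win_b


# Season means quoted in the talk: Arsenal 2.1, Fulham 1.1
arsenal, draw, fulham = match_outcome_probabilities(2.1, 1.1)
print(f"Arsenal win: {arsenal:.3f}, draw: {draw:.3f}, Fulham win: {fulham:.3f}")
```

Even this toy version shows why the model is oversimplified: it ignores the opponent’s defence, home advantage and recent form, which is the kind of gap the more complex models aim to close.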
Stream 1 ended with a really interesting talk from John Burn-Murdoch about how the FT is using ggplot2 to quickly test out possible graphics for publication. What amazed us most was that they are producing around 50 graphics a day for publication, so you can immediately see the benefit of working with ggplot2. The layering system makes it so simple to prototype graphics, and it’s eye-opening to hear that one of the most well-known producers of visualisations, the Financial Times, is making the most of it.
More generally, we were also looking forward to checking out talks on how to deploy your analysis into production in a developer-friendly way. Louis Vine’s experiences with data scientists talking to developers at Funding Circle were a valuable lesson in how not to “throw it over the wall”. Louis had to strike a balance between demanding software best practice and giving the data scientists the freedom to try new things. Ben Downe at BCA had solved the problem a different way, using AzureML’s built-in web API services. Again, this allowed a data scientist to provide fully documented, production-ready APIs that you can simply hand over to the development team. Last but not least, the always brilliant Vincent Warmerdam updated us on the latest advancements with H2O on Spark (Sparkling Water) and how this can be used to generate a .jar file that, once again, can be handed over neatly to a development team, in a format they understand, ready for deployment.
All presentations can be viewed here. We hope you enjoyed this year’s conference as much as we did and hope to see you again next year.