My eRum 2018 biggest highlights

Posted on May 18, 2018 by Peter Laurinec in R bloggers

[This article was first published on Peter Laurinec, and kindly contributed to R-bloggers].

From 14 to 16 May 2018, the European R Users Meeting (eRum) was held in Budapest. I took part as a speaker, giving a presentation about time series data mining. eRum 2018 was a very successful event, and I want to thank the organizers for running it so well.

This blog post covers my biggest highlights of the eRum conference and also serves as a list of useful resources.

Workshops

eRum started with workshops split into 2 blocks of 7 parallel sessions (14 workshops in total). It was difficult to choose only 2 of the 14 to attend because there were many interesting topics. In the end, I chose the DALEX and Keras workshops.

DALEX – Descriptive mAchine Learning EXplanations

A great workshop by Przemyslaw Biecek and Mateusz Staniak about tools for exploration, validation, and explanation of complex machine learning models.

Fun on #workshop with DALEX to explain ML models at @erum2018 #erum2018 #rstats #dataviz #DataScience speaker: @smarterpoland pic.twitter.com/hHf9dThcr2
— Peter Laurinec (@petolauri) May 14, 2018

I learned many techniques for diagnosing machine learning models. The workshop covered techniques for explaining a trained model, its predictions, a single prediction, and so on. Workshop resources can be downloaded here:

DALEX_docs.

Various packages were used for these purposes; the list follows:

Deep learning with Keras

The second workshop I attended, by Aimee Gott and Douglas Ashton, was about using Keras for deep learning. It was a nice workshop about the basic usage of the Keras library in R. We went through use cases with the Iris dataset and a time series dataset from an accelerometer (a CNN was used for training). The materials can be downloaded from here:

MangoTheCat/keras-workshop.

Conference talks

The second and third days of the conference continued with keynote and invited talks, contributed talks, and lightning talks. It was really motivating and inspiring to see all the R enthusiasts speak about their projects. It gave me more confidence to contribute to the R ecosystem, and to the data science ecosystem in general. I will briefly mention the 6 talks that fascinated me most.

Edwin Thoen talked about the recipes package, which helps with preprocessing data and creating design (model) matrices. With recipes, you can create an effective preprocessing “pipeline” for your data.

The bombshell by Florian Privé was about using large matrices in R. He created the bigstatsr package for fast, parallel manipulation of matrices that are larger than the available RAM.

The great keynote speech by Nathalie Vialaneix was about using unsupervised learning for relational data (or dissimilarity data). She presented various interesting use cases for her R packages adjclust and SOMbrero, which cluster relational data. The slides can be found here: slides_villavialaneix_ERUM2018.

Unsupervised #learning for relational data, dissimilarities with #rstats packages adjclust and SOMbrero by @Natty_V2 #erum2018 #DataScience #MachineLearning
Great talk! pic.twitter.com/O73rzg7O1z
— Peter Laurinec (@petolauri) May 15, 2018

Afterward, Erin LeDell from H2O talked about automated ensemble learning using the h2o package. The h2o.automl function offers various interesting options; for example, you can limit the training time allowed for creating the ensemble.

The great machine learning session continued with a talk by Szilard Pafka. His benchmark repositories are well known in the ML community. He talked about gradient boosting frameworks (h2o.gbm, xgboost, lightGBM) and their pros and cons (see the GBM-perf repo).

The next day, the most interesting talk for me was by Henrik Bengtsson about parallel computing in R. His future package enables asynchronous, parallel (multiprocess) computing. It has many useful applications, for example in Shiny apps.

TSrepr talk

As I mentioned at the beginning, I also gave a talk about my TSrepr package. I talked about how to use time series representations to do better data mining in R. Slides are here:

Time series representations for better data mining from Peter Laurinec

The video of the talk:

You can read more about how to use time series representation methods in my previous blog posts:

All other talks can be seen on the Budapest Users of R Network channel!
