Challenge alert — material identification

[This article was first published on Stack Exchange Stats Blog, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

We are starting yet another series of posts: challenge alerts. This series is intended to share news about machine learning and data mining challenges that may interest members of our community, possibly with a brief introduction to the problem. So if you hear about a contest, let us know on Skewed distribution.

Today's post is about a recent event on TunedIt, where FIND Technologies Inc. asks for a method to distinguish various materials based on the passive electromagnetic signals they produce. They supply participants with 3000 time series of 1500 samples, each corresponding to a measurement of the electric potential on the surface of one of three materials. Here is a plot of one of the given time series:

Figure: a sample signal (the right panel shows a zoomed-in view).

Half of this set is annotated with material classes and serves as the training set; the rest is the test set, on which the classes must be predicted. This is a ‘rolling’ challenge, i.e. participants can submit many predictions at any time, and their results on a preliminary test set (different from the test set used for the final assessment of accuracy) are published instantly. Unfortunately, the organizers drew the preliminary set from the training samples, so a submission that simply overfits, reproducing the known training labels, can reach arbitrarily high accuracy on the leaderboard. In fact this has already happened, so the real progress remains unknown. After registering, one can download a preliminary report which reveals some technical details about the problem. It also claims that one can obtain circa 70% accuracy in separating each pair of the classes using a linear learner on wavelet spectra.
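To make the report's claim a bit more concrete, here is a minimal Python sketch of what "a linear learner on wavelet spectra" could look like. It is not FIND's method and does not use the challenge data. We assume that a "wavelet spectrum" means the log-energy of Haar wavelet detail coefficients at each scale, and that the linear learner is a plain least-squares classifier fitted separately for each pair of classes. The three "materials" are hypothetical AR(1) noise processes of 1500 samples each; the function names, the AR coefficients and the number of scales are all illustrative choices.

```python
import itertools

import numpy as np

# Assumption: "wavelet spectrum" = log-energy of Haar detail coefficients per scale.
# The FIND report may define it differently; this is only an illustration.
def haar_energy_spectrum(x, levels=8):
    a = np.asarray(x, dtype=float)
    feats = []
    for _ in range(levels):
        if a.size < 2:
            break
        if a.size % 2:  # a Haar step needs an even length, so drop the last sample
            a = a[:-1]
        even, odd = a[0::2], a[1::2]
        detail = (even - odd) / np.sqrt(2.0)  # high-pass coefficients at this scale
        a = (even + odd) / np.sqrt(2.0)       # low-pass part, decomposed further
        feats.append(np.log(np.mean(detail ** 2) + 1e-12))
    return np.array(feats)


def fit_linear(X, y):
    """Least-squares linear classifier; y holds labels in {-1, +1}."""
    A = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def predict_linear(w, X):
    A = np.hstack([X, np.ones((len(X), 1))])
    return np.where(A @ w >= 0, 1, -1)


rng = np.random.default_rng(0)

# Hypothetical stand-in for one 1500-sample measurement: AR(1) noise x[t] = phi*x[t-1] + e[t].
def synthetic_series(phi, n=1500):
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x


# Three made-up "materials" that differ only in spectral shape.
PHIS = {0: 0.0, 1: 0.5, 2: 0.9}


def make_dataset(per_class):
    X, y = [], []
    for label, phi in PHIS.items():
        for _ in range(per_class):
            X.append(haar_energy_spectrum(synthetic_series(phi)))
            y.append(label)
    return np.array(X), np.array(y)


X_train, y_train = make_dataset(50)
X_test, y_test = make_dataset(50)

# One linear classifier per pair of classes, mirroring the report's pairwise claim.
pair_accuracy = {}
for a, b in itertools.combinations(PHIS, 2):
    tr, te = np.isin(y_train, [a, b]), np.isin(y_test, [a, b])
    w = fit_linear(X_train[tr], np.where(y_train[tr] == b, 1, -1))
    pred = predict_linear(w, X_test[te])
    pair_accuracy[(a, b)] = float(np.mean(pred == np.where(y_test[te] == b, 1, -1)))

for pair, acc in pair_accuracy.items():
    print(pair, f"{acc:.2f}")
```

The synthetic classes are deliberately easy to tell apart, so the printed accuracies mostly confirm that the pipeline works; they say nothing about how such features behave on the real measurements.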

The main downside of the challenge is that it is quite frequently regarded as a scam, especially because there is no way to replicate the results of the preliminary report (and the method described therein fails on the challenge data). More details can be found in the challenge thread on the TunedIt forum. Anyway, no one has broken the first milestone of 50% accuracy so far.

The upside is that there are prizes: C$1,000 for breaking each of the 50, 60, 70, 80 and 90% milestones, and C$40,000 for reaching the final goal of 95% accuracy and transferring the intellectual property rights to FIND.

So, good luck — or have a nice time doing more productive things (=

To leave a comment for the author, please follow the link and comment on their blog: Stack Exchange Stats Blog.
