suRprise! – Classifying Kinder Eggs by Boosting

[This article was first published on Theory meets practice..., and kindly contributed to R-bloggers].


Carrying the Danish tradition of Juleforsøg to the realm of statistics, we use R to classify the figure content of Kinder Eggs with boosted regression trees based on each egg’s weight and possible rattling noises.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. The markdown+Rknitr source code of this blog is available under the GNU General Public License (GPL v3) from GitHub.


A juleforsøg is the kind of exploding experiment that happens in the last physics or chemistry class before the Christmas vacation. Frequently the teacher, with a look of secrecy, starts the class by locking the door and mumbling something like “the headmaster better not see this…”. With Christmas approaching fast, here is an attempt to create a statistical juleforsøg concluding the Theory meets practice 2016 posting season:

The advertisement campaign for Kinder Surprise Eggs, a.k.a. Kinder Joy, claims that the content of every 7th egg is a figure; otherwise the eggs contain toys or puzzles, which can positively be described as junk. Figures, in particular completed series, on the other hand, can achieve high trading values. The clear goal is thus to optimize your egg hunting strategy in order to maximize figure content.

The problem

Your budget is limited, so the question is which egg to select when standing in the supermarket?

Various egg selection strategies

It goes without saying that brute force purchasing strategies would be insane. Hence, a number of egg selection strategies can be observed in real life:

  • The no clue egg enthusiast: selects an egg at random. With a certain probability (determined by the producer and the cleverness of the previous supermarket visitors) the egg contains a figure

  • The egg junkie: knows a good radiologist

  • The egg nerd: using a scale, rattling noises and the barcode, he/she quickly determines whether there is a figure in the egg

We shall in this post be interested in the statistician’s egg selection approach: Egg classification based on weight and rattling noise using ‘top-notch’ machine learning algorithms – in our case based on boosted classification trees.

Data Collection

We collected n=79 eggs, of which 43.0% contained figures (the data are available under a GPL v3.0 license from GitHub). For each egg, we determined its weight as well as the sound it produced when being shaken. If the sound could be characterized as rattling (a.k.a. clattering), this was indicative of the content consisting of many parts and, hence, unlikely to be a figure.

Altogether, the first couple of rows of the dataset look as follows.

head(surprise, n=5)
##   weight rattles_like_figure figure rattles rattles_fac figure_fac
## 1     32                   1      0       0          no         no
## 2     34                   0      1       1         yes        yes
## 3     34                   1      1       0          no        yes
## 4     30                   1      0       0          no         no
## 5     34                   1      1       0          no        yes

Descriptive Data Analysis

The fraction of figures in the dataset was 34/79, which is way higher than the proclaimed 1/7; possibly because professional egg collectors were at work…

Of the 79 analysed eggs, 54 were categorized as non-rattling. The probability of such a non-rattling egg really containing a figure was 51.9%. This proportion is not impressive, but could be due to the data collectors having different understandings of exactly how the variable rattling was to be interpreted: Does it rattle, or does it rattle like a figure? In hindsight, a clearer definition and communication of this variable would have prevented ambiguity in the collection.
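The proportions quoted above can be re-derived from the reported counts alone. The sketch below assumes that 51.9% of the 54 non-rattling eggs corresponds to 28 figures (the only integer count that rounds to this percentage), which would leave 6 figures among the 25 rattling eggs. These two counts are inferred here and not read from the dataset. The exact binomial test is just one possible way to compare the observed figure fraction with the advertised 1/7.

```r
# Counts stated in the text
n_eggs      <- 79   # eggs collected
n_figures   <- 34   # eggs containing a figure
n_nonrattle <- 54   # eggs categorized as non-rattling

# Inferred (assumption): 51.9% of 54 eggs corresponds to 28 figures
fig_nonrattle <- round(0.519 * n_nonrattle)
fig_rattle    <- n_figures - fig_nonrattle
n_rattle      <- n_eggs - n_nonrattle

# 2x2 table of rattling vs. figure content rebuilt from these counts
tab <- matrix(c(n_nonrattle - fig_nonrattle, fig_nonrattle,
                n_rattle - fig_rattle,       fig_rattle),
              nrow = 2, byrow = TRUE,
              dimnames = list(rattles_fac = c("no", "yes"),
                              figure_fac  = c("no", "yes")))

# Row percentages: figure content within each rattling category
pct <- round(100 * prop.table(tab, margin = 1), 1)
print(pct)

# Exact binomial test of the overall figure fraction against 1/7
bt <- binom.test(n_figures, n_eggs, p = 1/7)
print(bt$p.value)
```

Under these assumptions, rattling eggs would contain a figure noticeably less often than non-rattling ones, so the variable still carries some information despite its fuzzy definition.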

A descriptive plot of the weight distribution of eggs with and without figure content shows that eggs with figures tend to be slightly heavier:

Below is the proportion (in %) of eggs with and without figure content per observed weight:

tabw <- with(surprise, table(weight, figure_fac))
tabw <- tabw / rowSums(tabw)
## print as percentages with one column per weight
round(100 * t(tabw), 1)
##           weight
## figure_fac    26    28    29    30    31    32    33    34    35    36    40
##        no  100.0  50.0  66.7  53.3  71.4  72.7  75.0  25.0 100.0  33.3   0.0
##        yes   0.0  50.0  33.3  46.7  28.6  27.3  25.0  75.0   0.0  66.7 100.0
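The row normalisation above can equivalently be written with prop.table(). A minimal sketch, using only the five eggs printed by head() earlier, so the percentages differ from those of the full table:

```r
# The five rows shown by head(surprise, n=5); only weight and figure_fac are used
eggs5 <- data.frame(
  weight     = c(32, 34, 34, 30, 34),
  figure_fac = factor(c("no", "yes", "yes", "no", "yes"), levels = c("no", "yes"))
)

# Cross-tabulate, normalise within each weight (margin = 1), express in %
tab5 <- with(eggs5, table(weight, figure_fac))
pct5 <- round(100 * prop.table(tab5, margin = 1), 1)

# Transpose so that the weights run along the columns, as in the output above
print(t(pct5))
```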

A simple selection rule based on weight would be to weigh eggs until you hit a 40g egg. A slightly less certain stopping rule would be to pick 34g eggs. However, modern statistics is more than counting and analysing proportions!

Machine Learning the Egg Content

We use machine learning algorithms to solve the binary classification problem at hand. In particular, we use the caret package (Kuhn 2016) and classify figure content using boosted regression trees as implemented in the xgboost package (Chen et al. 2016). Details on how to use the caret package can, e.g., be found in the following tutorial.


library(caret)  # train(), trainControl(), twoClassSummary()
library(doMC)   # registerDoMC() for parallel resampling
##Grid with xgboost hyperparameters
xgb_hyperparam_grid <- expand.grid(
  nrounds = c(25, 100, 1000),
  eta = c(0.01, 0.001, 0.0001),
  max_depth = c(2, 4, 8, 10, 16, 32),
  gamma = 1,
  colsample_bytree = 0.8, min_child_weight = 1, subsample = 0.5
)
##caret training control object
control <- trainControl(method="repeatedcv", number=8, repeats=8, classProbs=TRUE,
                        summaryFunction = twoClassSummary, allowParallel=TRUE)
##train away and do it parallelized...
registerDoMC(cores = 3)
m_xgb <- train( figure_fac ~ weight * rattles_fac, data=surprise, method="xgbTree",
               trControl=control, verbose=FALSE, metric="ROC", tuneGrid=xgb_hyperparam_grid)
##look at the result
m_xgb
## eXtreme Gradient Boosting  
##  79 samples 
##   2 predictor 
##   2 classes: 'no', 'yes'  
##  No pre-processing 
##  Resampling: Cross-Validated (8 fold, repeated 8 times)  
##  Summary of sample sizes: 69, 70, 69, 69, 69, 68, ...  
##  Resampling results across tuning parameters: 
##    eta    max_depth  nrounds  ROC        Sens       Spec      
##    1e-04   2           25     0.6507292  0.8661458  0.4359375 
##    1e-04   2          100     0.6706510  0.8661458  0.4398437 
##    1e-04   2         1000     0.6777865  0.8661458  0.4398437 
##    ...  ...        ... 
##    1e-02  32          100     0.6788802  0.8479167  0.4367187 
##    1e-02  32         1000     0.6573828  0.7395833  0.4304688 
##  Tuning parameter 'gamma' was held constant at a value of 1 
##  Tuning 
##   parameter 'min_child_weight' was held constant at a value of 1 
##  Tuning 
##   parameter 'subsample' was held constant at a value of 0.5 
##  ROC was used to select the optimal model using  the largest value. 
##  The final values used for the model were nrounds = 100, max_depth = 2, eta = 0.01, 
##   gamma = 1, colsample_bytree = 0.8, min_child_weight = 1 and subsample = 0.5.

The average AUC for the 64 resamples is 0.68. Average sensitivity and specificity are 86.6% and 43.3%, respectively. This shows that predicting figure content with the available data is better than simply picking an egg at random, but no figure-guaranteeing strategy appears possible on a per-egg basis.
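To get a feeling for the size of this tuning exercise, the number of hyperparameter combinations and resamples can be counted directly from the settings above. The sketch uses base R only, and the variable names are chosen here purely for illustration.

```r
# Same tuning grid as above: only nrounds, eta and max_depth vary
grid <- expand.grid(
  nrounds = c(25, 100, 1000),
  eta = c(0.01, 0.001, 0.0001),
  max_depth = c(2, 4, 8, 10, 16, 32),
  gamma = 1, colsample_bytree = 0.8, min_child_weight = 1, subsample = 0.5
)
n_combinations <- nrow(grid)   # 3 * 3 * 6 combinations

# repeatedcv with number = 8 folds and repeats = 8
n_resamples <- 8 * 8           # held-out folds per combination

# Average hold-out size per fold with n = 79 eggs
holdout_size <- 79 / 8

cat(n_combinations, "combinations, each evaluated on", n_resamples, "resamples\n")
cat("about", round(holdout_size), "held-out eggs per resample\n")
```

Each resampled AUC is thus computed on only about ten held-out eggs, which is probably one reason why the estimates are noisy. Also note that caret's twoClassSummary treats the first factor level, here 'no', as the event of interest, so the reported sensitivity presumably refers to recognising eggs without a figure.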

Predicting the Content of a Particular Egg

Suppose the egg you look at weighs 36g and when shaken it sounds like a lot of small parts being moved. In other words:

predict(m_xgb, newdata = data.frame(weight=36, rattles_fac="yes"),type="prob")
##          no       yes
## 1 0.4695328 0.5304672

Hence, despite the rattling noises, the classifier puts the probability of a figure inside at slightly above 50% (53.0%). However, when we opened this particular egg:

...a car. Definitely not a figure! The proof of concept disappointment was, however, quickly counteracted by the surrounding chocolate...

As a standard operating procedure for your optimized future supermarket hunt, the classifier's predicted probabilities for figure content are shown below as a function of egg weight and the rattles_fac variable.
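A lookup table of this kind could be computed along the following lines. The helper figure_prob_grid() is a hypothetical name introduced here, and the weight range of 26 to 40 g is an assumption based on the weights observed in the data. The only thing the sketch relies on is that predict(model, newdata, type = "prob") returns the columns no and yes, as in the single-egg prediction above.

```r
# Hypothetical helper: predicted figure probability for every
# combination of egg weight and rattling status.
figure_prob_grid <- function(model, weights = 26:40, rattles = c("no", "yes")) {
  # One row per (weight, rattles_fac) combination
  newdata <- expand.grid(weight = weights, rattles_fac = rattles)
  # Class probabilities; the "yes" column is the probability of a figure
  probs <- predict(model, newdata = newdata, type = "prob")
  newdata$p_figure <- probs[, "yes"]
  newdata
}

# Usage with the fitted caret model from above (not run here):
# grid <- figure_prob_grid(m_xgb)
# grid[order(-grid$p_figure), ]
```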


The above post only discusses the optimal selection on a per-egg basis. One could weigh and shake several eggs and then select the one with the highest predicted probability of containing a figure. Future research is needed to solve this sequential decision making problem in an optimal way.
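The simple heuristic of inspecting several eggs and keeping the most promising one could look like the following sketch. pick_best_egg() is a hypothetical helper; it takes the predicted figure probabilities of the inspected eggs and returns the index of the best candidate. It does not address the sequential stopping problem.

```r
# Hypothetical helper: index of the egg with the highest predicted
# probability of containing a figure (the first one wins in case of ties).
pick_best_egg <- function(p_figure) {
  stopifnot(is.numeric(p_figure), length(p_figure) > 0,
            all(p_figure >= 0 & p_figure <= 1))
  which.max(p_figure)
}

# Usage with the fitted model (not run here); `eggs` is an assumed data frame
# with the columns weight and rattles_fac for the eggs inspected in the shop:
# p <- predict(m_xgb, newdata = eggs, type = "prob")[, "yes"]
# eggs[pick_best_egg(p), ]
```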


We have retained a validation sample of 10 eggs and are willing to send an unconsumed 11th element of the sample to whoever obtains the best score on this validation sample. Does anyone know how to upload this to Kaggle?

We wish all readers God jul and a happy new year!


Thanks to former colleagues at the Department of Statistics, University of Munich, as well as numerous statistics students in Munich and Stockholm, for contributing to the data collection. In particular we thank Alexander Jerak for his idea of optimizing figure hunting in a data driven way more than 10 years ago.


Chen, Tianqi, Tong He, Michael Benesty, Vadim Khotilovich, and Yuan Tang. 2016. Xgboost: Extreme Gradient Boosting.

Kuhn, Max. 2016. Caret: Classification and Regression Training.
