Does money buy happiness after all? Machine Learning with One Rule


This week, I am exploring Holger K. von Jouanne-Diedrich’s OneR package for machine learning. I am running an example analysis on world happiness data and comparing the results with those of other machine learning models (decision trees, random forest, gradient boosting trees and neural nets).

Conclusions

All in all, based on this example, I would confirm that OneR does indeed produce sufficiently accurate models to set a good baseline. OneR was definitely faster than random forest, gradient boosting and neural nets; the latter, however, are more complex models and their training included cross-validation.

If you prefer a model that is very simple and easy to understand, OneR is a good way to go. You could also use it as a starting point for developing more complex models with improved accuracy.

When looking at feature importance across models, the feature OneR chose – Economy/GDP per capita – was confirmed by random forest, gradient boosting trees and neural networks as being the most important feature. This is in itself an interesting conclusion! Of course, this correlation does not tell us that there is a direct causal relationship between money and happiness, but we can say that a country’s economy is the best individual predictor for how happy people tend to be.


OneR

OneR has been developed for the purpose of creating machine learning models that are easy to interpret and understand, while still being as accurate as possible. It is based on the one rule classification algorithm from Holte (1993), which is basically a decision tree cut at the first level.

R.C. Holte (1993). Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11:63-91.

While the original algorithm has difficulties handling missing values and numeric data, the package provides enhanced functionality to handle those cases better, e.g. a separate class for NA values and the optbin() function to find optimal splitting points for each feature. The main function of the package is OneR(), which finds an optimal split for each feature and uses only the single feature with the highest training accuracy for classification.
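To illustrate the idea, here is a toy sketch of one-rule learning in base R (just my own illustration of the principle, not the package’s actual implementation): bin each feature, assign every bin its majority class, and keep the feature whose rule has the highest training accuracy.

# toy sketch of the one-rule idea
one_rule_accuracy <- function(data, target) {
  sapply(setdiff(names(data), target), function(feat) {
    # discretize numeric features into 5 equal-width bins
    x <- if (is.numeric(data[[feat]])) cut(data[[feat]], 5) else data[[feat]]
    tab <- table(x, data[[target]])
    # majority class per bin -> training accuracy of this feature's rule
    sum(apply(tab, 1, max)) / sum(tab)
  })
}

# the "one rule" uses the feature with the highest one-rule accuracy
sort(one_rule_accuracy(iris, "Species"), decreasing = TRUE)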

I installed the latest stable version of the OneR package from CRAN.

<span class="n">library</span><span class="p">(</span><span class="n">OneR</span><span class="p">)</span><span class="w">
</span>

The dataset

I am using the World Happiness Report 2016 from Kaggle.

<span class="n">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span><span class="w">

</span><span class="n">data_16</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">read.table</span><span class="p">(</span><span class="s2">"world-happiness/2016.csv"</span><span class="p">,</span><span class="w"> </span><span class="n">sep</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">","</span><span class="p">,</span><span class="w"> </span><span class="n">header</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="n">data_15</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">read.table</span><span class="p">(</span><span class="s2">"world-happiness/2015.csv"</span><span class="p">,</span><span class="w"> </span><span class="n">sep</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">","</span><span class="p">,</span><span class="w"> </span><span class="n">header</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span>

In the 2016 data, upper and lower confidence intervals (CI) are given for the happiness score, while the 2015 data has standard errors. Because I want to combine data from the two years, I am using only the columns that appear in both datasets.

<span class="n">common_feats</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">colnames</span><span class="p">(</span><span class="n">data_16</span><span class="p">)[</span><span class="n">which</span><span class="p">(</span><span class="n">colnames</span><span class="p">(</span><span class="n">data_16</span><span class="p">)</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="n">colnames</span><span class="p">(</span><span class="n">data_15</span><span class="p">))]</span><span class="w">

</span><span class="c1"># features and response variable for modeling
</span><span class="n">feats</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">setdiff</span><span class="p">(</span><span class="n">common_feats</span><span class="p">,</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"Country"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Happiness.Rank"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Happiness.Score"</span><span class="p">))</span><span class="w">
</span><span class="n">response</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"Happiness.Score"</span><span class="w">

</span><span class="c1"># combine data from 2015 and 2016
</span><span class="n">data_15_16</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rbind</span><span class="p">(</span><span class="n">select</span><span class="p">(</span><span class="n">data_15</span><span class="p">,</span><span class="w"> </span><span class="n">one_of</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="n">feats</span><span class="p">,</span><span class="w"> </span><span class="n">response</span><span class="p">))),</span><span class="w">
              </span><span class="n">select</span><span class="p">(</span><span class="n">data_16</span><span class="p">,</span><span class="w"> </span><span class="n">one_of</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="n">feats</span><span class="p">,</span><span class="w"> </span><span class="n">response</span><span class="p">))))</span><span class="w">
</span>

The response variable, the happiness score, is on a numeric scale. OneR could also perform regression, but here I want to compare classification tasks. For classifying happiness, I create three bins for low, medium and high values of the happiness score. To avoid having to deal with unbalanced data, I am using the bin() function from OneR with method = "content". For plotting the cut points, I am extracting the numbers from the default level names.

<span class="n">data_15_16</span><span class="o">$</span><span class="n">Happiness.Score.l</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">bin</span><span class="p">(</span><span class="n">data_15_16</span><span class="o">$</span><span class="n">Happiness.Score</span><span class="p">,</span><span class="w"> </span><span class="n">nbins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"content"</span><span class="p">)</span><span class="w">

</span><span class="n">intervals</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">paste</span><span class="p">(</span><span class="n">levels</span><span class="p">(</span><span class="n">data_15_16</span><span class="o">$</span><span class="n">Happiness.Score.l</span><span class="p">),</span><span class="w"> </span><span class="n">collapse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">" "</span><span class="p">)</span><span class="w">
</span><span class="n">intervals</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"\\(|]"</span><span class="p">,</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="n">intervals</span><span class="p">)</span><span class="w">
</span><span class="n">intervals</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">","</span><span class="p">,</span><span class="w"> </span><span class="s2">" "</span><span class="p">,</span><span class="w"> </span><span class="n">intervals</span><span class="p">)</span><span class="w">
</span><span class="n">intervals</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">as.numeric</span><span class="p">(</span><span class="n">unique</span><span class="p">(</span><span class="n">strsplit</span><span class="p">(</span><span class="n">intervals</span><span class="p">,</span><span class="w"> </span><span class="s2">" "</span><span class="p">)[[</span><span class="m">1</span><span class="p">]]))</span><span class="w">
</span>
<span class="n">data_15_16</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">ggplot</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">geom_density</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Happiness.Score</span><span class="p">),</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"blue"</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"blue"</span><span class="p">,</span><span class="w"> </span><span class="n">alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.4</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">geom_vline</span><span class="p">(</span><span class="n">xintercept</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">intervals</span><span class="p">[</span><span class="m">2</span><span class="p">])</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">geom_vline</span><span class="p">(</span><span class="n">xintercept</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">intervals</span><span class="p">[</span><span class="m">3</span><span class="p">])</span><span class="w">
</span>

Now I am removing the original happiness score column from the modeling data and renaming the factor levels of the response variable.

<span class="n">data_15_16</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">select</span><span class="p">(</span><span class="n">data_15_16</span><span class="p">,</span><span class="w"> </span><span class="o">-</span><span class="n">Happiness.Score</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">Happiness.Score.l</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">plyr</span><span class="o">::</span><span class="n">revalue</span><span class="p">(</span><span class="n">Happiness.Score.l</span><span class="p">,</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"(2.83,4.79]"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"low"</span><span class="p">,</span><span class="w"> </span><span class="s2">"(4.79,5.89]"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"medium"</span><span class="p">,</span><span class="w"> </span><span class="s2">"(5.89,7.59]"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"high"</span><span class="p">)))</span><span class="w">
</span>

Because there are only 9 features in this small dataset, I want to explore them all individually before modeling. First, I am plotting the only categorical variable: Region.

This plot shows that there are a few regions with very strong biases in happiness: people in Western Europe, Australia, New Zealand, North America, and Latin America and the Caribbean tend to be in the high happiness group, while people in Sub-Saharan Africa and Southern Asia tend to be the least happy.

<span class="n">data_15_16</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">ggplot</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Region</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Happiness.Score.l</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">geom_bar</span><span class="p">(</span><span class="n">position</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"dodge"</span><span class="p">,</span><span class="w"> </span><span class="n">alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.7</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">theme</span><span class="p">(</span><span class="n">axis.text.x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="n">angle</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">45</span><span class="p">,</span><span class="w"> </span><span class="n">vjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">hjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">),</span><span class="w">
          </span><span class="n">plot.margin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">unit</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">1.5</span><span class="p">),</span><span class="w"> </span><span class="s2">"cm"</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">scale_fill_brewer</span><span class="p">(</span><span class="n">palette</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Set1"</span><span class="p">)</span><span class="w">
</span>

The remaining quantitative variables show happiness biases to varying degrees: e.g. low health and life expectancy is strongly biased towards low happiness, while economic factors, family and freedom show a bias in the same direction, albeit not as strongly.

<span class="n">data_15_16</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">gather</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">Economy..GDP.per.Capita.</span><span class="o">:</span><span class="n">Dystopia.Residual</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">ggplot</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Happiness.Score.l</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.7</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">facet_wrap</span><span class="p">(</span><span class="o">~</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">scales</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"free"</span><span class="p">,</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">scale_fill_brewer</span><span class="p">(</span><span class="n">palette</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Set1"</span><span class="p">)</span><span class="w">
</span>

While OneR could also handle categorical data, in this example I only want to consider the quantitative features, in order to show the differences between OneR and other machine learning algorithms.

<span class="n">data_15_16</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">select</span><span class="p">(</span><span class="n">data_15_16</span><span class="p">,</span><span class="w"> </span><span class="o">-</span><span class="n">Region</span><span class="p">)</span><span class="w">
</span>

Modeling

The algorithms I will compare to OneR will be run via the caret package.

<span class="c1"># configure multicore
</span><span class="n">library</span><span class="p">(</span><span class="n">doParallel</span><span class="p">)</span><span class="w">
</span><span class="n">cl</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">makeCluster</span><span class="p">(</span><span class="n">detectCores</span><span class="p">())</span><span class="w">
</span><span class="n">registerDoParallel</span><span class="p">(</span><span class="n">cl</span><span class="p">)</span><span class="w">

</span><span class="n">library</span><span class="p">(</span><span class="n">caret</span><span class="p">)</span><span class="w">
</span>

I will also use caret’s createDataPartition() function to partition the data into training (70%) and test sets (30%).

<span class="n">set.seed</span><span class="p">(</span><span class="m">42</span><span class="p">)</span><span class="w">
</span><span class="n">index</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">createDataPartition</span><span class="p">(</span><span class="n">data_15_16</span><span class="o">$</span><span class="n">Happiness.Score.l</span><span class="p">,</span><span class="w"> </span><span class="n">p</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.7</span><span class="p">,</span><span class="w"> </span><span class="n">list</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="n">train_data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data_15_16</span><span class="p">[</span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="p">]</span><span class="w">
</span><span class="n">test_data</span><span class="w">  </span><span class="o"><-</span><span class="w"> </span><span class="n">data_15_16</span><span class="p">[</span><span class="o">-</span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="p">]</span><span class="w">
</span>

OneR

OneR only accepts categorical features. Because we have numerical features, we need to convert them to factors by splitting them into appropriate bins. While the original OneR algorithm splits the values into ever smaller factors, this has been changed in this R implementation to prevent overfitting. We can either split the data into a pre-defined number of buckets (by length, content or cluster) or we can use the optbin() function to obtain the optimal number of bins from pairwise logistic regression or information gain.

<span class="c1"># default method length
</span><span class="n">data_1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">bin</span><span class="p">(</span><span class="n">train_data</span><span class="p">,</span><span class="w"> </span><span class="n">nbins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"length"</span><span class="p">)</span><span class="w">

</span><span class="c1"># method content
</span><span class="n">data_2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">bin</span><span class="p">(</span><span class="n">train_data</span><span class="p">,</span><span class="w"> </span><span class="n">nbins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"content"</span><span class="p">)</span><span class="w">

</span><span class="c1"># method cluster
</span><span class="n">data_3</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">bin</span><span class="p">(</span><span class="n">train_data</span><span class="p">,</span><span class="w"> </span><span class="n">nbins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"cluster"</span><span class="p">)</span><span class="w">

</span><span class="c1"># optimal bin number logistic regression
</span><span class="n">data_4</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">optbin</span><span class="p">(</span><span class="n">formula</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Happiness.Score.l</span><span class="w"> </span><span class="o">~</span><span class="n">.</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">train_data</span><span class="p">,</span><span class="w"> </span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"logreg"</span><span class="p">)</span><span class="w">

</span><span class="c1"># optimal bin number information gain
</span><span class="n">data_5</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">optbin</span><span class="p">(</span><span class="n">formula</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Happiness.Score.l</span><span class="w"> </span><span class="o">~</span><span class="n">.</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">train_data</span><span class="p">,</span><span class="w"> </span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"infogain"</span><span class="p">)</span><span class="w">
</span>

This is how the data look following discretization (a quick overview of each version can be printed with the snippet below the list):

  • Default method: 5 bins with method = "length"

  • 5 bins with method = "content"

  • 3 bins with method = "cluster"

  • optimal bin number according to logistic regression

  • optimal bin number according to information gain
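One quick way to get that overview of the five binned training sets is to loop over them and print a summary of each:

for (i in 1:5) {
  cat("\ndata_", i, ":\n", sep = "")
  print(summary(get(paste0("data_", i))))
}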

Model building

Now I am running the OneR models. During model building, the chosen feature with the highest training accuracy is printed, along with the decision rules and accuracies of all seven features. Unfortunately, this information is not saved in the model object; it would have been nice to have it for comparing feature importance across models later on.
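One possible workaround is to capture the console output of a verbose run (a sketch of my own; capture.output() is base R, not a feature of the OneR package):

# store the printed feature ranking alongside the model, e.g. for data_1
ranking_model_1 <- capture.output(
  model <- OneR(formula = Happiness.Score.l ~., data = data_1, verbose = TRUE)
)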

Here, all five models achieved highest prediction accuracy with the feature Economy GDP per capita.

<span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="m">5</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="n">data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">get</span><span class="p">(</span><span class="n">paste0</span><span class="p">(</span><span class="s2">"data_"</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">))</span><span class="w">
  </span><span class="n">print</span><span class="p">(</span><span class="n">model</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">OneR</span><span class="p">(</span><span class="n">formula</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Happiness.Score.l</span><span class="w"> </span><span class="o">~</span><span class="n">.</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">,</span><span class="w"> </span><span class="n">verbose</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">))</span><span class="w">
  </span><span class="n">assign</span><span class="p">(</span><span class="n">paste0</span><span class="p">(</span><span class="s2">"model_"</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">),</span><span class="w"> </span><span class="n">model</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span>
## 
##     Attribute                     Accuracy
## 1 * Economy..GDP.per.Capita.      63.96%  
## 2   Health..Life.Expectancy.      59.91%  
## 3   Family                        57.21%  
## 4   Dystopia.Residual             51.8%   
## 5   Freedom                       49.55%  
## 6   Trust..Government.Corruption. 45.5%   
## 7   Generosity                    41.89%  
## ---
## Chosen attribute due to accuracy
## and ties method (if applicable): '*'
## 
## 
## Call:
## OneR(data = data, formula = Happiness.Score.l ~ ., verbose = TRUE)
## 
## Rules:
## If Economy..GDP.per.Capita. = (-0.00182,0.365] then Happiness.Score.l = low
## If Economy..GDP.per.Capita. = (0.365,0.73]     then Happiness.Score.l = low
## If Economy..GDP.per.Capita. = (0.73,1.09]      then Happiness.Score.l = medium
## If Economy..GDP.per.Capita. = (1.09,1.46]      then Happiness.Score.l = high
## If Economy..GDP.per.Capita. = (1.46,1.83]      then Happiness.Score.l = high
## 
## Accuracy:
## 142 of 222 instances classified correctly (63.96%)
## 
## 
##     Attribute                     Accuracy
## 1 * Economy..GDP.per.Capita.      64.41%  
## 2   Health..Life.Expectancy.      60.81%  
## 3   Family                        59.91%  
## 4   Trust..Government.Corruption. 55.41%  
## 5   Freedom                       53.15%  
## 5   Dystopia.Residual             53.15%  
## 7   Generosity                    41.44%  
## ---
## Chosen attribute due to accuracy
## and ties method (if applicable): '*'
## 
## 
## Call:
## OneR(data = data, formula = Happiness.Score.l ~ ., verbose = TRUE)
## 
## Rules:
## If Economy..GDP.per.Capita. = (-0.00182,0.548] then Happiness.Score.l = low
## If Economy..GDP.per.Capita. = (0.548,0.877]    then Happiness.Score.l = low
## If Economy..GDP.per.Capita. = (0.877,1.06]     then Happiness.Score.l = medium
## If Economy..GDP.per.Capita. = (1.06,1.28]      then Happiness.Score.l = medium
## If Economy..GDP.per.Capita. = (1.28,1.83]      then Happiness.Score.l = high
## 
## Accuracy:
## 143 of 222 instances classified correctly (64.41%)
## 
## 
##     Attribute                     Accuracy
## 1 * Economy..GDP.per.Capita.      63.51%  
## 2   Health..Life.Expectancy.      62.16%  
## 3   Family                        54.5%   
## 4   Freedom                       50.45%  
## 4   Dystopia.Residual             50.45%  
## 6   Trust..Government.Corruption. 43.24%  
## 7   Generosity                    36.49%  
## ---
## Chosen attribute due to accuracy
## and ties method (if applicable): '*'
## 
## 
## Call:
## OneR(data = data, formula = Happiness.Score.l ~ ., verbose = TRUE)
## 
## Rules:
## If Economy..GDP.per.Capita. = (-0.00182,0.602] then Happiness.Score.l = low
## If Economy..GDP.per.Capita. = (0.602,1.1]      then Happiness.Score.l = medium
## If Economy..GDP.per.Capita. = (1.1,1.83]       then Happiness.Score.l = high
## 
## Accuracy:
## 141 of 222 instances classified correctly (63.51%)
## 
## 
##     Attribute                     Accuracy
## 1 * Economy..GDP.per.Capita.      63.96%  
## 2   Health..Life.Expectancy.      62.16%  
## 3   Family                        58.56%  
## 4   Freedom                       51.35%  
## 5   Dystopia.Residual             50.9%   
## 6   Trust..Government.Corruption. 46.4%   
## 7   Generosity                    40.09%  
## ---
## Chosen attribute due to accuracy
## and ties method (if applicable): '*'
## 
## 
## Call:
## OneR(data = data, formula = Happiness.Score.l ~ ., verbose = TRUE)
## 
## Rules:
## If Economy..GDP.per.Capita. = (-0.00182,0.754] then Happiness.Score.l = low
## If Economy..GDP.per.Capita. = (0.754,1.12]     then Happiness.Score.l = medium
## If Economy..GDP.per.Capita. = (1.12,1.83]      then Happiness.Score.l = high
## 
## Accuracy:
## 142 of 222 instances classified correctly (63.96%)
## 
## 
##     Attribute                     Accuracy
## 1 * Economy..GDP.per.Capita.      67.12%  
## 2   Health..Life.Expectancy.      65.77%  
## 3   Family                        61.71%  
## 4   Trust..Government.Corruption. 56.31%  
## 5   Dystopia.Residual             55.41%  
## 6   Freedom                       50.9%   
## 7   Generosity                    43.69%  
## ---
## Chosen attribute due to accuracy
## and ties method (if applicable): '*'
## 
## 
## Call:
## OneR(data = data, formula = Happiness.Score.l ~ ., verbose = TRUE)
## 
## Rules:
## If Economy..GDP.per.Capita. = (-0.00182,0.68] then Happiness.Score.l = low
## If Economy..GDP.per.Capita. = (0.68,1.24]     then Happiness.Score.l = medium
## If Economy..GDP.per.Capita. = (1.24,1.83]     then Happiness.Score.l = high
## 
## Accuracy:
## 149 of 222 instances classified correctly (67.12%)

Model evaluation

The function eval_model() prints confusion matrices with absolute and relative values, as well as accuracy, error and error rate reduction. For comparison with other models, it would have been convenient to be able to extract these performance metrics directly from the eval_model object, instead of getting only the confusion matrix and the counts of correct/all instances and having to re-calculate the performance metrics manually.
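For instance, overall accuracy can be re-derived in one line from predictions and true labels (this should match the accuracy eval_model() reports for model_1):

# manual accuracy for model_1
mean(predict(model_1, test_data) == test_data$Happiness.Score.l)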

<span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="m">5</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="n">model</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">get</span><span class="p">(</span><span class="n">paste0</span><span class="p">(</span><span class="s2">"model_"</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">))</span><span class="w">
  </span><span class="n">eval_model</span><span class="p">(</span><span class="n">predict</span><span class="p">(</span><span class="n">model</span><span class="p">,</span><span class="w"> </span><span class="n">test_data</span><span class="p">),</span><span class="w"> </span><span class="n">test_data</span><span class="o">$</span><span class="n">Happiness.Score.l</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span>
## 
## Confusion matrix (absolute):
##           Actual
## Prediction high low medium Sum
##     high     23   0     11  34
##     low       1  26     10  37
##     medium    7   5     10  22
##     Sum      31  31     31  93
## 
## Confusion matrix (relative):
##           Actual
## Prediction high  low medium  Sum
##     high   0.25 0.00   0.12 0.37
##     low    0.01 0.28   0.11 0.40
##     medium 0.08 0.05   0.11 0.24
##     Sum    0.33 0.33   0.33 1.00
## 
## Accuracy:
## 0.6344 (59/93)
## 
## Error rate:
## 0.3656 (34/93)
## 
## Error rate reduction (vs. base rate):
## 0.4516 (p-value = 2.855e-09)
## 
## 
## Confusion matrix (absolute):
##           Actual
## Prediction high low medium Sum
##     high     19   0      1  20
##     low       3  28     14  45
##     medium    9   3     16  28
##     Sum      31  31     31  93
## 
## Confusion matrix (relative):
##           Actual
## Prediction high  low medium  Sum
##     high   0.20 0.00   0.01 0.22
##     low    0.03 0.30   0.15 0.48
##     medium 0.10 0.03   0.17 0.30
##     Sum    0.33 0.33   0.33 1.00
## 
## Accuracy:
## 0.6774 (63/93)
## 
## Error rate:
## 0.3226 (30/93)
## 
## Error rate reduction (vs. base rate):
## 0.5161 (p-value = 1.303e-11)
## 
## 
## Confusion matrix (absolute):
##           Actual
## Prediction high low medium Sum
##     high     23   0     11  34
##     low       0  25      7  32
##     medium    8   6     13  27
##     Sum      31  31     31  93
## 
## Confusion matrix (relative):
##           Actual
## Prediction high  low medium  Sum
##     high   0.25 0.00   0.12 0.37
##     low    0.00 0.27   0.08 0.34
##     medium 0.09 0.06   0.14 0.29
##     Sum    0.33 0.33   0.33 1.00
## 
## Accuracy:
## 0.6559 (61/93)
## 
## Error rate:
## 0.3441 (32/93)
## 
## Error rate reduction (vs. base rate):
## 0.4839 (p-value = 2.116e-10)
## 
## 
## Confusion matrix (absolute):
##           Actual
## Prediction high low medium Sum
##     high     23   0     11  34
##     low       2  26     11  39
##     medium    6   5      9  20
##     Sum      31  31     31  93
## 
## Confusion matrix (relative):
##           Actual
## Prediction high  low medium  Sum
##     high   0.25 0.00   0.12 0.37
##     low    0.02 0.28   0.12 0.42
##     medium 0.06 0.05   0.10 0.22
##     Sum    0.33 0.33   0.33 1.00
## 
## Accuracy:
## 0.6237 (58/93)
## 
## Error rate:
## 0.3763 (35/93)
## 
## Error rate reduction (vs. base rate):
## 0.4355 (p-value = 9.799e-09)
## 
## 
## Confusion matrix (absolute):
##           Actual
## Prediction high low medium Sum
##     high     21   0      3  24
##     low       0  26      8  34
##     medium   10   5     20  35
##     Sum      31  31     31  93
## 
## Confusion matrix (relative):
##           Actual
## Prediction high  low medium  Sum
##     high   0.23 0.00   0.03 0.26
##     low    0.00 0.28   0.09 0.37
##     medium 0.11 0.05   0.22 0.38
##     Sum    0.33 0.33   0.33 1.00
## 
## Accuracy:
## 0.7204 (67/93)
## 
## Error rate:
## 0.2796 (26/93)
## 
## Error rate reduction (vs. base rate):
## 0.5806 (p-value = 2.761e-14)

Because I want to calculate performance measures for the different classes separately and would like to have a more detailed look at the prediction probabilities I get from the models, I prefer to obtain predictions with type = "prob". While I am not looking at it here, this would also allow me to test different prediction thresholds.

<span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="m">5</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="n">model</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">get</span><span class="p">(</span><span class="n">paste0</span><span class="p">(</span><span class="s2">"model_"</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">))</span><span class="w">
  </span><span class="n">pred</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">model</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">paste0</span><span class="p">(</span><span class="s2">"model_"</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">),</span><span class="w">
                     </span><span class="n">sample_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">nrow</span><span class="p">(</span><span class="n">test_data</span><span class="p">),</span><span class="w">
                     </span><span class="n">predict</span><span class="p">(</span><span class="n">model</span><span class="p">,</span><span class="w"> </span><span class="n">test_data</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"prob"</span><span class="p">),</span><span class="w">
                     </span><span class="n">actual</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">test_data</span><span class="o">$</span><span class="n">Happiness.Score.l</span><span class="p">)</span><span class="w">
  </span><span class="n">pred</span><span class="o">$</span><span class="n">prediction</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">colnames</span><span class="p">(</span><span class="n">pred</span><span class="p">)[</span><span class="m">3</span><span class="o">:</span><span class="m">5</span><span class="p">][</span><span class="n">apply</span><span class="p">(</span><span class="n">pred</span><span class="p">[,</span><span class="w"> </span><span class="m">3</span><span class="o">:</span><span class="m">5</span><span class="p">],</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">which.max</span><span class="p">)]</span><span class="w">
  </span><span class="n">pred</span><span class="o">$</span><span class="n">correct</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ifelse</span><span class="p">(</span><span class="n">pred</span><span class="o">$</span><span class="n">actual</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">pred</span><span class="o">$</span><span class="n">prediction</span><span class="p">,</span><span class="w"> </span><span class="s2">"correct"</span><span class="p">,</span><span class="w"> </span><span class="s2">"wrong"</span><span class="p">)</span><span class="w">
  </span><span class="n">pred</span><span class="o">$</span><span class="n">pred_prob</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="kc">NA</span><span class="w">
  
  </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">j</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">nrow</span><span class="p">(</span><span class="n">pred</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="n">pred</span><span class="p">[</span><span class="n">j</span><span class="p">,</span><span class="w"> </span><span class="s2">"pred_prob"</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">pred</span><span class="p">[</span><span class="n">j</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="o">:</span><span class="m">5</span><span class="p">])</span><span class="w">
  </span><span class="p">}</span><span class="w">
  
  </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="n">pred_df</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">pred</span><span class="w">
  </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="n">pred_df</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rbind</span><span class="p">(</span><span class="n">pred_df</span><span class="p">,</span><span class="w"> </span><span class="n">pred</span><span class="p">)</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span>
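As a sketch of what such a threshold test could look like (a hypothetical cutoff of 0.6 on the winning class probability, chosen just for illustration), we could keep only the confident predictions and compute per-model accuracy on those:

pred_df %>%
  filter(pred_prob >= 0.6) %>%
  group_by(model) %>%
  summarise(n = n(), accuracy = mean(correct == "correct"))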

Comparing other algorithms

Decision trees

First, I am building a decision tree with the rpart package and the rpart() function, which we can plot with rpart.plot().

Here, Economy GDP per capita is only the second-highest node; the best predictor in this tree is health and life expectancy.

<span class="n">library</span><span class="p">(</span><span class="n">rpart</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">rpart.plot</span><span class="p">)</span><span class="w">

</span><span class="n">set.seed</span><span class="p">(</span><span class="m">42</span><span class="p">)</span><span class="w">
</span><span class="n">fit</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rpart</span><span class="p">(</span><span class="n">Happiness.Score.l</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">.</span><span class="p">,</span><span class="w">
            </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">train_data</span><span class="p">,</span><span class="w">
            </span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"class"</span><span class="p">,</span><span class="w">
            </span><span class="n">control</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rpart.control</span><span class="p">(</span><span class="n">xval</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">),</span><span class="w"> 
            </span><span class="n">parms</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">split</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"information"</span><span class="p">))</span><span class="w">

</span><span class="n">rpart.plot</span><span class="p">(</span><span class="n">fit</span><span class="p">,</span><span class="w"> </span><span class="n">extra</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">)</span><span class="w">
</span>

In order to compare the models, I am producing the same output table for the predictions from this model and combining it with the table from the OneR models.

<span class="n">pred</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">model</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"rpart"</span><span class="p">,</span><span class="w">
                   </span><span class="n">sample_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">nrow</span><span class="p">(</span><span class="n">test_data</span><span class="p">),</span><span class="w">
                   </span><span class="n">predict</span><span class="p">(</span><span class="n">fit</span><span class="p">,</span><span class="w"> </span><span class="n">test_data</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"prob"</span><span class="p">),</span><span class="w">
                   </span><span class="n">actual</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">test_data</span><span class="o">$</span><span class="n">Happiness.Score.l</span><span class="p">)</span><span class="w">
  </span><span class="n">pred</span><span class="o">$</span><span class="n">prediction</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">colnames</span><span class="p">(</span><span class="n">pred</span><span class="p">)[</span><span class="m">3</span><span class="o">:</span><span class="m">5</span><span class="p">][</span><span class="n">apply</span><span class="p">(</span><span class="n">pred</span><span class="p">[,</span><span class="w"> </span><span class="m">3</span><span class="o">:</span><span class="m">5</span><span class="p">],</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">which.max</span><span class="p">)]</span><span class="w">
  </span><span class="n">pred</span><span class="o">$</span><span class="n">correct</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ifelse</span><span class="p">(</span><span class="n">pred</span><span class="o">$</span><span class="n">actual</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">pred</span><span class="o">$</span><span class="n">prediction</span><span class="p">,</span><span class="w"> </span><span class="s2">"correct"</span><span class="p">,</span><span class="w"> </span><span class="s2">"wrong"</span><span class="p">)</span><span class="w">
  </span><span class="n">pred</span><span class="o">$</span><span class="n">pred_prob</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="kc">NA</span><span class="w">
  
  </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">j</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">nrow</span><span class="p">(</span><span class="n">pred</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="n">pred</span><span class="p">[</span><span class="n">j</span><span class="p">,</span><span class="w"> </span><span class="s2">"pred_prob"</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">pred</span><span class="p">[</span><span class="n">j</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="o">:</span><span class="m">5</span><span class="p">])</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span>
<span class="n">pred_df_final</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rbind</span><span class="p">(</span><span class="n">pred_df</span><span class="p">,</span><span class="w">
                       </span><span class="n">pred</span><span class="p">)</span><span class="w">
</span>

Random Forest

Next, I am training a Random Forest model. For more details on Random Forest, check out my post “Can we predict flu deaths with Machine Learning and R?”.

<span class="n">set.seed</span><span class="p">(</span><span class="m">42</span><span class="p">)</span><span class="w">
</span><span class="n">model_rf</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">caret</span><span class="o">::</span><span class="n">train</span><span class="p">(</span><span class="n">Happiness.Score.l</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">.</span><span class="p">,</span><span class="w">
                         </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">train_data</span><span class="p">,</span><span class="w">
                         </span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"rf"</span><span class="p">,</span><span class="w">
                         </span><span class="n">trControl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">trainControl</span><span class="p">(</span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"repeatedcv"</span><span class="p">,</span><span class="w"> 
                                                  </span><span class="n">number</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> 
                                                  </span><span class="n">repeats</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> 
                                                  </span><span class="n">verboseIter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">))</span><span class="w">
</span>

The varImp() function from caret shows us which feature was of highest importance for the model and its predictions.

Here, we again find Economy GDP per capita on top.

<span class="n">varImp</span><span class="p">(</span><span class="n">model_rf</span><span class="p">)</span><span class="w">
</span>
## rf variable importance
## 
##                               Overall
## Economy..GDP.per.Capita.       100.00
## Dystopia.Residual               97.89
## Health..Life.Expectancy.        77.10
## Family                          47.17
## Trust..Government.Corruption.   29.89
## Freedom                         19.29
## Generosity                       0.00
<span class="n">pred</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">model</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"rf"</span><span class="p">,</span><span class="w">
                   </span><span class="n">sample_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">nrow</span><span class="p">(</span><span class="n">test_data</span><span class="p">),</span><span class="w">
                   </span><span class="n">predict</span><span class="p">(</span><span class="n">model_rf</span><span class="p">,</span><span class="w"> </span><span class="n">test_data</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"prob"</span><span class="p">),</span><span class="w">
                   </span><span class="n">actual</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">test_data</span><span class="o">$</span><span class="n">Happiness.Score.l</span><span class="p">)</span><span class="w">
  </span><span class="n">pred</span><span class="o">$</span><span class="n">prediction</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">colnames</span><span class="p">(</span><span class="n">pred</span><span class="p">)[</span><span class="m">3</span><span class="o">:</span><span class="m">5</span><span class="p">][</span><span class="n">apply</span><span class="p">(</span><span class="n">pred</span><span class="p">[,</span><span class="w"> </span><span class="m">3</span><span class="o">:</span><span class="m">5</span><span class="p">],</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">which.max</span><span class="p">)]</span><span class="w">
  </span><span class="n">pred</span><span class="o">$</span><span class="n">correct</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ifelse</span><span class="p">(</span><span class="n">pred</span><span class="o">$</span><span class="n">actual</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">pred</span><span class="o">$</span><span class="n">prediction</span><span class="p">,</span><span class="w"> </span><span class="s2">"correct"</span><span class="p">,</span><span class="w"> </span><span class="s2">"wrong"</span><span class="p">)</span><span class="w">
  </span><span class="n">pred</span><span class="o">$</span><span class="n">pred_prob</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="kc">NA</span><span class="w">
  
  </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">j</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">nrow</span><span class="p">(</span><span class="n">pred</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="n">pred</span><span class="p">[</span><span class="n">j</span><span class="p">,</span><span class="w"> </span><span class="s2">"pred_prob"</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">pred</span><span class="p">[</span><span class="n">j</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="o">:</span><span class="m">5</span><span class="p">])</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span>
<span class="n">pred_df_final</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rbind</span><span class="p">(</span><span class="n">pred_df_final</span><span class="p">,</span><span class="w">
                       </span><span class="n">pred</span><span class="p">)</span><span class="w">
</span>

Extreme gradient boosting trees

Gradient boosting is another decision tree-based algorithm, explained in more detail in my post “Extreme Gradient Boosting and Preprocessing in Machine Learning”.

<span class="n">set.seed</span><span class="p">(</span><span class="m">42</span><span class="p">)</span><span class="w">
</span><span class="n">model_xgb</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">caret</span><span class="o">::</span><span class="n">train</span><span class="p">(</span><span class="n">Happiness.Score.l</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">.</span><span class="p">,</span><span class="w">
                         </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">train_data</span><span class="p">,</span><span class="w">
                         </span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"xgbTree"</span><span class="p">,</span><span class="w">
                         </span><span class="n">trControl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">trainControl</span><span class="p">(</span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"repeatedcv"</span><span class="p">,</span><span class="w"> 
                                                  </span><span class="n">number</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> 
                                                  </span><span class="n">repeats</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> 
                                                  </span><span class="n">verboseIter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">))</span><span class="w">
</span>

As before, we again find Economy GDP per capita as the most important feature.

<span class="n">varImp</span><span class="p">(</span><span class="n">model_xgb</span><span class="p">)</span><span class="w">
</span>
## xgbTree variable importance
## 
##                          Overall
## Economy..GDP.per.Capita.  100.00
## Health..Life.Expectancy.   67.43
## Family                     46.59
## Freedom                     0.00
<span class="n">pred</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">model</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"xgb"</span><span class="p">,</span><span class="w">
                   </span><span class="n">sample_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">nrow</span><span class="p">(</span><span class="n">test_data</span><span class="p">),</span><span class="w">
                   </span><span class="n">predict</span><span class="p">(</span><span class="n">model_xgb</span><span class="p">,</span><span class="w"> </span><span class="n">test_data</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"prob"</span><span class="p">),</span><span class="w">
                   </span><span class="n">actual</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">test_data</span><span class="o">$</span><span class="n">Happiness.Score.l</span><span class="p">)</span><span class="w">
  </span><span class="n">pred</span><span class="o">$</span><span class="n">prediction</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">colnames</span><span class=&qu...
