A Structural Model of the World Happiness Report

[This article was first published on Blogs on Adejumo R.S, and kindly contributed to R-bloggers.]
“If you can’t explain it simply, you don’t understand it well enough.”
— Albert Einstein

Introduction

Structural equation modelling (SEM) is a statistical technique, used mostly in the behavioral and cognitive sciences, for studying how a set of observed variables relates to a latent variable. A latent variable is one that cannot be observed directly but can be inferred from other variables. This post uses confirmatory factor analysis (CFA), a special case of SEM, to see whether a latent happiness variable can be inferred from the World Happiness Report variables.

The World Happiness Report data

The data for the World Happiness Report was sourced from Kaggle and can be downloaded here. It covers 149 countries and the following variables: happiness score, GDP, social support, life expectancy, freedom, generosity and corruption. All variables are continuous and the dataset has no missing values.

Load the dataset

#load the libraries 
library(tidyverse)
whi_data <- readr::read_csv("C:/Users/Adejumo/Downloads/2021.csv") %>% 
  #select required columns
  select(c(1,3,7:12))

#change variable names
whi_data <- whi_data %>% 
  mutate(score = `Ladder score`,
         GDP = `Logged GDP per capita`,
         support = `Social support`,
         Life_exp = `Healthy life expectancy`,
         freedom = `Freedom to make life choices`,
         corruption = `Perceptions of corruption`,
         .keep = "unused")

#top 6 rows
head(whi_data)
## # A tibble: 6 x 8
##   `Country name` Generosity score   GDP support Life_exp freedom corruption
##   <chr>               <dbl> <dbl> <dbl>   <dbl>    <dbl>   <dbl>      <dbl>
## 1 Finland            -0.098  7.84  10.8   0.954     72     0.949      0.186
## 2 Denmark             0.03   7.62  10.9   0.954     72.7   0.946      0.179
## 3 Switzerland         0.025  7.57  11.1   0.942     74.4   0.919      0.292
## 4 Iceland             0.16   7.55  10.9   0.983     73     0.955      0.673
## 5 Netherlands         0.175  7.46  10.9   0.942     72.4   0.913      0.338
## 6 Norway              0.093  7.39  11.1   0.954     73.3   0.96       0.27
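
The row count and the claim of no missing values are easy to verify before modelling. Below is a minimal base R sketch; `toy` and its values are made up for illustration and only stand in for whi_data.

```r
# Hypothetical stand-in for whi_data; the values are made up for illustration
toy <- data.frame(
  score   = c(7.8, 7.6, NA),
  GDP     = c(10.8, 10.9, 11.1),
  support = c(0.95, 0.95, 0.94)
)

# Number of rows (countries) and number of missing values per column
n_rows    <- nrow(toy)
n_missing <- colSums(is.na(toy))

n_rows
n_missing

# TRUE only when no column contains an NA
all_complete <- all(n_missing == 0)
all_complete
```

Applied to the real data, `nrow(whi_data)` should agree with the number of observations that lavaan reports, and `colSums(is.na(whi_data))` should be all zeros.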

Measurement Model

The measurement model shows which indicators give a good fit, so we can see which ones are strongly related to happiness before fitting the SEM. A rule of thumb in the measurement model (usually stated for standardized loadings) is that indicators with a factor loading below 0.5 are only weakly related to the latent variable and should be dropped. To fit a structural equation model in R we need the “lavaan” package, and the “semPlot” package for plotting the Directed Acyclic Graph (DAG) showing the relationship paths.

#load libraries
library(lavaan)
library(semPlot)

#constructing the measurement model
mea_model <- "Happiness =~ score + Generosity + GDP + support + Life_exp + freedom + corruption"

#fitting the measurement model
mea_fit <- sem(mea_model,
               data = whi_data)

#summary statistics of the measurement model
summary(mea_fit,
        fit.measures = TRUE)
## lavaan 0.6-9 ended normally after 59 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        14
##                                                       
##   Number of observations                           149
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                80.437
##   Degrees of freedom                                14
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               667.346
##   Degrees of freedom                                21
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.897
##   Tucker-Lewis Index (TLI)                       0.846
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -315.326
##   Loglikelihood unrestricted model (H1)       -275.107
##                                                       
##   Akaike (AIC)                                 658.652
##   Bayesian (BIC)                               700.707
##   Sample-size adjusted Bayesian (BIC)          656.401
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.178
##   90 Percent confidence interval - lower         0.142
##   90 Percent confidence interval - upper         0.217
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.087
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Happiness =~                                        
##     score             1.000                           
##     Generosity       -0.022    0.014   -1.659    0.097
##     GDP               1.154    0.069   16.618    0.000
##     support           0.103    0.008   13.455    0.000
##     Life_exp          6.522    0.418   15.613    0.000
##     freedom           0.066    0.009    7.157    0.000
##     corruption       -0.075    0.015   -4.869    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .score             0.279    0.040    6.950    0.000
##    .Generosity        0.022    0.003    8.621    0.000
##    .GDP               0.179    0.035    5.130    0.000
##    .support           0.004    0.001    7.409    0.000
##    .Life_exp          8.534    1.366    6.247    0.000
##    .freedom           0.009    0.001    8.417    0.000
##    .corruption        0.027    0.003    8.539    0.000
##     Happiness         0.867    0.131    6.623    0.000

From the summary statistics above, we can see that the model is not yet a good fit according to the fit indices: CFI = 0.897, TLI = 0.846, RMSEA = 0.178 and SRMR = 0.087. By commonly used cutoffs, a model is said to be a good fit if CFI and TLI are greater than 0.95 and SRMR and RMSEA are below about 0.08. The DAG diagram below will let us know which factors to remove to ensure a good model fit.
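
CFI, TLI and RMSEA are all functions of the two chi-square tests in the summary, so they can be recomputed by hand as a sanity check. The sketch below uses the standard textbook formulas; `fit_from_chisq()` is a helper written for this illustration, not a lavaan function, and it divides by n in the RMSEA because that reproduces the printed value here (some texts use n - 1).

```r
# Recompute CFI, TLI and RMSEA from the user-model and baseline chi-square tests.
# fit_from_chisq() is a hypothetical helper, not part of lavaan.
fit_from_chisq <- function(chisq_m, df_m, chisq_b, df_b, n) {
  # Excess of each chi-square over its degrees of freedom, floored at zero
  d_m <- max(chisq_m - df_m, 0)
  d_b <- max(chisq_b - df_b, d_m)

  cfi   <- if (d_b > 0) 1 - d_m / d_b else 1
  tli   <- (chisq_b / df_b - chisq_m / df_m) / (chisq_b / df_b - 1)
  rmsea <- sqrt(d_m / (df_m * n))

  c(cfi = cfi, tli = tli, rmsea = rmsea)
}

# Values copied from the measurement-model summary above
mea_indices <- fit_from_chisq(chisq_m = 80.437, df_m = 14,
                              chisq_b = 667.346, df_b = 21,
                              n = 149)
round(mea_indices, 3)
```

The rounded results are 0.897, 0.846 and 0.178, which match the CFI, TLI and RMSEA printed by lavaan. SRMR cannot be recovered this way because it depends on the residual correlations rather than on the chi-square tests.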

semPaths(mea_fit, "par", 
         edge.label.cex = 1.2, 
         fade = FALSE,
         edge.label.position = 1,
         edge.label. = 70,
         edge.label.by = TRUE, 
         layout = "tree",
         sizeMan = 10)

From the diagram above, which shows the unstandardized estimates, support and freedom have factor loadings below 0.5 (0.103 and 0.066 in the summary), signifying that they have less impact on happiness. To fit the structural equation model, the variables support and freedom will therefore be removed. Corruption and generosity also have small loadings, but we keep them because of their negative relationship with happiness.
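
The 0.5 rule of thumb is usually applied to standardized loadings, whereas summary() prints unstandardized estimates unless asked otherwise. As a rough check, the standardized loadings of a one-factor model can be approximated from the rounded estimates in the summary above. This is only a sketch: the inputs are rounded to three decimals, so the results are approximate, and the variable names below are my own.

```r
# Rounded estimates copied from the measurement-model summary above
loading <- c(score = 1.000, Generosity = -0.022, GDP = 1.154, support = 0.103,
             Life_exp = 6.522, freedom = 0.066, corruption = -0.075)
resid_var <- c(score = 0.279, Generosity = 0.022, GDP = 0.179, support = 0.004,
               Life_exp = 8.534, freedom = 0.009, corruption = 0.027)
latent_var <- 0.867  # estimated variance of Happiness

# Model-implied variance of each indicator:
# loading^2 * latent variance + residual variance
implied_var <- loading^2 * latent_var + resid_var

# Standardized loading = loading * sd(latent) / sd(indicator)
std_loading <- loading * sqrt(latent_var) / sqrt(implied_var)
round(std_loading, 2)

# Indicators whose absolute standardized loading falls below 0.5
weak <- names(std_loading)[abs(std_loading) < 0.5]
weak
```

With these rounded inputs, support and freedom come out at roughly 0.83 and 0.54, while Generosity and corruption are the two indicators below 0.5 in absolute value. Which indicators look weak therefore depends on the scale on which the loadings are read. In lavaan, `summary(mea_fit, standardized = TRUE)` prints the exact standardized values.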

The Structural Equation Model

Now that we know which variables contribute most to our model, we can fit the SEM.

#constructing the structural equation model
sem_model <- "Happiness =~ score + Generosity + GDP + Life_exp + corruption"

#fitting the structural equation model
sem_fit <- sem(sem_model,
               data = whi_data)

#summary statistics of the structural equation model
summary(sem_fit,
        fit.measures = TRUE)
## lavaan 0.6-9 ended normally after 30 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        10
##                                                       
##   Number of observations                           149
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                23.651
##   Degrees of freedom                                 5
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               409.546
##   Degrees of freedom                                10
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.953
##   Tucker-Lewis Index (TLI)                       0.907
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -640.835
##   Loglikelihood unrestricted model (H1)       -629.010
##                                                       
##   Akaike (AIC)                                1301.671
##   Bayesian (BIC)                              1331.710
##   Sample-size adjusted Bayesian (BIC)         1300.063
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.158
##   90 Percent confidence interval - lower         0.098
##   90 Percent confidence interval - upper         0.225
##   P-value RMSEA <= 0.05                          0.003
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.072
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Happiness =~                                        
##     score             1.000                           
##     Generosity       -0.027    0.014   -1.921    0.055
##     GDP               1.200    0.079   15.189    0.000
##     Life_exp          6.844    0.462   14.799    0.000
##     corruption       -0.078    0.016   -4.910    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .score             0.332    0.047    7.114    0.000
##    .Generosity        0.022    0.003    8.616    0.000
##    .GDP               0.162    0.042    3.896    0.000
##    .Life_exp          7.316    1.479    4.948    0.000
##    .corruption        0.027    0.003    8.524    0.000
##     Happiness         0.813    0.130    6.270    0.000
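
The parameter counts and degrees of freedom reported in both summaries follow from simple counting. In a one-factor model with the first loading fixed to 1, the free parameters are the remaining loadings, one residual variance per indicator and the latent variance. A small sketch, where `one_factor_df()` is a helper written for this illustration and not a lavaan function:

```r
# Counting sketch for a one-factor model with p indicators.
# one_factor_df() is a hypothetical helper, not part of lavaan.
one_factor_df <- function(p) {
  n_moments <- p * (p + 1) / 2   # unique variances and covariances in the data
  n_params  <- (p - 1) + p + 1   # free loadings + residual variances + latent variance
  c(moments = n_moments, parameters = n_params, df = n_moments - n_params)
}

df_seven <- one_factor_df(7)  # measurement model with seven indicators
df_five  <- one_factor_df(5)  # reduced model with five indicators

df_seven
df_five
```

This gives 14 parameters and 14 degrees of freedom for the seven-indicator model, and 10 parameters and 5 degrees of freedom for the five-indicator model, in line with the two lavaan outputs.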

Although not all model fit criteria are met (TLI = 0.907 and RMSEA = 0.158 still fall short), we cannot ignore the fact that CFI = 0.953 and SRMR = 0.072. The final structural equation DAG diagram is given below:

semPaths(sem_fit, "par", 
         edge.label.cex = 1.2, 
         fade = FALSE,
         edge.label.position = 1,
         edge.label. = 70,
         edge.label.by = TRUE, 
         layout = "tree",
         sizeMan = 10)

Conclusion

From the analysis above, we can see that countries with high life expectancy, GDP and happiness score are more likely to be happy. I did not expect generosity to be negatively related to happiness, but we are in an era where people take advantage of generous people and trust is broken, which might be a possible explanation for this relationship. Note, though, that the generosity loading is small and only marginally significant (p = 0.055). Also, as expected, a country that is corrupt will not have happy individuals.

Thanks for taking the time to read this analysis. There is more to SEM than the few paragraphs I have written in this post.
