Military Expenditure Estimation with RJAGS Simulation

[This article was first published on DataGeeek, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The side effect of the ongoing Russian occupation was that it stimulated countries’ intentions to increase their military budgets. Last month following the invasion, Germany announced a budget of 100 billion € for the restructuring of the army and announced that it would spend %2 of its GDP every year.

Of course, it is not a good development for the world. For this reason, I’ve just wondered which motives affect the military spending of the countries. I will take G20 countries’ annual military expenditure as a target variable. And as the predictors:

All the data we will use is from Our World in Data. We are going to use a linear model with simulations in which the Markov chain Monte Carlo(MCMC) method generates samples from Bayesian posterior distributions. This process is called JAGS and in order to use that, we will use rjags package. But we have to install JAGS library separately.

First, we create a data frame that we are going to use in analysis and modeling.

library(tidyverse)
library(readxl)
library(sysfonts)
library(showtext)
library(rjags)
library(coda)

df <- read_excel("military_expenditure.xlsx")

#Creating region variable for comparing analysis and 
#region_int variable for jags modeling
df <- 
  df %>% 
  mutate(region=if_else(country %in% c("Australia","China","India","Indonesia",
                                       "Japan","Russia","Saudi Arabia",
                                       "South Africa","South Korea","Turkey"),
                        "East","West") %>% as_factor(),
         region_int = region %>% as.integer()) %>% 
  na.omit() %>% 
  as.data.frame()

We will examine the expenditure of the G20 countries from the cold war ends until the recent, for comparison. We colorize each region separately.

#load fonts(google)
font_add_google("Roboto Slab")
showtext_auto()

#Comparison Plot
df %>%
  filter(year > 1991) %>% 
  mutate(country = fct_reorder(country, expenditure)) %>%
  ggplot(aes(expenditure, country, fill = region, color = region)) +
  geom_boxplot(position = position_dodge(), alpha = 0.5) +
  scale_x_log10(labels = scales::label_number_si()) +
  scale_fill_brewer(palette = "Pastel1") +
  scale_color_brewer(palette = "Pastel1") +
  labs(y = NULL, color = NULL, fill = NULL, x = "Military Expenditures per Country($)")+
  theme_minimal()+
  theme(legend.position = "right",
        panel.background = element_rect(fill = "#e5f5f9",color = NA),
        axis.text = element_text(family = "Roboto Slab", size = 10),
        axis.title = element_text(family = "Roboto Slab")) 
  

The interesting thing is Germany doesn’t appear so much different from Russia.

First of all, we will build the model with the region variable we created before, so we will see its effect on military expenditures change. Because of the jags.model function can not employ non-numeric data as predictors we have to use the integer type of region variable (region_int) we created before.

#modeling with region
set.seed(123)
 
model_with <- 
  lm(expenditure ~ civlib + gdp + region, data = df)
 
 
> model_with
 
#Call:
#lm(formula = expenditure ~ civlib + gdp + region, data = df)
 
#Coefficients:
#(Intercept)       civlib          gdp   regionEast  
# -3.052e+10    7.380e+10    5.367e-02   -4.726e+10
 

Because of the linear model above, we take the region variable as regionEast from now.

#the dataset for jags model
lt_with <- list(x = df[,c(3, 4, 7)], y = df[,5],
                n = nrow(df))

#estimation coefficients
lt_with$b_guess <- model_with$coefficients

#Model string
jags.script_with <- 
  "#--------------------------------------------------
model{
    # likelihood
    for( i in 1:n) {
        y[i] ~ dnorm(mu[i], tau)
        mu[i] <- intercept + civlib*x[i,1] + gdp*x[i,2]+ regionEast*x[i,3]
    }
    # priors
    intercept ~ dnorm(b_guess[1], 0.1)
    civlib ~ dnorm(b_guess[2], 0.1)
    gdp ~ dnorm(b_guess[3], 0.1)
    regionEast ~ dnorm(b_guess[4], 0.1)
    tau   ~ dgamma(.01,.01)
    # transform
    sigma <- 1 / sqrt(tau)
}#--------------------------------------------------
" 
#compiling
mod_with <- jags.model(textConnection(jags.script_with), 
                   data = lt_with, 
                   n.chains = 4, n.adapt = 2000)

#burn-in process
update(mod_with, 1000)

#posterior sampling
mod.samples_with <- coda.samples(
  model = mod_with, 
  variable.names = c("intercept", "civlib", "gdp", "regionEast", "sigma"), 
  n.iter = 1000)
 
# descriptive statistics of posteriors
sm_with <- summary(mod.samples_with)


sm_with[1]
#$statistics
#                    Mean           SD     Naive SE Time-series SE
#civlib      7.379818e+10 3.127715e+00 4.945351e-02   4.885913e-02
#gdp         7.665518e-02 2.448805e-03 3.871901e-05   3.598913e-05
#intercept  -3.051892e+10 3.197261e+00 5.055314e-02   5.056658e-02
#regionEast -4.726244e+10 3.099717e+00 4.901083e-02   4.900839e-02
#sigma       1.182377e+11 2.493767e+09 3.942992e+07   3.798331e+07


As you can see above, the GDP variable has no effect, while the eastern region(regionEast) is a powerful predictor. This time we will exclude the region variable and compare the variables’ effect for each state on a plot. To fully understand the process and parameters, I’d recommend reading the JAGS User Manual Chapter 3. We will prefer time series SE as the standard error for calculating the prediction intervals because it takes into account auto-correlation.

#without region
model_without <- 
  lm(expenditure ~ civlib + gdp, data = df)


#the dataset for jags model
lt_without <- list(x = df[,c(3, 4)], y = df[,5],
                   n = nrow(df))


#estimation coefficients
lt_without$b_guess <- model_without$coefficients

# Model string
jags.script_without <- 
  "#--------------------------------------------------
model{
    # likelihood
    for( i in 1:n) {
        y[i] ~ dnorm(mu[i], tau)
        mu[i] <- intercept + civlib*x[i,1] + gdp*x[i,2]
    }
    # priors
    intercept ~ dnorm(b_guess[1], 0.1)
    civlib ~ dnorm(b_guess[2], 0.1)
    gdp ~ dnorm(b_guess[3], 0.1)
    tau   ~ dgamma(.01,.01)
    # transform
    sigma <- 1 / sqrt(tau)
}#--------------------------------------------------
"

#compiling
mod_without <- jags.model(textConnection(jags.script_without), 
                   data = lt_without,
                   n.chains = 4, n.adapt = 2000)



#burn-in process
update(mod_with, 1000)

#posterior sampling
mod.samples_without <- coda.samples(
  model = mod_without, 
  variable.names = c("intercept", "civlib", "gdp", "sigma"), 
  n.iter = 1000)

# descriptive statistics of posteriors
sm_without <- summary(mod.samples_without)

#The plot of the effects of the terms on the change of expenditure
bind_rows(
  sm_with$statistics %>% 
    as_tibble(rownames = "terms") %>% 
    rename("mean"= Mean,
           "std.error" = `Time-series SE`) %>% 
    select(c(1,2,5)) %>% 
    mutate(regions = "include"),
  sm_without$statistics %>% 
    as_tibble(rownames = "terms") %>% 
    rename("mean"= Mean,
           "std.error" = `Time-series SE`) %>% 
    select(c(1,2,5)) %>% 
    mutate(regions = "not include")
) %>%
  filter(terms != "intercept" , terms != "sigma") %>%
  ggplot(aes(mean, terms, color = regions)) +
  geom_vline(xintercept = 0, size = 1.5, lty = 2, color = "gray50") +
  geom_errorbar(size = 2, alpha = 0.7,
                aes(xmin = mean - 1.96 * std.error, 
                    xmax = mean + 1.96 * std.error)) +
  geom_point(size = 3) +
  scale_x_continuous(labels = scales::dollar,
                     expand = expansion(mult = c(.1, .1))) +
  scale_color_brewer(palette = "Dark2") +
  labs(x="", y = "", color = "Region in Model?")+
  theme_minimal()+
  theme(legend.position="right",
        panel.background = element_rect(fill = "#e5f5f9", color = NA))


According to the plot above, while being in the east decrease the military expenditures, the effect of civil liberties increases the military budget, which is unexpected for me. This anomaly can be explained by the United States’ extreme values, which we have seen in the first plot we created before. Also, the prediction intervals seem to be near zero because of the MCMC simulation we did with the high iteration number.

To leave a comment for the author, please follow the link and comment on their blog: DataGeeek.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)