Site icon R-bloggers

Importance of exports and economic growth, cross-country time series

[This article was first published on Peter's stats stuff - R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Exports and economic growth

I was looking to show a more substantive piece of analysis using the World Development Indicators data, and at the same time show how to get started on fitting a mixed effects model with grouped time series data. The relationship between exports’ importance in an economy and economic growth forms a good start as it is of considerable theoretical and practical policy interest and has fairly reliable time series data for many countries. At the most basic level, there is a well known positive relationship between these two variables:

The relationship is made particularly strong by the small, rich countries in the top right corner. Larger countries, with their own large domestic markets, can flourish economically while being less exports-dependent than smaller countries – the USA being the example par excellence. If the regression is weighted by population, the relationship is much weaker than shown in the above diagram.

However, today, I’m looking at a different aspect of the relationship – changes over time. Partly this is because I’m genuinely interested, but mostly because I needed an example demonstrating fitting a mixed effects model to longitudinal data with a time series component.

The data come from the World Bank’s World Development Indicators (WDI), which I explored recently in my last post. I’m comparing “Exports of goods and services (% of GDP)” with “GDP per capita (constant 2000 US$)”. The WDI have at least some data on these variables for 186 countries, but different starting years for each (earliest being 1962). The data look like this, in a connected scatterplot showing the relationship between the two variables for 12 randomly chosen countries:

… and this, in a more straightforward time series line plot:

Close watchers will see that there are some country groupings in the dataset (eg “Pacific island small states”) that we don’t want, in addition to the individual countries we do. So our first job is to get rid of these. Here’s the R code that pulls in the data, draws the plots so far, and removes those country groupings.

library(WDI)
library(ggplot2)
library(scales)
library(dplyr)
library(tidyr)
library(showtext) # for s
library(nlme)

# import s
.add.google("Poppins", "my")
showtext.auto()
theme_set(theme_light(10, base_family = "my"))


#-------------------data imports, first explore, and clean-----------------
if(!exists("exports")){
   # Exports of goods and services (% of GDP)
   exports <- WDI(indicator = "NE.EXP.GNFS.ZS", end = 2015, start = 1950)
   # GDP per capita (constant 2000 US$)
   gdp <- WDI(indicator = "NY.GDP.PCAP.KD", end = 2015, start = 1950)
}

both <- merge(exports, gdp) %>%
   rename(exports = NE.EXP.GNFS.ZS,
          gdp = NY.GDP.PCAP.KD) %>%
   # removing any year-country combos missing eith gdp or exports:
   filter(!(is.na(exports) | is.na(gdp))) %>%
   arrange(country, year)

# let's look at 12 countries at a time
all_countries <- unique(both$country)
sampled <- both    %>%
   filter(country %in% sample(all_countries, 12))
   
# connected scatter plot:
p1 <- sampled %>%
   ggplot(aes(y = gdp, x = exports, colour = year)) +
   facet_wrap(~country, scales = "free") +
   geom_path() +
   labs(x = "Exports as a percentage of GDP") +
   scale_y_continuous("GDP per capita, constant 2000 US dollars", label = dollar) +
   ggtitle("Exports and GDP over time, selected countries")


# univariate time series plots
p2 <- sampled %>%
   gather(variable, value, -(iso2c:year)) %>%
   ggplot(aes(x = year, y = value)) +
   geom_line() +
   facet_wrap(~ country + variable, scales = "free_y")


# how to get rid of the country groups?
unique(both[ , c("iso2c", "country")]) %>% arrange(iso2c)
# they have a number as first or second digit, or X or Z as first digit (but ZW ZA ZM legit)

both2 <- both %>%
   filter(!grepl("X.", iso2c) & !grepl("[0-9]", iso2c)) %>%
   filter(!grepl("Z.", iso2c) | iso2c %in% c("ZW", "ZA", "ZM")) %>%
   filter(!iso2c %in% c("OE", "EU"))

#------------------------simple cross section used for first plot-------------------
country_sum <- both2 %>%
   group_by(country) %>%
   filter(year == max(year))

p3 <- country_sum %>%
   ggplot(aes(x = exports / 100, y = gdp, label = country)) +
   geom_text(size = 3) +
   geom_smooth(method = "lm") +
   scale_y_log10(label = dollar) +
   scale_x_log10(label = percent) +
   labs(x = "Exports as a percentage of GDP",
        y = "GDP per capita, constant 2000 US dollars",
        title = "Cross section of exports and GDP, latest years when data are present")

The plan is to fit an appropriate time series model with GDP as a response variable and exports as a percentage of GDP as an explanatory variable, allowing country to be a group random effect, and the country-year individual randomness to be related year to year as a time series. This means a mixed effects model. There’s a bit of transforming I need to do:

Here’s the distribution of the first recorded value of the logarithm of exports as a percentage of gdp:

Here’s the connected scatter plot of the differenced logarithms (effectively, growth rates):

Here’s the traditional time series plots:

And here’s the code that does the transformations and plots:

# first we want to know the value of exports at the beginning of each series
first_values <- both2 %>%
   group_by(country) %>%
   filter(year == min(year)) %>%
   ungroup() %>%
   mutate(exports_starter = log(exports),
          first_year = year) %>%
   select(iso2c, exports_starter, first_year)

p4 <- ggplot(both3, aes(x = exports_starter)) + geom_density()

# all those individual time series have problem of non-stationarity and of course autocorrelation
# so when we merge with the starting values of exports we also calculate first
# differences of logarithms
both3 <- both2 %>%
   left_join(first_values, by = "iso2c") %>%
   arrange(country, year) %>%
   group_by(country) %>%
   mutate(exports_g = c(NA, diff(log(exports))),
          gdp_g = c(NA, diff(log(gdp))),
          exports_lag = lag(exports_g)) %>%
   ungroup() %>%
   filter(!is.na(gdp_g)) %>%
   filter(!is.na(exports_lag))

all_countries2 <- unique(both3$country)

set.seed(123)
sampled <- both3 %>%
   filter(country %in% sample(all_countries2, 12)) 

p5 <- sampled %>%
   select(country, year, exports_g, gdp_g) %>%
   gather(variable, value, -(country:year)) %>%
   ggplot(aes(x = year, y = value)) +
   geom_line() +
   facet_wrap(~ country + variable, scales = "free_y")

p6 <- sampled %>%
   select(country, year, exports_g, gdp_g) %>%
   ggplot(aes(x = exports_g, y = gdp_g, colour = year)) +
   geom_smooth(colour = "red", method = "lm") +
   geom_path() +
   geom_point() +
   facet_wrap(~ country, scales = "free") +
   labs(x = "change in logarithm of exports share of GDP",
        y = "change in logarithm of GDP per capita, constant 2000 USD",
        title = "Changing importance of exports and GDP growth, selected countries")

So we’re ready for some modelling. After all the data transformations, this part is relatively easy. I use a mixed effects model that lets the key effects vary in each country, around an average level of the effect.

I tried a few simpler models than below and checked the auto-correlation functions to be sure the time series part was needed (it is; ACF() returns an autocorrelation at lag 1 of around 0.23, and the decay of autocorrelation in subsequent lags characteristic of a fairly simple AR process). So here we go:

model4 <- lme(fixed = gdp_g ~ ordered(first_year) + exports_g + exports_lag + exports_starter + year,
              random = ~ exports_g + exports_lag + year | country, 
              correlation = corAR1(form = ~ year | country),
              data = both3)

Just to visualise what this model is doing, imagine the connected scatter plot of the differenced logarithms of each variable, as shown earlier. We’re drawing a diagonal line of best fit on each facet, with the slope and intercept of each line allowed to different for each country. The average slope is the average impact of changes in exports as a percent of GDP on GDP growth. Now add to this the complication of the other variables – the first year data starts (basically just a nuisance variable we want to control for); each countrys’ absolute level of exports as a percentage of GDP when its data started; the lagged value of exports (which is actually the number of most interest); and year, for which we’re trying to control for any linear trend.

The “correlation = corAR1(…)” part is important, as it means that when we come to conduct inference (t statistics, p values and confidence intervals) the model takes into account that each observation in a particular country is related to the previous year’s – they aren’t as valuable as independent and identically distributed data from a simple random sample. Failing to include this factor would mean that inference was biased in the direction of concluding results are significant that are in fact due to chance.

And here are the results. Bear in mind that:

> round(tail(summary(model4)$tTable, 4), 3)
                 Value Std.Error   DF t-value p-value
exports_g        0.031     0.010 7006   3.191   0.001
exports_lag      0.021     0.006 7006   3.579   0.000
exports_starter -0.002     0.002  150  -1.233   0.220
year             0.000     0.000 7006  -1.328   0.184

Not wanting to give too precise an interpretation of this without a bit more theory and thinking, it suggests for countries on average:

As exports_g, exports_lag and year were all allowed to be random effects ie take different values by country, their differing values for each country are of interest. The figures above reflect an overall effect of these variables; the actual value in any country is a (approximately) normally distributed random variable. Here’s how their’ densities look:

Each observation (the little rug marks on the horizontal axis) is the size of the effect for a particular country. So we can see that while overall value of exports_lag – the persistent impact of a change in exports as a percentage of GDP on GDP growth – was 0.021 as per the table above, any particular country has a value that is lower or hight than that, with a reasonable number of countries seeing a negative impact.

The value for the exports_g parameter plus the exports_lag parameter gives a crude sense of the overall impact, for a particular country, of changing importance of exports on GDP growth. As can be seen from the top right panel of the above set of plots, this combined value is generally but not always positive.

All up, while we might conclude that for countries “on average” there is a positive relationship between growth in the importance of exports and GDP growth, for any particular country the relationship will vary and may in fact be negative. Further investigation would need to place this in a more theoretical context, and control for other variables that might be confounding the results; but the above is probably enough for an exploratory blog post.

Here’s how to extract those country-level effects from our mixed effects model object:

cf <- coef(model4)

# let's look at just the last four coefficients ie exclude the nuisance of first_year
cf_df <- cf[, (ncol(cf) - 3):ncol(cf)] %>%
   as.data.frame() %>%
   mutate(country = rownames(cf),
          combined = exports_lag + exports_g) %>%
   arrange(combined)


p7 <- cf_df %>%
   gather(variable, value, -country, -exports_starter) %>%
   mutate(variable = gsub("combined", "exports_g + exports_lag", variable, fixed = TRUE)) %>%
   ggplot(aes(x = value)) +
   facet_wrap(~variable, scales = "free") +
   geom_density() +
   geom_rug()

To leave a comment for the author, please follow the link and comment on their blog: Peter's stats stuff - R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.