Automating update of a fiscal database for the Euro Area

[This article was first published on Macroeconomic Observatory - R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Our purpose is to write a program to automatically update a quarterly fiscal database for the Euro Area. The main difficulty of this exercise is to build long series that go as far as the 1980’s.

We use two sources to build the database: the historical database developed in Paredes et al. (2014), which stops in 2013, and the latest Eurostat data. Throughout this article, we explain how we chained each series of PPP with the Eurostat data.

Both databases are taken without any seasonal adjustment. At the end of the post, chained data series are seasonally adjusted using the seasonal package developed by Sax (2016) using the X13 methodology.

To be automated, the recent points of the database are taken from DBnomics using the rdbnomics package. All the code is written in R, thanks to the RCoreTeam (2016) and RStudioTeam (2016).

The database will contain the following series:

  • Direct taxes
  • Indirect taxes
  • Social security contributions by employees
  • Social security contributions by employers
  • Government consumption
  • Government investment
  • Government transfers
  • Government subsidies
  • Government compensation of employees
  • Unemployment benefits
  • Government debt
  • Interest payments
  • Total revenues
  • Total expenditures

Historical data

First we get the historical series built by Paredes et al. (2014). Problem is, the series are not all directly usable: the series of social contribution by contributors do not exist before 1991.

url <- "PPP_raw.xls"

ppp <- read_excel(url, skip = 1)

ppp %<>% 
  transmute(period      =   as.Date(as.yearqtr(`MILL. EURO, RAW DATA, NON-SEAS. ADJUSTED, SMOOTHED ESTIMATES`, format="%YQ%q")),
            totexp      =   TOE,               # Total expenditures
            pubcons     =   GCN,               # General government consumption expenditures
            pubinves    =   GIN,               # General government investment
            tfs         =   THN,               # Social payments
            unemp       =   `of which UNB`,    # Unemployment benefits (among social payments)
            salaries    =   COE,               # Compensation of employees
            subs        =   SIN,               # Subsidies
            intpay      =   INP,               # General government interest payments
            totrev      =   TOR,               # Total revenue
            indirtax    =   TIN,               # Total indirect taxes
            dirtax      =   DTX,               # Total direct taxes
            scr         =   as.numeric(SCR),   # Social contribution by employers
            sce         =   as.numeric(SCE),   # Social contribution by employees and self-employed
            sct         =   SCT,               # Total social security contributions
            debt        =   MAL) %>%           # Euro area general government debt
  filter(!is.na(period))

Assuming that the ratio of social contributions remains stable between employers and households before 1991, we can reconstruct the contribution of employees and employers using the series of total contribution. Using this technique we infer the series of social contribution by contributors before 1991.

# We calculate the ratio of social contribution by employers for the first point in our series
prcent <- 
  transmute(ppp, scr_sct=scr/sct) %>%
  na.omit() %>% first() %>% as.numeric()

# Using the ratio, we reconstruct earlier social contribution by contributor
scr_sce_before91 <- 
  filter(ppp, is.na(scr)) %>%
  select(period, sct, scr, sce) %>%
  transmute(period,
            scr=prcent*sct,
            sce=sct-scr) %>%
  gather(var, value, -period)

# We reinject the constructed series in the ppp database
ppp %<>%
  select(-sct) %>%
  gather(var, value, -period, na.rm = TRUE) %>%
  bind_rows(scr_sce_before91) %>%
  arrange(var, period)

maxDate <- 
  ppp %>%
  group_by(var) %>%
  summarize(maxdate=max(period)) %>%
  arrange(maxdate)
kable(maxDate)
var maxdate
debt 2013-10-01
dirtax 2013-10-01
indirtax 2013-10-01
intpay 2013-10-01
pubcons 2013-10-01
pubinves 2013-10-01
salaries 2013-10-01
sce 2013-10-01
scr 2013-10-01
subs 2013-10-01
tfs 2013-10-01
totexp 2013-10-01
totrev 2013-10-01
unemp 2013-10-01

Recent data

Historical data series stop in 2013. For latest points, we use DBnomics to get Eurostat’s data. Eurostat’s series on social contributions and on unemployment benefits present difficulties as well. We thus download the series from DBnomics in three steps:

  1. We take the series on social contributions and we treat them in order to build quarterly series by contributors.
  2. We take the series on unemployment benefits and we treat them in order to build quarterly series.
  3. We take the series that do not present any problems

Special case: social contributions

Download annual data

var_taken <- c('D613', # Annual Households' actual social contributions (D613) for general govt only (S13)
               'D612', # Annual Employers' imputed social contributions
               'D611') # Annual Employers' actual social contributions (D611) for general govt only (S13)
url_variables <- paste0(var_taken, collapse="+")
filter <- paste0('A.MIO_EUR.S13.',url_variables,'.EA19')

df <- rdb("Eurostat","gov_10a_taxag",mask = filter)

data_1 <- 
  df %>%
  select(period,var=na_item,value) %>%
  spread(var,value) %>% 
  mutate(sce=D613+D612,
         scr=D611) %>% 
  select(-D611,-D612,-D613) %>% 
  gather(var,value,-period) %>% 
  mutate(year=year(period))

The series of actual social contributions present 2 problems: (i) they are not quarterly; (ii) they are available only up to 2018. We fix this problem by using the two series of quarterly net total social contributions and quarterly employers contribution for the total economy.

From annual to quarterly data

# Quarterly Net social contributions, receivable (D61REC) for general govt only (S13)
df <- rdb("Eurostat","gov_10q_ggnfa",mask = "Q.MIO_EUR.NSA.S13.D61REC.EA19")
qsct <- 
  df %>%
  transmute(period, var = 'sct', value) %>%
  mutate(year=year(period))

maxtime <- summarize(qsct,maxdate=max(period))

To turn the two annual series of social contributions into quarterly series, we use the series of quarterly total net social contributions to calculate the share of each contributor for each year. Using this share and the quarterly value of the total net social contributions, we can deduce the quarterly value of the net social contributions of each contributor.

# Calculate total amount of sct by year
qsct_a <- 
  qsct %>%
  group_by(year) %>%
  summarise(value_a=sum(value))
qsct %<>% left_join(qsct_a, by="year")

# Convert data from annual to quarterly
qsce_uncomplete <- 
  filter(data_1, var=="sce")  %>%
  full_join(qsct, by="year") %>%
  transmute(period=period.y, 
            var=var.x,
            value=value.y*value.x/value_a) %>%
  filter(!is.na(value))

# Convert data from annual to quarterly
qscr_uncomplete <- 
  filter(data_1, var=="scr")  %>%
  full_join(qsct, by="year") %>%
  transmute(period=period.y, 
            var=var.x,
            value=value.y*value.x/value_a) %>%
  filter(!is.na(value))

We plot series to compare built quarterly series with annual series.

plot_treatment <- 
  bind_rows(qscr_uncomplete, qsce_uncomplete) %>%
  mutate(Origin="Deduced quarterly series",
         value=4*value) %>%  # We multiply by 4 because on the plot we compare quarterly level with annual levels
  bind_rows(mutate(data_1,Origin="Original annual series")) %>%
  mutate(var=ifelse(var=="sce","Social contribution of households","Social contribution of employers")) %>%
  select(-year)

ggplot(plot_treatment,aes(period,value,colour=Origin)) +
  geom_line() +
  facet_wrap(~var,ncol=2,scales = "free_y") +
  scale_x_date(expand = c(0.01,0.01)) +
  theme + xlab(NULL) + ylab(NULL) +
  theme(strip.text=element_text(size=12),
        axis.text=element_text(size=8)) +
  theme(legend.title=element_blank())

plot of chunk unnamed-chunk-8

Most recent data

Now that we have the quarterly values, we use the series of total employers contribution for total economy along with the share of each contributors in total contributions to deduce latest points of contributions by households and employers.

# Quarterly Employers SSC for total economy
df <- rdb("Eurostat","namq_10_gdp",mask="Q.CP_MEUR.NSA.D12.EA19")
qscr_toteco <- 
  df %>%
  transmute(period,var = 'scr',value) %>%
  mutate(year=year(period))

# Using recent data on employers total contribution we chain forward the social contribution of employers
qscr <- 
  chain(to_rebase = qscr_toteco, 
        basis = qscr_uncomplete, 
        date_chain = max(qscr_uncomplete$period), 
        is_basis_the_recent_data=FALSE) %>%
  arrange(period)

# Assuming the ratio of social contribution by contributors remains constant over time, we deduce social contribution of households
qsce <- 
  bind_rows(qsce_uncomplete, 
            select(qsct, period, value, var), 
            qscr) %>%
  filter(period<=maxtime$maxdate) %>%
  spread(var,value, fill = 0) %>%
  transmute(period,
            sce=ifelse(period<=max(qsce_uncomplete$period),sce,sct-scr)) %>%
  gather(var, value, -period) %>%
  arrange(period)

Series of employers contribution are different in levels. Indeed, we are interested in social contributions of employers for general government only, and not for total economy. But the pattern of both series are very similar. So, by chaining them we take the variations from social contributions of employers for total economy and we apply them to the level of actual social contributions for general government only.

plot_treatment <- 
  bind_rows(qscr_uncomplete, 
            qsce_uncomplete) %>%
  mutate(Origin="Quarterly series") %>% 
  bind_rows(mutate(qscr_toteco ,Origin="Original quarterly series (for total economy)"),
            mutate(bind_rows(qsce,qscr), Origin="Chained series")) %>%
  mutate(var=ifelse(var=="sce","Contribution of households","Contribution of employers")) %>%
  select(-year)

ggplot(plot_treatment,aes(period,value,colour=Origin)) +
  geom_line() +
  facet_wrap(~var,ncol=2,scales = "free_y") +
  scale_x_date(expand = c(0.01,0.01)) +
  theme + xlab(NULL) + ylab(NULL) +
  theme(strip.text=element_text(size=12),
        axis.text=element_text(size=8)) +
  theme(legend.title=element_blank()) +
  ggtitle("Social contribution forward chaining")

plot of chunk unnamed-chunk-11

Special case: unemployment benefits

We retrieve government social expenditures and compute their quaterly share for each year.

socialexp <- 
  rdb("Eurostat","gov_10q_ggnfa",mask = "Q.MIO_EUR.NSA.S13.D62PAY.EA19") %>% 
  mutate(year=year(period)) %>% 
  select(period,value,year) %>% 
  group_by(year) %>% 
  mutate(sum=sum(value),
         ratio=value/sum) %>% 
  ungroup() %>% 
  select(-value,-year,-sum)

Then we retrieve the latest annual data on unemployment benefits, put them in a quarterly table and use the previous ratio of quarterly social expenditures to compute quarterly unemployment benefits.

df <- rdb("Eurostat","gov_10a_exp",mask = "A.MIO_EUR.S13.GF1005.TE.EA19")
recent_unemp <- df %>%
  mutate(year=year(period)) %>%
  select(period,value,year)

recent_unemp_q <-
  tibble(period=seq(min(recent_unemp$period),
                    length.out=nrow(recent_unemp)*4,
                    by = "quarter"),
         year=year(period)) %>% 
  left_join(recent_unemp,by="year") %>% 
  select(-period.y,-year) %>% 
  rename(period=period.x)

unemp_q <-
  recent_unemp_q %>% 
  inner_join(socialexp,by="period") %>% 
  mutate(value=value*ratio,
         var="unemp") %>% 
  select(-ratio)

We compare historical data with new quarterly series.

data_plot <-
  ppp %>% 
  filter(var=="unemp") %>% 
  mutate(var="From PPP") %>% 
  bind_rows(mutate(unemp_q,var="New measure")) %>% 
  filter(year(period)>=2007)

ggplot(data_plot,aes(period,value,colour=var))+
  geom_line()+
  scale_x_date(expand = c(0.01,0.01)) +
  theme + xlab(NULL) + ylab(NULL) +
  theme(strip.text=element_text(size=12),
        axis.text=element_text(size=8)) +
  theme(legend.title=element_blank()) +
  ggtitle("Unemployment benefits series comparison")

plot of chunk unnamed-chunk-14

Chaining recent data with historical

We now fetch the remaining series from DBnomics, none of the remaining series has to be treated before it can be used. We then include in the resulting dataframe the series of social contributions by contributors.

# List of var that can be taken on the first dataset
var_taken <- c('P3',                   # Public consumption
               'P51G',                 # Public gross fixed capital formation
               'D62PAY',               # Social transfers
               'D1PAY',                # Compensation of employees
               'D3PAY',                # Subsidies
               'D2REC',                # Indirect taxes on production
               'D5REC',                # Direct taxation on income and wealth
               'D41PAY',               # Interest payments
               'TE',                   # Total expenditures
               'TR')                   # Total revenue

# We build the URL, fetch the data and convert it to a data frame
url_variables <- paste0(var_taken, collapse="+")
filter <- paste0('Q.MIO_EUR.NSA.S13.', url_variables,'.EA19')
data_1 <- rdb("Eurostat","gov_10q_ggnfa",mask=filter)

# Government consolidated gross debt is in a different dataset so we make a second call to DBnomics
data_2 <- rdb("Eurostat","gov_10q_ggdebt",mask="Q.GD.S13.MIO_EUR.EA19")

# We then bind the two data frame fetched on DBnomics
recent_data <- 
  bind_rows(data_1, data_2) %>%
  transmute(value, period, var= as.factor(na_item))

# Used to harmonize DBnomics series var names with PPP
var_names <- c('pubcons', 'pubinves', 'tfs', 'salaries', 'subs', 'indirtax', 'dirtax','intpay','totexp','totrev','debt')
recent_data$var <- plyr::mapvalues(recent_data$var,c(var_taken,'GD'),var_names)

# We include the series of social contributions
var_names <- c(var_names, "sce", "scr","unemp")
recent_data %<>% bind_rows(qsce,qscr,unemp_q)

We can check the last available date for each series.

All that is left to do is to chain the dataframe of recent series with the historical database of Paredes et al. (2014). Once the data is chained we use the seasonal package to remove the seasonal component of each series. Hereafter, we will present the treatment on each variable to check graphically that what we obtain is consistent.

chained_data <- 
  recent_data %>%
  chain(basis=.,
        to_rebase = filter(ppp, var %in% var_names), 
        date_chain = "2007-01-01") %>%
  arrange(var, period) %>%
  select(period, var, value) %>%
  mutate(Origin = "Chained")

to_deseason <- 
  chained_data %>%
  select(-Origin) %>% 
  spread(var, value)

deseasoned <- 
  bind_rows(lapply(unique(chained_data$var), 
                   function(var) deseason(var_arrange = var,
                                          source_df = to_deseason))) %>%
  mutate(Origin = "Final series")

ppp %<>% mutate(Origin = "Historical data")

to_plot <- bind_rows(chained_data,
                     deseasoned,
                     ppp)

Total revenue and expenditures

plot_totrev <- 
  to_plot %>%
  filter(var == "totrev",
         Origin != "Historical data") %>%
  mutate(ind2 = "2 - Remove seasonal component") %>%
  bind_rows(data.frame(filter(to_plot,var=="totrev",Origin !="Final series"), ind2="1 - Chain series"))

plot_totexp <- 
  to_plot %>%
  filter(var == "totexp",
         Origin != "Historical data") %>%
  mutate(ind2 = "2 - Remove seasonal component") %>%
  bind_rows(data.frame(filter(to_plot,var=="totexp",Origin !="Final series"), ind2="1 - Chain series"))

p1 <- ggplot(plot_totrev,aes(period,value,colour=Origin))+
  geom_line()+
  facet_wrap(~ind2,scales = "free_y",ncol = 1)+
  scale_x_date(expand = c(0.01,0.01)) +
  theme + xlab(NULL) + ylab(NULL) + 
  ggtitle("Total revenue") +
  theme(legend.title=element_blank()) +
  theme(strip.text = element_text(size=12)) +
  theme(plot.title = element_text(size=16))

p2 <- ggplot(plot_totexp,aes(period,value,colour=Origin))+
  geom_line()+
  facet_wrap(~ind2,scales = "free_y",ncol = 1)+
  scale_x_date(expand = c(0.01,0.01)) +
  theme + xlab(NULL) + ylab(NULL) + 
  ggtitle("Total expenditures") +
  theme(legend.title=element_blank()) +
  theme(strip.text = element_text(size=12)) +
  theme(plot.title = element_text(size=16))

grid.arrange(arrangeGrob(p1,p2,ncol=2))

plot of chunk unnamed-chunk-18

Public direct spending

The chained series of public consumption resembles strongly the historical series. Here our manipulation of the series allows us to create a long series without much loss.

There is on the chained series of investment a (visually) significant difference in level with the historical one. The method of chaining allows us to build a reasonable proxy for the series of public investment but at a certain loss.

plot_cons <- 
  to_plot %>%
  filter(var == "pubcons",
         Origin != "Historical data") %>%
  mutate(ind2 = "2 - Remove seasonal component") %>%
  bind_rows(data.frame(filter(to_plot,var=="pubcons",Origin !="Final series"), ind2="1 - Chain series"))

plot_inves <- 
  to_plot %>%
  filter(var == "pubinves",
         Origin != "Historical data") %>%
  mutate(ind2 = "2 - Remove seasonal component") %>%
  bind_rows(data.frame(filter(to_plot,var=="pubinves",Origin !="Final series"), ind2="1 - Chain series"))

p1 <- ggplot(plot_cons,aes(period,value,colour=Origin))+
  geom_line()+
  facet_wrap(~ind2,scales = "free_y",ncol = 1)+
  scale_x_date(expand = c(0.01,0.01)) +
  theme + xlab(NULL) + ylab(NULL) + 
  ggtitle("Public consumption") +
  theme(legend.title=element_blank()) +
  theme(strip.text = element_text(size=12)) +
  theme(plot.title = element_text(size=16))

p2 <- ggplot(plot_inves,aes(period,value,colour=Origin))+
  geom_line()+
  facet_wrap(~ind2,scales = "free_y",ncol = 1)+
  scale_x_date(expand = c(0.01,0.01)) +
  theme + xlab(NULL) + ylab(NULL) + 
  ggtitle("Public Investment") +
  theme(legend.title=element_blank()) +
  theme(strip.text = element_text(size=12)) +
  theme(plot.title = element_text(size=16))

grid.arrange(arrangeGrob(p1,p2,ncol=2))

plot of chunk unnamed-chunk-19

Specific spending

Both chaining seem to be consistent with the historical series. Here our manipulation does not entail much loss.

plot_salaries <- 
  to_plot %>%
  filter(var == "salaries",
         Origin != "Historical data") %>%
  mutate(ind2 = "2 - Remove seasonal component") %>%
  bind_rows(data.frame(filter(to_plot,var=="salaries",Origin !="Final series"), ind2="1 - Chain series"))

plot_subs <- 
  to_plot %>%
  filter(var == "subs",
         Origin != "Historical data") %>%
  mutate(ind2 = "2 - Remove seasonal component") %>%
  bind_rows(data.frame(filter(to_plot,var=="subs",Origin !="Final series"), ind2="1 - Chain series"))

p1 <- ggplot(plot_salaries,aes(period,value,colour=Origin))+
  geom_line()+
  facet_wrap(~ind2,scales = "free_y",ncol = 1)+
  scale_x_date(expand = c(0.01,0.01)) +
  theme + xlab(NULL) + ylab(NULL) + 
  ggtitle("Compensation of employees") +
  theme(legend.title=element_blank()) +
  theme(strip.text = element_text(size=12)) +
  theme(plot.title = element_text(size=16))

p2 <- ggplot(plot_subs,aes(period,value,colour=Origin))+
  geom_line()+
  facet_wrap(~ind2,scales = "free_y",ncol = 1)+
  scale_x_date(expand = c(0.01,0.01)) +
  theme + xlab(NULL) + ylab(NULL) + 
  ggtitle("Subsidies") +
  theme(legend.title=element_blank()) +
  theme(strip.text = element_text(size=12)) +
  theme(plot.title = element_text(size=16))

grid.arrange(arrangeGrob(p1,p2,ncol=2))

plot of chunk unnamed-chunk-20

Taxes

Both chaining seem to be consistent with the historical series. Here our manipulation does not entail much loss.

plot_indir <- 
  to_plot %>%
  filter(var == "indirtax",
         Origin != "Historical data") %>%
  mutate(ind2 = "2 - Remove seasonal component") %>%
  bind_rows(data.frame(filter(to_plot,var=="indirtax",Origin !="Final series"), ind2="1 - Chain series"))

plot_dir <- 
  to_plot %>%
  filter(var == "dirtax",
         Origin != "Historical data") %>%
  mutate(ind2 = "2 - Remove seasonal component") %>%
  bind_rows(data.frame(filter(to_plot,var=="dirtax",Origin !="Final series"), ind2="1 - Chain series"))

p1 <- ggplot(plot_indir,aes(period,value,colour=Origin))+
  geom_line()+
  facet_wrap(~ind2,scales = "free_y",ncol = 1)+
  scale_x_date(expand = c(0.01,0.01)) +
  theme + xlab(NULL) + ylab(NULL) + 
  theme(legend.title=element_blank()) +
  theme(strip.text = element_text(size=12)) +
  theme(plot.title = element_text(size=16)) +
  ggtitle("Indirect taxation")

p2 <- ggplot(plot_dir,aes(period,value,colour=Origin))+
  geom_line()+
  facet_wrap(~ind2,scales = "free_y",ncol = 1)+
  scale_x_date(expand = c(0.01,0.01)) +
  theme + xlab(NULL) + ylab(NULL) + 
  theme(legend.title=element_blank()) +
  theme(strip.text = element_text(size=12)) +
  theme(plot.title = element_text(size=16)) +
  ggtitle("Direct taxation")

grid.arrange(arrangeGrob(p1,p2,ncol=2))

plot of chunk unnamed-chunk-21

Debt and interest payments

The chained series of general government debt deviates slightly from the historical one, but the deviation is very thin and both chaining seem consistent. Here the seasonality of both series is weaker and the final series resemble strongly the chained ones.

plot_debt <- 
  to_plot %>%
  filter(var == "debt",
         Origin != "Historical data") %>%
  mutate(ind2 = "2 - Remove seasonal component") %>%
  bind_rows(data.frame(filter(to_plot,var=="debt",Origin !="Final series"), ind2="1 - Chain series"))

plot_intpay <- 
  to_plot %>%
  filter(var == "intpay",
         Origin != "Historical data") %>%
  mutate(ind2 = "2 - Remove seasonal component") %>%
  bind_rows(data.frame(filter(to_plot,var=="intpay",Origin !="Final series"), ind2="1 - Chain series"))

p1 <- ggplot(plot_intpay,aes(period,value,colour=Origin))+
  geom_line()+
  facet_wrap(~ind2,scales = "free_y",ncol = 1)+
  scale_x_date(expand = c(0.01,0.01)) +
  theme + xlab(NULL) + ylab(NULL) + 
  theme(legend.title=element_blank()) +
  theme(strip.text = element_text(size=12)) +
  theme(plot.title = element_text(size=16)) +
  ggtitle("Interest payments")

p2 <- ggplot(plot_debt,aes(period,value,colour=Origin))+
  geom_line()+
  facet_wrap(~ind2,scales = "free_y",ncol = 1)+
  scale_x_date(expand = c(0.01,0.01)) +
  theme + xlab(NULL) + ylab(NULL) + 
  theme(legend.title=element_blank()) +
  theme(strip.text = element_text(size=12)) +
  theme(plot.title = element_text(size=16)) +
  ggtitle("General government debt")

grid.arrange(arrangeGrob(p1,p2,ncol=2))

plot of chunk unnamed-chunk-22

Total social transfers and unemployment benefits

plot_unemp <- 
  to_plot %>%
  filter(var == "unemp",
         Origin != "Historical data") %>%
  mutate(ind2 = "2 - Remove seasonal component") %>%
  bind_rows(data.frame(filter(to_plot,var=="unemp",Origin !="Final series"), ind2="1 - Chain series"))

plot_transf <- 
  to_plot %>%
  filter(var == "tfs",
         Origin != "Historical data") %>%
  mutate(ind2 = "2 - Remove seasonal component") %>%
  bind_rows(data.frame(filter(to_plot,var=="tfs",Origin !="Final series"), ind2="1 - Chain series"))

p1 <- ggplot(plot_transf,aes(period,value,colour=Origin))+
  geom_line()+
  facet_wrap(~ind2,scales = "free_y",ncol = 1)+
  scale_x_date(expand = c(0.01,0.01)) +
  theme + xlab(NULL) + ylab(NULL) + 
  theme(legend.title=element_blank()) +
  theme(strip.text = element_text(size=12)) +
  theme(plot.title = element_text(size=16)) +
  ggtitle("Social transfers")

p2 <- ggplot(plot_unemp,aes(period,value,colour=Origin))+
  geom_line()+
  facet_wrap(~ind2,scales = "free_y",ncol = 1)+
  scale_x_date(expand = c(0.01,0.01)) +
  theme + xlab(NULL) + ylab(NULL) + 
  theme(legend.title=element_blank()) +
  theme(strip.text = element_text(size=12)) +
  theme(plot.title = element_text(size=16)) +
  ggtitle("Unemployment benefits")

grid.arrange(arrangeGrob(p1,p2,ncol=2))

plot of chunk unnamed-chunk-23

Building the final database

Comparing the obtained series with PPP

We want to check that the final database we created resembles the seasonally adjusted one of Paredes et al. (2014).

url <- "PPP_seasonal.xls"

pppSA <- read_excel(url, skip = 1)

pppSA %<>%
  transmute(period     =as.Date(as.yearqtr(`MILL. EURO, RAW DATA, SEASONALLY ADJUSTED, SMOOTHED ESTIMATES`, format="%YQ%q")),
            totexp   =TOE,               # Total expenditures
            pubcons  =GCN,               # General government consumption expenditure
            pubinves =GIN,               # General government investment
            tfs      =THN,               # Social payments
            salaries =COE,               # Compensation of employees
            subs     =SIN,               # Subsidies
            unemp    =`of whichUNB`,     # Unemployment benefits (among social payments)
            intpay   =INP,               # General government interest payments
            totrev   =TOR,               # Total revenue
            indirtax =TIN,               # Total indirect taxes
            dirtax   =DTX,               # Total direct taxes
            scr      =as.numeric(SCR),   # Social contribution by employers
            sce      =as.numeric(SCE),   # Social contribution by employees and self-employed
            debt     =MAL) %>%           # Euro area general government debt
  filter(!is.na(period))

plot_compare <- 
  gather(pppSA, var, value, -period, convert= TRUE) %>%
  na.omit() %>%
  mutate(Origin="ppp") %>%
  bind_rows(deseasoned) %>%
  mutate(var=as.factor(var))

xlab_plot <- c('Euro Area government debt',
               'Direct taxes',
               'Indirect taxes',
               'Interest payments',
               'Public consumption',
               'Public investment',
               'Compensation of employees',
               "Households' social contribution",
               "Employers' social contribution",
               'Subsidies',
               'Social transfers',
               'Total expenditures',
               'Total revenues',
               'Unemployment benefits')

plot_compare$var <- plyr::mapvalues(plot_compare$var,levels(plot_compare$var),xlab_plot)

ggplot(plot_compare,aes(period,value,colour=Origin))+
  geom_line()+
  facet_wrap(~var,ncol=3,scales = "free_y")+
  scale_x_date(expand = c(0.01,0.01)) +
  theme + xlab(NULL) + ylab(NULL)+
  theme(legend.title=element_blank()) +
  theme(strip.text=element_text(size=10),
        axis.text=element_text(size=9))+
  ggtitle("Fiscal data for the Euro Area")

plot of chunk unnamed-chunk-24

Final fiscal database for the Euro area

We eventually want to build a database close to Paredes et al. (2014). You can download all the raw series here.

EA_Fipu_rawdata <-
  deseasoned %>%
  select(-Origin) %>% 
  spread(var,value)

EA_Fipu_rawdata %>% 
  write.csv(file = "EA_Fipu_rawdata.csv",row.names = FALSE)

Then data are normalized by capita and price if needed, using data built to reproduce the Smets and Wouters (2003).

sw03 <- 
  read.csv("http://shiny.nomics.world/data/EA_SW_rawdata.csv") %>%
  mutate(period=ymd(period)) %>% 
  filter(period >="1980-01-01")

EA_Fipu_data <- 
  EA_Fipu_rawdata %>% 
  inner_join(sw03,by="period") %>% 
  transmute(period=period,
            pubcons_rpc  = 100*1e+6*pubcons/(defgdp*pop*1000),
            pubinves_rpc = 100*1e+6*pubinves/(defgdp*pop*1000),
            salaries_rpc = 100*1e+6*salaries/(defgdp*pop*1000),
            subs_rpc     = 100*1e+6*subs/(defgdp*pop*1000),
            dirtax_rpc   = 100*1e+6*dirtax/(defgdp*pop*1000),
            indirtax_rpc = 100*1e+6*indirtax/(defgdp*pop*1000),
            tfs_rpc      = 100*1e+6*tfs/(defgdp*pop*1000),
            sce_rpc      = 100*1e+6*sce/(defgdp*pop*1000),
            scr_rpc      = 100*1e+6*scr/(defgdp*pop*1000),
            debt_rpc     = 100*1e+6*debt/(defgdp*pop*1000))

EA_Fipu_data %>%
  write.csv(file = "EA_Fipu_data.csv",row.names = FALSE)

You can download ready-to-use (normalized) data for the estimation here.

Appendix

Chaining function

To chain two datasets, we build a chain function whose input must be two dataframes with three standard columns (period, var, value). It returns a dataframe composed of chained values, ie the dataframe “to rebase” will be chained on the “basis” dataframe.

More specifically, the function :

  • computes the growth rates from value in the dataframe of the 1st argument
  • multiplies it with the value of reference chosen in value in the dataframe of the 2nd argument
  • at the date specified in the 3rd argument.
chain <- function(to_rebase, basis, date_chain="2000-01-01", is_basis_the_recent_data=TRUE) {

  date_chain <- as.Date(date_chain, "%Y-%m-%d")

  valref <- basis %>%
    filter(period == date_chain) %>%
    transmute(var, value_ref = value) 
  # If chain is to update old values to match recent values
  if (is_basis_the_recent_data) { 
    res <- to_rebase %>%
      filter(period <= date_chain) %>%
      arrange(desc(period)) %>%
      group_by(var) %>% 
      mutate(growth_rate = c(1, value[-1]/lag(x = value)[-1])) %>%
      full_join(valref, by=c("var")) %>%
      group_by(var) %>%
      transmute(period, value=cumprod(growth_rate)*value_ref) %>%
      ungroup() %>%
      bind_rows(filter(basis, period>date_chain))
  } else {
    # If chain is to update recent values to match old values
    res <- to_rebase %>%
      filter(period >= date_chain) %>%
      arrange(period) %>%
      group_by(var) %>%
      mutate(growth_rate = c(1, value[-1]/lag(x = value, n = 1)[-1])) %>%
      full_join(valref, by=c("var")) %>%
      group_by(var) %>%
      transmute(period, value=cumprod(growth_rate)*value_ref) %>%
      ungroup() %>%
      bind_rows(filter(basis, period<date_chain))
  }

  return(res)
}

Seasonal adjustment

For the seasonal adjustment we just used a function to mutate the series into a time series, apply the function from the Sax (2016) package, mutate back into a dataframe.

deseason <- function(source_df=data_large, var_arrange="pubcons", ...){

local_data <- source_df

detrend <- local_data[var_arrange] %>%
  ts(start=c(1980,1), frequency = 4) %>%
  seas(na.action=na.exclude)

res <- as.numeric(final(detrend))

res <-  data_frame(value=res,
             period=local_data$period,
             var=var_arrange) %>%
    arrange(period) %>%
    transmute(period, var, value)
}

Bibliography

Joan Paredes, Diego J. Pedregal, and Javier J. Pérez. Fiscal policy analysis in the euro area: expanding the toolkit. Journal of Policy Modeling, 36(5):800 – 823, 2014. URL: http://www.sciencedirect.com/science/article/pii/S0161893814000702, doi:https://doi.org/10.1016/j.jpolmod.2014.07.003. 1 2 3 4 5

Christoph Sax. seasonal: R Interface to X-13-ARIMA-SEATS. 2016. R package version 1.2.1. URL: https://CRAN.R-project.org/package=seasonal. 1 2

F. Smets and R. Wouters. An estimated dynamic stochastic general equilibrium model of the euro area. Journal of the European Economic Association, 2003.

R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2016. URL: https://www.R-project.org.

RStudio Team. RStudio: Integrated Development Environment for R. RStudio, Inc., Boston, MA, 2016. URL: http://www.rstudio.com/.

To leave a comment for the author, please follow the link and comment on their blog: Macroeconomic Observatory - R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)