Some Papua New Guinea data doodles by @ellis2013nz
I wrote a review last week of Struggle, reform, boom and bust: an economic history of Papua New Guinea since independence by Stephen Howes and others. The review was published in the Devpolicy Blog at The Australian National University’s Development Policy Centre in the Crawford School of Public Policy. My post today is teasing out a few data-related issues I thought about and explored while reading the book and writing that review.
What is ‘real’?
First, consider this chart of real gross domestic product (GDP) per person, which differs a bit from the one I included in my review:
In fact, it’s closer to a chart used by PNG’s Treasurer Ian Ling-Stuckey when he launched the book at the University of PNG on 20 August. The Treasurer chose to use the consumer price index (CPI) rather than the GDP deflator to make prices comparable over time. His rationale was to highlight changes in living standards for ordinary Papua New Guineans. He also dropped the full GDP series, to focus on non-resources GDP, for the same reason. These are both reasonable choices. Their combined impact is to make clearly visible the decline in average living standards over the independence period, from around 11,000 kina per person (in 2024 prices) 50 years ago to around 8,500 today, during the “slow bust” period identified in the book.
That’s right, the economic well-being of the average Papua New Guinean, measured in terms of what they can buy with their ‘share’ of the country’s GDP, is more than 20% lower today than it was at independence 50 years ago.
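As a quick check of that figure, here is a minimal sketch of the arithmetic. It uses only the two rounded values quoted above (around 11,000 and 8,500 kina per person), so the result is approximate; the variable names are just for illustration.

```r
# Rounded figures quoted in the text: non-resources GDP per person,
# in kina at 2024 prices (CPI-deflated). These are approximate readings.
at_independence <- 11000
today <- 8500

# Percentage change over the whole period
pct_change <- (today / at_independence - 1) * 100
round(pct_change, 1)
```

With these rounded inputs the fall is a little under 23%, consistent with "more than 20% lower".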
Here is the chart I actually included in the book review, which is essentially the same as Figure 1.2 in the book other than some small aesthetic improvements. It uses the GDP deflator, and it’s noticeable that there has been less inflation by this measure (and hence GDP per person looks to have increased over the period, rather than declined substantially).
The choice between CPI and the GDP deflator comes down to the basket of goods used to make up the price index: what the average household consumes, or what is produced in the country. In an earlier 2022 work, Howes explicitly suggested using CPI as a deflator when interested in how much consumers can buy with their notional share of GDP, and the GDP deflator when looking to compare the value of what is produced in the country. For a general discussion of economic well-being in Papua New Guinea, which is what the Treasurer wanted in August, I agree with the use of CPI. In my review I went with the GDP deflator instead, only because that meant I was effectively replicating a chart from the book; I didn’t want to go too far into my own analysis.
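To make the mechanics concrete, here is a small sketch of deflating one nominal series with two different price indices. The numbers are invented purely for illustration and are not from the PNG Economic Database; `to_real()` is a hypothetical helper, not a function from any package.

```r
# Toy inputs, invented for illustration only (NOT real PNG data)
toy <- data.frame(
  year         = c(2000, 2010, 2020),
  nominal      = c(100, 200, 400),  # nominal GDP per person
  cpi          = c(100, 180, 320),  # consumer price index
  gdp_deflator = c(100, 160, 250)   # GDP deflator
)

# Express a nominal series in the prices of `base_year`, using a price index
to_real <- function(nominal, index, year, base_year) {
  nominal / index * index[year == base_year]
}

toy$real_cpi <- to_real(toy$nominal, toy$cpi, toy$year, base_year = 2020)
toy$real_def <- to_real(toy$nominal, toy$gdp_deflator, toy$year, base_year = 2020)

# Growth in the 'real' series from first to last year under each deflator
growth_cpi <- toy$real_cpi[3] / toy$real_cpi[1] - 1
growth_def <- toy$real_def[3] / toy$real_def[1] - 1
c(cpi = growth_cpi, gdp_deflator = growth_def)
```

In this toy case the index that rises faster (here the CPI) produces lower measured real growth from the same nominal series, which is the same mechanism behind the difference between the two charts.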
The issue of GDP or non-resource GDP is an interesting one, and is extensively discussed in the book. I chose to include them both because the divergence, particularly during the latest “bust” period, is of importance in its own right. Economic activity that is indirectly caused by resources is still included in non-resource GDP (for example, government activity funded with taxes, and value added in domestic industries from which resource-industry workers buy goods and services). Howes et al would have used Gross National Income or some variant of it if they could, but it isn’t available.
Any other discussion points related to this chart are just some finer points of chart polishing: my decision to use grey background rectangles rather than vertical lines (which are more cluttering, in my view) to distinguish the four phases; where to include the label annotations of each phase; and the use of coloured direct labels on the two time series instead of a more conventional (but more effort for the reader) legend.
One of the great things about this book is that all the economic time-series data behind it has been published, and is kept up to date by ANU, as the PNG Economic Database.
Data availability is a major problem for Papua New Guinea. The first and foremost problem is whether the data exists at all; I’ve already noted that we don’t have a Gross National Income measure, and other serious gaps include a household income and expenditure survey that could be used to measure poverty and food security, and a labour force survey for understanding employment. Efforts are under way to improve all this. But even when data exists it can be hard to find, and historical data even more so. The ANU’s PNG Economic Database is a brilliant public-spirited response to this aspect of the problem, drawing together the available economic time series in one spot.
Note that the ‘about’ information is a bit out of date for this database; data is now more up to date than is claimed (for example, population goes up to 2024, but the ‘about’ only claims it goes up to 2021).
Anyway, the existence of this database, which can be accessed as a Tableau interactive tool or downloaded in bulk as a CSV, is what makes it possible for us to re-create charts like the above, using the same data as the book. Here’s the R code for drawing the first of the two GDP charts above. The code for the second chart is omitted here but can be found on GitHub.
library(tidyverse)
library(spcstyle)
library(scales)
library(ggtext)
library(RColorBrewer)
library(rsdmx)
#---------------download data, set up palette---------------
# Read in the ANU's PNG economic database. Download from 
# https://pngeconomic.devpolicy.org/
pnged <- read_csv("PNG economic database.csv")
# era_cols <- brewer.pal(6, "Set1")[1:4]
era_cols <- c("grey10", "white", "grey10", "white")
gdp_cols <- brewer.pal(7, "Set1")[c(5,7)]
#-------------------------CPI so we see prices facing consumers---------
# ratio of CPI to non-resource GDP deflator
cpi_def <- pnged |> 
  filter(Variable %in% c("Non-resource GDP deflator", "CPI deflator")) |> 
  select(Variable, Year, Amount) |> 
  spread(Variable, Amount) |> 
  # rebase both indices to a set year (1990 = 1):
  mutate(across(`CPI deflator`:`Non-resource GDP deflator`, 
                function(x){x / x[Year == 1990]})) |> 
  mutate(ratio = `CPI deflator` / `Non-resource GDP deflator`) 

# from 1990 to 2022, CPI has increased about 40% more than the GDP deflator
# so if you want to see the living standards of PNGans, there is a case to use
# the CPI instead

# draw plot:
pnged |> 
   filter(Variable %in% c("Non-resource GDP (current prices, new series)", 
                          "GDP (current prices, new series)", "Population")) |> 
   select(Variable, Year, Amount) |> 
   spread(Variable, Amount) |> 
   mutate(nr_gdp_pp = `Non-resource GDP (current prices, new series)` / Population * 1e6,
          gdp_pp = `GDP (current prices, new series)` / Population * 1e6 ) |> 
   select(Year, nr_gdp_pp, gdp_pp) |> 
   gather(variable, value, -Year) |> 
   drop_na() |>
   left_join(cpi_def, by = "Year") |> 
   mutate(value = value  / `CPI deflator` * filter(cpi_def, Year == 2024)$`CPI deflator`) |> 
   ggplot(aes(x = Year, y = value, colour = variable)) +
   annotate("rect", xmin = 1975, xmax = 1988.5, ymin = -Inf, ymax = Inf, fill = era_cols[1], alpha = 0.1) +
   annotate("rect", xmin = 1988.5, xmax = 2003.5, ymin = -Inf, ymax = Inf, fill = era_cols[2], alpha = 0.1) +
   annotate("rect", xmin = 2003.5, xmax = 2013.5, ymin = -Inf, ymax = Inf, fill = era_cols[3], alpha = 0.1) +
   annotate("rect", xmin = 2013.5, xmax = 2022.5, ymin = -Inf, ymax = Inf, fill = era_cols[4], alpha = 0.1) +
   geom_line(linewidth = 2) +
  # option, can uncomment this and you get a point showing each observation. 
  # It is helpful to see the actual point, but adds clutter.
  #   geom_point(colour = "white") +
   annotate("text", label = c("'Struggle'", "'Reform'", "'Boom'", "'Bust'"), y = 14100, 
            x = c(1981.5, 1996, 2009, 2018), hjust = 0.5, fontface = 4, alpha = 0.8) +
   annotate("text", colour = gdp_cols, x = 2020, y = c(10200, 7600), 
            label = c("All GDP", "Non-resources GDP")) +
   scale_colour_manual(values = gdp_cols) +
   scale_y_continuous(label = comma, breaks = 6:14 * 1000) +
   labs(y = "Kina (2024 prices, based on CPI deflator)",
        x = "",
        title = "Real gross domestic product per person in Papua New Guinea",
        subtitle = "Annotated with the periods used in <i>Struggle, reform, boom and bust: an economic history of Papua New Guinea since independence</i>",
        caption = "Source: ANU's PNG Economic Database, https://pngeconomic.devpolicy.org/")  +
   theme(legend.position ="none",
         plot.subtitle = element_markdown())
Population
The most fundamental national statistic is always population, and unfortunately for PNG there is more than usual uncertainty about how many people live in the country. A census in PNG—with its geographical, linguistic, cultural, political and security challenges—is one of the harder exercises in official statistics collection anywhere in the world. The key facts regarding population estimates there are:
- Birth and death registration has insufficient coverage to estimate death rates. Instead these have to be estimated by survey or census questions such as “woman X in this household, has she given birth in the past 12 months; and if so is the child still alive?”, which can then be matched with model life tables.
- The 2024 Census, delayed from 2021 because of Covid and cut down to a minimalist six questions per household (which do not include questions like the example above, but do include at least sex and age), is due to report soon.
- The 2011 Census (the report of which is here) has been criticised and its population estimates are regarded by many (for example, in the Struggle, reform, boom and bust book) as unfit for use.
- The 2000 Census is often referred to as “the last credible population estimate” or in similar terms. The population estimates in the ANU PNG Economic Database take the 2000 Census population as a reference point and assume a steady growth rate from that point on.
- An exercise by the national statistics office and WorldPop in 2023 published an estimated population for 2021 (as at early September 2025 the results were on the official PNG statistics website). A statistical learning model was trained on satellite imagery, with a malaria survey providing the “ground truth” population. The result (11.8m in 2021) was high by existing standards, but not out of the possible range.
- Various modelled estimates exist, drawing on some or all of the above (plus earlier censuses) in differing ways.
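The "reference point plus steady growth" approach described above for the ANU series can be sketched as follows. The 2000 Census total is the figure used later in this post (citizens plus non-citizens); the growth rate is a purely hypothetical placeholder, not the rate the ANU database actually uses, and `project_pop()` is an illustrative helper name.

```r
# 2000 Census total (citizens + non-citizens), as used later in this post
census_2000 <- 5171548 + 19235

# Project a population from a base year at a constant annual growth rate
project_pop <- function(base_pop, base_year, years, rate) {
  base_pop * (1 + rate) ^ (years - base_year)
}

# ASSUMPTION: 2.5% a year is illustrative only, not the ANU's actual rate
assumed_rate <- 0.025
years <- 2000:2024
proj <- project_pop(census_2000, base_year = 2000, years = years, rate = assumed_rate)

head(data.frame(year = years, population = round(proj)))
```

The point of the sketch is only that every later year inherits both the 2000 starting value and the assumed rate, so any error in either compounds over time.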
The state of play with regard to PNG’s population estimates is represented in the following chart:
At the time of writing (early September 2025), the Census and WorldPop point estimates are the official statistics on the PNG National Statistics Office website. The UN Population Projections, which are re-disseminated via the Pacific Community’s Pacific Data Hub, and the population estimates in the ANU PNG Economic Database, are shown as smooth lines.
The growth between 2011 and 2021 implied by accepting both the 2011 Census and 2021 WorldPop estimates is implausibly rapid (4.9% per year). However, there is no way of determining if the 2011 figure is an undercount, the 2021 an overestimate, or both.
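That implied growth rate can be checked from the two point estimates used in the plotting code further down (2011 Census citizens plus non-citizens, and the 2021 WorldPop figure). This is a minimal sketch of the compound annual growth rate calculation; the variable names are only for illustration.

```r
# Point estimates as used in the chart code in this post
pop_2011 <- 7254442 + 20882   # 2011 Census: citizens + non-citizens
pop_2021 <- 11781779          # 2021 WorldPop-based estimate

# Compound annual growth rate over the 10 years between the two estimates
n_years <- 2021 - 2011
cagr <- (pop_2021 / pop_2011) ^ (1 / n_years) - 1

# As a percentage, to one decimal place
round(cagr * 100, 1)
```

This comes out at a little over 4.9% a year, matching the figure quoted above.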
The ANU’s estimates, which ultimately go back to modelling efforts by Bourke and Allen in 2021, are probably a little low, a point made by the Treasurer when he launched the book. But again, they are not out of the plausible range.
Naturally, all this uncertainty feeds through to other statistics: the denominator for GDP per capita, enrolment rates, etc.; and the construction of sampling frames and survey weights for population surveys.
Improving on this depends heavily on getting reliable Census data.
Here’s the code for creating that population estimates graphic:
#--------------population, comparison of data sources--------------
pop_anu <- pnged |> 
  filter(Variable %in% c("Population")) |> 
  select(Year, population = Amount)
pop_pdh <- readSDMX("https://stats-sdmx-disseminate.pacificdata.org/rest/data/SPC,DF_POP_PROJ,3.0/A.PG.MIDYEARPOPEST._T._T?startPeriod=1975&endPeriod=2025&dimensionAtObservation=AllDimensions") |> 
  as_tibble() |> 
  select(year = TIME_PERIOD,
         `UN method` = obsValue) |> 
  mutate(year = as.numeric(year))
# sources:
# https://png-data.sprep.org/system/files/2011%20Census%20National%20Report.pdf
# https://www.nso.gov.pg/statistics/population/ (for WorldPop, accessed 6/9/2025)
specifics <- tribble(~year, ~variable, ~value,
                      2021, "WorldPop method", 11781779,
                      2011, "Census method", 7254442 + 20882, # including both citizens and non-citizens
                      2000, "Census method", 5171548 + 19235,
                      1990, "Census method", 3582333 + 25621,
                      1980, "Census method", 2978057 + 32670) |> 
  # make WorldPop appear first in the legend, better visually:
  mutate(variable = fct_relevel(variable, "WorldPop method"))
# Draw plot
pop_anu |> 
  select(year = Year, `ANU method` = population) |> 
  full_join(pop_pdh, by = "year") |> 
  gather(variable, value, -year) |> 
  # make UN appear first in legend, better visually:
  mutate(variable = fct_relevel(variable, "UN method")) |> 
  ggplot(aes(x = year, y = value, colour = variable)) +
  geom_line(data = filter(specifics, grepl("Census", variable)), colour = "grey50", linetype = 2) +
  geom_line() +
  geom_point(data = specifics, aes(colour = NULL, shape = variable), size = 3) +
  scale_shape_manual(values = c("Census method" = 19, "WorldPop method" = 15)) +
  scale_y_continuous(label = comma) +
  labs(shape = "Single-year", colour = "Multi-year",
       x = "", y = "",
      title = "Different estimates of Papua New Guinea's population",
    subtitle = "Independence to 2025",
  caption = "Source: PNG National Statistics Office (for WorldPop); 2011 National Census Report; ANU PNG economic database; Pacific Data Hub.stat")
Employment
I mentioned in my book review that formal employment is less than 5% of the total population. A more usual measure would be proportion of working age population, but I would have had to get that denominator from elsewhere and didn’t have time. Here’s the chart I drew for myself to check that this throwaway comment was justified:
What’s primarily interesting for me is the very low and declining proportion of the population in formal employment. However, it’s also interesting to note the data gaps relating to public sector employment; and the correlation of changes in total employment with the “boom” and “bust” periods that are the driver of the original book.
That chart was drawn with this code.
pnged |> 
  filter(Variable %in% c("Total (excluding public service) employment",
                         "Public service employment")) |> 
  left_join(pop_anu, by = "Year") |>
  mutate(Amount = Amount / population) |>
  mutate(Variable = fct_reorder(str_wrap(Variable, 30), Amount, .desc = TRUE)) |> 
  ggplot(aes(x = Year, y = Amount, colour = Variable)) +
  annotate("rect", xmin = 1975, xmax = 1988.5, ymin = -Inf, ymax = Inf, fill = era_cols[1], alpha = 0.1) +
  annotate("rect", xmin = 1988.5, xmax = 2003.5, ymin = -Inf, ymax = Inf, fill = era_cols[2], alpha = 0.1) +
  annotate("rect", xmin = 2003.5, xmax = 2013.5, ymin = -Inf, ymax = Inf, fill = era_cols[3], alpha = 0.1) +
  annotate("rect", xmin = 2013.5, xmax = 2022.5, ymin = -Inf, ymax = Inf, fill = era_cols[4], alpha = 0.1) +
  geom_line() +
  annotate("text", label = c("'Struggle'", "'Reform'", "'Boom'", "'Bust'"), y = 0.0585, 
            x = c(1981.5, 1996, 2009, 2018), hjust = 0.5, fontface = 4, alpha = 0.8) +
  scale_y_continuous(label = percent, limits = c(0, 0.06)) +
  labs(x = "", y = "Proportion of population",
        title = "Formal employment in Papua New Guinea",
      subtitle = "As a proportion of the population (including children and elderly)")
Vaccination
Finally, I had noticed that the ANU PNG Economic Database includes data on vaccination, which is referred to in the book but is not yet given a source in the database documentation. There are too many observations for this to be survey data, so it must be health administrative data of some sort. I’d treat this with great caution. But the point made in the book is doubtless sound: these vaccination rates are low by world standards, and not going in the right direction:
The code to produce that chart is similar in pattern to all the code to date.
pnged |> 
  filter(grepl("Immunization", Variable)) |> 
  ggplot(aes(x = Year, y = Amount, colour = Variable)) +
  annotate("rect", xmin = 1975, xmax = 1988.5, ymin = -Inf, ymax = Inf, fill = era_cols[1], alpha = 0.1) +
  annotate("rect", xmin = 1988.5, xmax = 2003.5, ymin = -Inf, ymax = Inf, fill = era_cols[2], alpha = 0.1) +
  annotate("rect", xmin = 2003.5, xmax = 2013.5, ymin = -Inf, ymax = Inf, fill = era_cols[3], alpha = 0.1) +
  annotate("rect", xmin = 2013.5, xmax = 2022.5, ymin = -Inf, ymax = Inf, fill = era_cols[4], alpha = 0.1) +
  geom_line() +
  annotate("text", label = c("'Struggle'", "'Reform'", "'Boom'", "'Bust'"), y = 85, 
            x = c(1981.5, 1996, 2009, 2018), hjust = 0.5, fontface = 4, alpha = 0.8) +
  scale_y_continuous(label = percent_format(scale = 1)) +
  labs(x = "", y = "", colour = "",
       title = "Immunization rates in Papua New Guinea",
      subtitle = "Proportion of children 12-23 months for measles and DPT; one-year old children for HepB3. Treat data with caution.")
Well, that’s it for today. I just thought I’d pop some of these things into a blog while I’ve been thinking about them. I will certainly be coming back to PNG topics at some point; and of course this whole area is a substantial part of my day job.
