If this post is useful to you, I kindly ask for a minimal donation on Buy Me a Coffee. It will be used to continue my Open Source efforts. The full explanation is here: A Personal Message from an Open Source Contributor.
You can send me questions for the blog using this form and subscribe to receive an email when there is a new post.
I have been working on my own research, and after three days of downloading and cleaning data from UN Comtrade Plus, I was able to create a tidy database in Postgres.
I had to go through 65 GB of data to organize this into something communicable.
Connecting to my local PostgreSQL database (otherwise reading and filtering this would be insane):
# remember to load postgres first
library(RPostgres)

con <- dbConnect(
  Postgres(),
  dbname = Sys.getenv("COMTRADE_NAME"),
  host = Sys.getenv("COMTRADE_HOST"),
  user = Sys.getenv("COMTRADE_USER"),
  password = Sys.getenv("COMTRADE_PASSWORD"),
  port = Sys.getenv("COMTRADE_PORT")
)

dbListTables(con)
[1] "classification_codes" "commodities" "commodities_short" [4] "commodity_codes" "commodity_correlations" "countries" [7] "country_codes" "customs_codes" "exports" [10] "flow_codes" "gdp_deflator" "imports" [13] "mos_codes" "mot_codes" "unit_codes" [16] "yc" "yr" "yrc" [19] "yrp" "yrpc"
As you can see in the previous chunk, it is a good practice to use environment variables to store sensitive information like database credentials, even for local databases with no external access (e.g., this database is stored on my laptop). I manage these credentials with usethis::edit_r_environ().
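For reference, the matching entries in ~/.Renviron look roughly like this. The variable names follow the dbConnect() call above; the values are placeholders except the ones that also appear in the connection string shown later in the post:

# ~/.Renviron, opened with usethis::edit_r_environ()
# placeholder values; adjust to your own setup
COMTRADE_NAME=comtrade
COMTRADE_HOST=localhost
COMTRADE_USER=pacha
COMTRADE_PASSWORD=replace-with-your-password
COMTRADE_PORT=5432

Remember to restart R after editing the file so that Sys.getenv() picks up the new values.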
Now I can use the Tidyverse for efficient filtering and analysis of my data:
library(dplyr)

tbl(con, "commodities") %>%
  glimpse()
Rows: ??
Columns: 8
Database: postgres [pacha@localhost:5432/comtrade]
$ commodity_code       <chr> "010121", "010129", "010130", "010190", "010221",…
$ commodity_code_short <chr> "0101", "0101", "0101", "0101", "0102", "0102", "…
$ commodity_name       <chr> "Horses; live, pure-bred breeding animals", "Hors…
$ chapter_code         <chr> "01", "01", "01", "01", "01", "01", "01", "01", "…
$ chapter_name         <chr> "Animals; live", "Animals; live", "Animals; live"…
$ section_code         <chr> "01", "01", "01", "01", "01", "01", "01", "01", "…
$ section_name         <chr> "Live Animals; Animal Products", "Live Animals; A…
$ section_color        <chr> "#74c0e2", "#74c0e2", "#74c0e2", "#74c0e2", "#74c…
In the Harmonized System (HS), international customs codes are used to classify traded products, and this dataset uses the 2022 revision of the HS. The HS codes are expanded regularly to reflect changes in technology (e.g., the 2002 revision has fewer codes for “cellphones” than the 2022 revision, which distinguishes between different types of cellphones). These codes are hierarchical: the first two digits represent the chapter, and in total this data has 99 chapters, with the 99th chapter being my own addition covering “unknown/unspecified” products (e.g., code “999999”).
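As a minimal sketch of this hierarchy (using the column names shown by glimpse() above), the chapter and the shorter heading-level code can be recovered from the six-digit commodity code simply by trimming it:

library(dplyr)
library(stringr)

# illustrative codes taken from the glimpse() output above
tibble(commodity_code = c("010121", "010129", "999999")) %>%
  mutate(
    chapter_code = str_sub(commodity_code, 1, 2),        # first two digits = chapter
    commodity_code_short = str_sub(commodity_code, 1, 4) # first four digits = HS heading
  )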
There are 21 section codes plus one “unknown/unspecified” addition I made. These are also hierarchical, but unlike the chapter codes they cannot be obtained by trimming the product code. For example, section one contains chapters one to five. I had to add the sections with my own R scripts after finding a webpage from the World Customs Organization (WCO) that lists the section codes and their chapter ranges; I had to retrieve it from the Internet Archive's Wayback Machine because the original page was no longer available.
Because of how the sections are presented, I had to do this:
section_names <- str_to_title(c(
  "LIVE ANIMALS; ANIMAL PRODUCTS",
  "VEGETABLE PRODUCTS",
  "ANIMAL, VEGETABLE OR MICROBIAL FATS AND OILS AND THEIR CLEAVAGE PRODUCTS; PREPARED EDIBLE FATS; ANIMAL OR VEGETABLE WAXES",
  ...
))

d <- d %>%
  mutate(
    section_name = case_when(
      section_code == "01" ~ section_names[1],
      section_code == "02" ~ section_names[2],
      section_code == "03" ~ section_names[3],
      ...
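The section codes themselves needed the same kind of manual mapping, because sections are defined as chapter ranges rather than prefixes. A minimal sketch, assuming the standard WCO chapter ranges and spelling out only the first two:

# sections cannot be trimmed from the code; they come from chapter ranges
d <- d %>%
  mutate(
    section_code = case_when(
      chapter_code %in% sprintf("%02d", 1:5)  ~ "01", # Live Animals; Animal Products
      chapter_code %in% sprintf("%02d", 6:14) ~ "02", # Vegetable Products
      # ... remaining WCO ranges up to section 21 ...
      chapter_code == "99" ~ "99"                     # Unknown Or Unspecified
    )
  )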
And something similar for the colours, which are my addition and correspond to 22 colours I found visually appealing.
With this in mind, it is visually simpler to represent 22 sections than 99 chapters. The sections are:
tbl(con, "commodities") %>% distinct(section_code, section_name) %>% arrange(section_code) %>% collect() %>% print(n = 22)
# A tibble: 22 × 2
   section_code section_name
   <chr>        <chr>
 1 01           Live Animals; Animal Products
 2 02           Vegetable Products
 3 03           Animal, Vegetable Or Microbial Fats And Oils And Their Cleavage…
 4 04           Prepared Foodstuffs; Beverages, Spirits And Vinegar; Tobacco An…
 5 05           Mineral Products
 6 06           Products Of The Chemical Or Allied Industries
 7 07           Plastics And Articles Thereof; Rubber And Articles Thereof
 8 08           Raw Hides And Skins, Leather, Furskins And Articles Thereof; Sa…
 9 09           Wood And Articles Of Wood; Wood Charcoal; Cork And Articles Of …
10 10           Pulp Of Wood Or Of Other Fibrous Cellulosic Material; Recovered…
11 11           Textiles And Textile Articles
12 12           Footwear, Headgear, Umbrellas, Sun Umbrellas, Walking-Sticks, S…
13 13           Articles Of Stone, Plaster, Cement, Asbestos, Mica Or Similar M…
14 14           Natural Or Cultured Pearls, Precious Or Semi-Precious Stones, P…
15 15           Base Metals And Articles Of Base Metal
16 16           Machinery And Mechanical Appliances; Electrical Equipment; Part…
17 17           Vehicles, Aircraft, Vessels And Associated Transport Equipment
18 18           Optical, Photographic, Cinematographic, Measuring, Checking, Pr…
19 19           Arms And Ammunition; Parts And Accessories Thereof
20 20           Miscellaneous Manufactured Articles
21 21           Works Of Art, Collectors' Pieces And Antiques
22 99           Unknown Or Unspecified
According to Statistics Canada’s Survey of Household Spending, 2023: “Shelter accounted for 32.1% of total consumption of goods and services in 2023, followed by transportation (15.8%) and food (15.7%), which remained the three largest spending categories.” Because of this, let’s focus solely on sections one to four, which cover food and food-related products.
How much of the food (and food-related) products that Canada imported in 2023 came from the United States? We can go to the YRPC (year-reporter-partner-commodity) table. The HS refers to products as commodities, but for simplicity I call them products. This table is not part of the official UN Comtrade data; it is my own creation, with extra steps to include ISO-3 alpha codes for countries and the section and chapter codes mentioned previously.
can_foods <- tbl(con, "yrpc") %>% filter( year == 2023L, partner_iso == "CAN", section_code %in% c("01", "02", "03", "04") ) %>% mutate( exporter = case_when( reporter_iso == "USA" ~ "United States", TRUE ~ "Rest of the World" ) ) %>% group_by(exporter) %>% summarise( imports = sum(trade_value_usd_exp, na.rm = TRUE) ) %>% ungroup() %>% mutate( share = imports / sum(imports) * 100 ) %>% collect() can_foods
# A tibble: 2 × 3
  exporter               imports share
  <chr>                    <dbl> <dbl>
1 Rest of the World 44360282786.  57.4
2 United States     32885888490   42.6
In the previous chunk I used the exporter-based figures, which, unlike the importer-based values, exclude the cost of freight and insurance. This gives a clearer picture of the actual trade flows between the countries involved, without logistics costs getting mixed in.
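As a rough sketch of that difference, and assuming the yrpc table also stores the importer-reported figure in a column such as trade_value_usd_imp (an assumed column name, not confirmed above), the implied freight and insurance margin could be inspected like this:

# trade_value_usd_imp is a hypothetical column name used for illustration
tbl(con, "yrpc") %>%
  filter(
    year == 2023L,
    partner_iso == "CAN",
    section_code %in% c("01", "02", "03", "04")
  ) %>%
  summarise(
    fob = sum(trade_value_usd_exp, na.rm = TRUE), # exporter-based, excludes freight/insurance
    cif = sum(trade_value_usd_imp, na.rm = TRUE)  # importer-based, includes freight/insurance
  ) %>%
  mutate(implied_margin_pct = (cif / fob - 1) * 100) %>%
  collect()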
We can refine this a bit more by grouping by section code as well:
can_foods_refined <- tbl(con, "yrpc") %>%
  filter(
    year == 2023L,
    partner_iso == "CAN",
    section_code %in% c("01", "02", "03", "04")
  ) %>%
  mutate(
    exporter = case_when(
      reporter_iso == "USA" ~ "United States",
      TRUE ~ "Rest of the World"
    )
  ) %>%
  inner_join(
    tbl(con, "commodities") %>%
      select(commodity_code, section_code, section_name, section_color)
  ) %>%
  group_by(exporter, section_name, section_color) %>%
  summarise(
    imports = sum(trade_value_usd_exp, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  group_by(section_name, section_color) %>%
  mutate(
    share = imports / sum(imports) * 100
  ) %>%
  ungroup() %>%
  arrange(section_name, exporter) %>%
  collect()

can_foods_refined
# A tibble: 8 × 5
  exporter          section_name                     section_color imports share
  <chr>             <chr>                            <chr>           <dbl> <dbl>
1 Rest of the World Animal, Vegetable Or Microbial … #549e95       1.85e 9  66.8
2 United States     Animal, Vegetable Or Microbial … #549e95       9.19e 8  33.2
3 Rest of the World Live Animals; Animal Products    #74c0e2       6.13e 9  60.7
4 United States     Live Animals; Animal Products    #74c0e2       3.97e 9  39.3
5 Rest of the World Prepared Foodstuffs; Beverages,… #8abdb6       2.57e10  60.0
6 United States     Prepared Foodstuffs; Beverages,… #8abdb6       1.71e10  40.0
7 Rest of the World Vegetable Products               #406662       1.07e10  49.7
8 United States     Vegetable Products               #406662       1.09e10  50.3
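Since the shares are computed within each section, a quick sanity check is that the two shares (United States plus Rest of the World) add up to 100 per section:

# each section's shares should sum to 100
can_foods_refined %>%
  group_by(section_name) %>%
  summarise(total_share = sum(share))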
A plot of the refined table shows how much of the food (and food-related) products imported by Canada come from the United States:
library(stringr)
library(ggplot2)

# trim section name to 40 characters
can_foods_refined <- can_foods_refined %>%
  mutate(section_name = paste0(str_sub(section_name, 1, 40), "..."))

ggplot(can_foods_refined, aes(x = exporter, y = share, fill = section_color)) +
  geom_col(width = 0.7) +
  facet_wrap(~ section_name, ncol = 1) +
  scale_fill_identity() +
  labs(
    title = "Share of Food Products Imported by Canada (2023)",
    subtitle = "Source: Own creation based on UN Comtrade data",
    x = "",
    y = "Share (%)"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    axis.text.x = element_text(angle = 30, hjust = 1),
    legend.position = "none"
  )
The reciprocal tariffs of 25% applied to some animal and vegetable products, effective as of September 12, 2025 (source: Government of Canada), directly hit Canadian households’ pockets by a rough magnitude of an extra 25% x 15.7% x 60% ~= 2.4% on food products alone. On average, imported food is now 25% x 60% ~= 15% more expensive for Canadian families. This is a significant impact on the cost of living for many families, even before adding other affected products into the mix and the additional impact on employment, as these tariffs also affect steel and other Canadian industries. Of course, this ignores general equilibrium effects such as trade diversion and changes in the behaviour of other exporters and consumers.
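The back-of-the-envelope arithmetic above can be written out explicitly; this just restates the rough calculation, not a general equilibrium estimate:

tariff_rate <- 0.25  # reciprocal tariff on some animal and vegetable products
food_share  <- 0.157 # food share of household spending (Statistics Canada, 2023)
us_share    <- 0.60  # approximate US share of Canadian food imports (tables above)

# extra cost on food alone, as a share of total household consumption
tariff_rate * food_share * us_share
#> [1] 0.02355

# average increase in the price of imported food
tariff_rate * us_share
#> [1] 0.15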
If you read up to here: I have put my international trade data up for sale. It covers 1988-2023 for all countries at the commodity level (HS 6 digits) and is already cleaned and organized in a PostgreSQL database. You can get the 65 GB of clean data from Buy Me a Coffee, with documentation and support.