September 22, 2012

(This article was first published on StaTEAstics., and kindly contributed to R-bloggers)

This week, I got my hands on some agricultural trade data. Trade data are typically extremely dirty, so treat them with care when you get your hands on them. Lab-standard equipment is required.

So I decided to look at how countries trade by plotting the network. (The data are confidential, so I will not disclose the countries or the commodity.)

library(XML)
library(reshape)
library(igraph)

## Create the graph
net.mat = as.matrix(t1001.df)
net.g = graph(t(unique(net.mat[, 1:2])))

## Delete vertices with no edges and set edge width proportional to the size of trade
full.g = delete.vertices(net.g, which(degree(net.g) == 0))
E(full.g)$width = scale(net.mat[, 3])

## Set edge width from each flow's share of total trade:
## floor the share at 0.05 so thin edges stay visible, then scale by 20
E(full.g)$width = net.mat[, 3]/sum(net.mat[, 3])
E(full.g)$width[E(full.g)$width < 0.05] = 0.05
E(full.g)$width = E(full.g)$width * 20

## Compute the size of exporting vertex
sum.df = with(t1001.df, aggregate(TradeValue, list(rtCode), sum))

## Change size and color of exporting country
V(full.g)$size = 8
V(full.g)$size[c(6, 10, 13, 28, 43)] =
    ((sum.df[, "x"]/sum(sum.df[, "x"])) - min(sum.df[, "x"]/sum(sum.df[, "x"]))) *
    15 + 8
V(full.g)$color = "lightblue"
V(full.g)$color[c(6, 10, 13, 28, 43)] = "steelblue"

## Plot the network

set.seed(587)
plot(full.g, edge.arrow.size = 0.3, edge.curved = TRUE,
     vertex.label.color = "black")

The exporters are coloured dark blue and the importers light blue. The width of each connection is proportional to the amount of trade between the two countries.
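To make the width rule concrete, here is a minimal sketch of the same calculation on made-up numbers. The vector `trade.value` is hypothetical and only stands in for the confidential trade values; the floor of 0.05 and the factor of 20 are the ones used in the plotting code.

```r
## Hypothetical trade values for five flows (not the real data)
trade.value = c(500, 300, 150, 40, 10)

## Share of total trade carried by each flow
share = trade.value / sum(trade.value)

## Floor small shares at 0.05 so thin edges stay visible, then scale up
edge.width = pmax(share, 0.05) * 20
edge.width
```

With these numbers the widths come out as 10, 6, 3, 1 and 1: the two smallest flows both sit at the floor, so below a 5% share the edge width no longer distinguishes the size of trade.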

Looking at the plot, one can easily identify that countries 43 and 13 are major exporters, while countries 7 and 37 are major importers. This information can be easily extracted with some simple analysis; however, there are some subtle points that are a little hard to identify without a network diagram.
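The kind of simple analysis meant here could look like the sketch below, which ranks exporters and importers by total trade value. The data frame `toy.df` is invented for illustration, and the partner column name `ptCode` is an assumption (only `rtCode` and `TradeValue` appear in the code above).

```r
## Hypothetical trade flows (not the real data): rtCode exports to ptCode
## (the column name ptCode is an assumption)
toy.df = data.frame(rtCode = c(43, 43, 13, 13, 28),
                    ptCode = c(7, 37, 7, 37, 7),
                    TradeValue = c(400, 250, 200, 100, 50))

## Total exports by exporter and total imports by importer
export.df = with(toy.df, aggregate(TradeValue, list(country = rtCode), sum))
import.df = with(toy.df, aggregate(TradeValue, list(country = ptCode), sum))

## Sort from largest to smallest trader
export.df[order(-export.df$x), ]
import.df[order(-import.df$x), ]
```

A ranking like this gives the totals but says nothing about who trades with whom, which is where the network plot adds something.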

(1) There are clear cluster relationships: certain countries import only from one of 43, 13, or 28, while some import from more than one. There could be cost, logistic, trade, or geographical reasons for this kind of pattern.

(2) Country 10 is isolated, meaning that there is no trade between it and the rest of the world!

The network reveals some subtle information very quickly and is a very good exploratory tool for trade data.
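Both points can also be checked numerically once the plot has suggested them. The following is only a sketch on invented data: `toy.df` and the column name `ptCode` are hypothetical, and it assumes an isolated country shows up as a flow to itself, which may not be how country 10 appears in the real data.

```r
## Hypothetical trade flows (not the real data): rtCode exports to ptCode
## (the column name ptCode is an assumption)
toy.df = data.frame(rtCode = c(43, 43, 13, 13, 28, 10),
                    ptCode = c(7, 37, 7, 2, 5, 10),
                    TradeValue = c(400, 250, 200, 100, 50, 30))

## Number of distinct exporters each importer buys from
n.supplier = with(toy.df, tapply(rtCode, ptCode, function(x) length(unique(x))))

## Importers tied to a single exporter versus those with several
single.source = names(n.supplier)[n.supplier == 1]
multi.source = names(n.supplier)[n.supplier > 1]

## Countries with no flow to or from any other country
## (assumes isolation appears as trade with itself only)
cross.df = subset(toy.df, rtCode != ptCode)
isolated = setdiff(unique(c(toy.df$rtCode, toy.df$ptCode)),
                   unique(c(cross.df$rtCode, cross.df$ptCode)))
```

In this toy example only country 7 buys from more than one exporter, and country 10 is the only one left over after dropping self-trade.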
