After some more digging, and a suggestion by @theMexIndian, I decided to look in more depth at the unvotes database that I wrote about some weeks ago.

This time, Amit suggested I do some hierarchical clustering of the votes. So here goes a very dirty first attempt…

## Data and setup

Nothing too impressive here… (for a discussion of the package, see the original post).

library(unvotes)
library(dplyr)
library(magrittr)
library(reshape2)

# join votes with roll-call details and issues
votes <- un_votes %>%
  left_join(., un_roll_calls) %>%
  left_join(., un_roll_call_issues)

length(unique(votes$rcid))
# [1] 5275 # number of unique roll call votes

There are more than 5k unique roll calls, so if we open up one dimension per roll call, the data set is going to be huge, but I’ll go ahead and do it anyway, to test a hypothesis towards the end…

## ‘Widen’ data…

wide <- votes %>%
  select(rcid, country, vote) %>%
  dcast(, formula = rcid+country ~ vote) %>%
  dcast(, formula = country~rcid+yes+no+abstain)

str(wide)
# 'data.frame': 200 obs. of 14352 variables:

Now we have a very high-dimensional data set (each variable is the vote in a roll call; for example, abstain_120, yes_120 and no_120 would be counts of abstain, yes and no votes in roll call 120). This data set is basically ones and zeros.

Now to do some cleaning and get the distance matrix…

wide[is.na(wide)] <- 0
d_wide <- as.matrix(wide)
row.names(d_wide) <- wide$country # to name rows
d_wide <- dist(d_wide) # distance matrix
hc_wide <- hclust(d_wide) # hierarchical cluster
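
To make the widen, distance, cluster steps easier to follow, here is a minimal sketch on a made-up toy data set. The countries A to D and their votes are invented for illustration and are not taken from unvotes. It uses base R's `table()` as a stand-in for `dcast()` so it runs without extra packages, and it builds a purely numeric matrix before calling `dist()`.

```r
# Toy long-format votes: 4 hypothetical countries, 3 hypothetical roll calls
toy <- data.frame(
  rcid    = rep(1:3, each = 4),
  country = rep(c("A", "B", "C", "D"), times = 3),
  vote    = c("yes", "yes", "no", "no",
              "yes", "yes", "no", "abstain",
              "no",  "no",  "yes", "yes"),
  stringsAsFactors = FALSE
)

# Widen: one 0/1 column per (vote, roll call) pair, one row per country
tab <- table(toy$country, paste(toy$vote, toy$rcid, sep = "_"))
toy_mat <- matrix(as.numeric(tab), nrow = nrow(tab), dimnames = dimnames(tab))

d_toy  <- dist(toy_mat)   # Euclidean distance between countries (the default)
hc_toy <- hclust(d_toy)   # complete linkage (the default)

# Cut the tree into two groups
groups <- cutree(hc_toy, k = 2)
groups
# A B C D
# 1 1 2 2
```

A and B vote identically, so their distance is zero and they merge first; C and D differ on only one roll call, so they end up together in the second group.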

Let’s graph this hierarchical clustering using the ggdendro package…

library(ggdendro)
library(eem) # blog colors
ggdendrogram(hc_wide,
rotate = TRUE) +
theme_eem() +
theme(axis.text.y = element_text(size=6)) +
labs(x = "country",
y = "",
title = "Hierarchical clusters of votes \n in U.N.")

hc_c <- cutree(hc_wide, k = 8)
hc_c <- as.data.frame(hc_c, row.names = names(hc_c))
hc_c$c <- row.names(hc_c)
cc <- hc_c %>% arrange(-hc_c)
write.csv(as.data.frame(cc), file = "country_clusters.csv")

## By issues

Now, because the previous data set was very high-dimensional, I’m going to condense the analysis to just votes on particular issues. The database has seven core issues, so I’m going to try to group by issue instead of roll call. This might let us see if there are different voting blocs from the earlier set (maybe countries vote the same, except when important issues come up).

# Widen, by issue...
wide_byissue <- votes %>%
  select(issue, country, vote) %>%
  dcast(, formula = country ~ vote+issue)

wide_byissue[is.na(wide_byissue)] <- 0
d_wide_issue <- as.matrix(wide_byissue)
row.names(d_wide_issue) <- wide_byissue$country
d_wide_issue <- dist(d_wide_issue)
hc_wide_issue <- hclust(d_wide_issue)
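
The by-issue widening counts how many times each country cast each kind of vote on each issue, instead of keeping one indicator per roll call. A small sketch of that idea, with invented votes for two hypothetical countries and base R's `table()` standing in for `dcast()`:

```r
# Toy long-format votes with an issue column (all values are made up)
toy_issue <- data.frame(
  country = c("A", "A", "A", "B", "B", "B"),
  issue   = c("Human rights", "Human rights", "Colonialism",
              "Human rights", "Human rights", "Colonialism"),
  vote    = c("yes", "yes", "no", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

# One column per (vote, issue) pair; each cell counts matching votes
counts <- table(toy_issue$country,
                paste(toy_issue$vote, toy_issue$issue, sep = "_"))
counts
```

Here country A has a count of 2 under `yes_Human rights`, while B splits its two human rights votes between yes and no. With only a handful of issues, the number of columns stays small no matter how many roll calls there are.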

ggdendrogram(hc_wide_issue,
rotate = TRUE) +
theme_eem() +
theme(axis.text.y = element_text(size=6)) +
labs(x = "country",
y = "",
title = "Hierarchical clusters of votes \n in U.N. (issues)")

I’ll export this too…

hc_c2 <- cutree(hc_wide_issue, k = 8)
hc_c2 <- as.data.frame(hc_c2, row.names = names(hc_c2))
hc_c2$c <- row.names(hc_c2)
cc2 <- hc_c2 %>% arrange(-hc_c2)
write.csv(as.data.frame(cc2), file = "country_clusters_issue.csv")

To check the earlier hypothesis, I’m going to find Mexico’s neighborhood, and see if there are many countries that repeat themselves in both sets…

# find cluster where Mexico lives ...
neighborhood_mx <- hc_c %>% filter(hc_c == 3)
neighborhood_mx_issue <- hc_c2 %>% filter(hc_c2 == 1)

sum(neighborhood_mx_issue$c %in% neighborhood_mx$c)/length(neighborhood_mx_issue$c)
# [1] 0.8

# export mexico's neighborhood
write.csv(neighborhood_mx_issue, file = "neighborhood_mx_issue.csv")

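So 80% of the countries in Mexico’s by-issue cluster are also in its by-roll-call cluster; in other words, most countries stay “close” to Mexico whether the vote is grouped by issue or by roll call. This is a rough first attempt (there are probably many slight errors) but there are some interesting things to be found.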
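
The overlap figure is just the share of one neighborhood's members that also show up in the other. Here is a rough sketch with invented cluster assignments (not the real results); `cluster_overlap()` is a hypothetical helper, not part of the original analysis. Looking up the focal country's cluster by name avoids hard-coding the cluster number.

```r
# Hypothetical cluster assignments from two different clusterings
by_rollcall <- c(Mexico = 3, Chile = 3, Peru = 3, Panama = 3, Norway = 5, Japan = 5)
by_issue    <- c(Mexico = 1, Chile = 1, Peru = 1, Panama = 2, Norway = 1, Japan = 4)

# Share of `country`'s neighbors in clustering `b`
# that are also its neighbors in clustering `a`
cluster_overlap <- function(a, b, country) {
  hood_a <- names(a)[a == a[[country]]]
  hood_b <- names(b)[b == b[[country]]]
  mean(hood_b %in% hood_a)
}

overlap_mx <- cluster_overlap(by_rollcall, by_issue, "Mexico")
overlap_mx
# [1] 0.75
```

In this toy case three of the four countries in Mexico's by-issue neighborhood (Mexico, Chile, Peru) are also in its by-roll-call neighborhood, hence 0.75.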

In the issue groups, the United States and Israel are outliers that form a group of their own (the Palestinian conflict is probably the culprit here; as I found earlier, they agree on 77% of the votes).

Then there are countries that seem to be very close culturally, and they show it in the votes…

# advanced foreign policy
hc_c2 %>% filter(hc_c2 == "6")
# [1] "Austria"     "Denmark"     "Finland"     "Greece"      "Iceland"
# [6] "Ireland"     "Japan"       "New Zealand" "Norway"      "Spain"
# [11] "Sweden"

Finally, some like-minded countries, like Chile, Colombia, Panama, Paraguay, Peru, etc., are in Mexico’s neighborhood (although it’s one of the largest groups).

Tweet me up if you have any questions about the data!