June 27, 2018
By

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In the introduction of my course next week, I will (briefly) mention networks, and I wanted to provide some illustration of the Friendship Paradox. On network of thrones (discussed in Beveridge and Shan (2016)), there is a dataset with the network of characters in Game of Thrones. The word “friend” might be abusive here, but let’s continue to call connected nodes “friends”. The friendship paradox states that

People on average have fewer friends than their friends

This was discussed in Feld (1991) for instance, or Zuckerman & Jost (2001). Let’s try to see what it means here. First, let us get a copy of the dataset

 1 2 3 4  download.file("https://www.macalester.edu/~abeverid/data/stormofswords.csv","got.csv") GoT=read.csv("got.csv") library(networkD3) simpleNetwork(GoT[,1:2])

Because it is difficult for me to incorporate some d3js script in the blog, I will illustrate with a more basic graph,

Consider a vertex $v\in V$ in the undirected graph $G=(V,E)$ (with classical graph notations), and let $d(v)$ denote the number of edges touching it (i.e. $v$ has $d(v)$ friends). The average number of friends of a random person in the graph is $$\mu = \frac{1}{n_V}\sum_{v\in V} d(v)=\frac{2 n_E}{n_V}$$ The average number of friends that a typical friend has is
$$\frac{1}{n_V}\sum_{v\in V} \left(\frac{1}{d(v)}\sum_{v'\in E_v} d(v')\right)$$But
$$\sum_{v\in V} \left(\frac{1}{d(v)}\sum_{v’\in E_v} d(v’)\right)=\sum_{v,v’ \in G} \left( \frac{d(v’)}{d(v)}+\frac{d(v)}{d(v’)}\right)$$$$=\sum_{v,v' \in G}\left(\frac{d(v')^2+d(v)^2}{d(v)d(v')}\right)=\sum_{v,v' \in G} \left(\frac{(d(v')-d(v))^2}{d(v)d(v')}+2\right){\color{red}{\succ}}\sum_{v,v' \in G} \left(2\right)=\sum_{v\in V} d(v)$$
Thus,$$\frac{1}{n_V}\sum_{v\in V} \left(\frac{1}{d(v)}\sum_{v'\in E_v} d(v')\right)\succ \frac{1}{n_V}\sum_{v\in V} d(v)$$
Note that this can be related to the variance decomposition $$\text{Var}[X]=\mathbb{E}[X^2]-\mathbb{E}[X]^2$$i.e.$$\frac{\mathbb{E}[X^2]}{\mathbb{E}[X]} =\mathbb{E}[X]+\frac{\text{Var}[X]}{\mathbb{E}[X]}\succ\mathbb{E}[X]$$(Jensen inequality). But let us get back to our network. The list of nodes is

 1 2  M=(rbind(as.matrix(GoT[,1:2]),as.matrix(GoT[,2:1]))) nodes=unique(M[,1])

and we each of them, we can get the list of friends, and the number of friends

 1 2  friends = function(x) as.character(M[which(M[,1]==x),2]) nb_friends = Vectorize(function(x) length(friends(x)))

as well as the number of friends friends have, and the average number of friends

 1 2  friends_of_friends = function(y) (Vectorize(function(x) length(friends(x)))(friends(y))) nb_friends_of_friends = Vectorize(function(x) mean(friends_of_friends(x)))

We can look at the density of the number of friends, for a random node,

 1 2 3 4 5 6  Nb = nb_friends(nodes) Nb2 = nb_friends_of_friends(nodes) hist(Nb,breaks=0:40,col=rgb(1,0,0,.2),border="white",probability = TRUE) hist(Nb2,breaks=0:40,col=rgb(0,0,1,.2),border="white",probability = TRUE,add=TRUE) lines(density(Nb),col="red",lwd=2) lines(density(Nb2),col="blue",lwd=2)

and we can also compute the averages, just to check

 1 2 3 4  mean(Nb) [1] 6.579439 mean(Nb2) [1] 13.94243

So, indeed, people on average have fewer friends than their friends.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.