This post is based on the Six Degrees of Kevin Bacon, which is itself an adaptation of the Erdős number in mathematics. Readers familiar with these concepts can skip the following section and go directly to the calculation of the Zlatan number.
I have done this before on my old blog,
but I felt like redoing the analysis.
What is an Erdős or Bacon number?
Paul Erdős wrote an incredible number of scientific papers (around 1500) with around 500 collaborators.
Due to his prolific output, the Erdős Number was created. An Erdős number describes a
person’s degree of separation from Erdős himself, based on their collaboration with him,
or with collaborators of Erdős. Erdős is the only one with an Erdős number of zero, while
his immediate collaborators have an Erdős number of one, and their collaborators have an Erdős number of at most two.
The concept of the Bacon number is basically the same. Like Erdős, Kevin Bacon has
had such a prolific screen career that many actors can be linked to him through
co-appearances. Kevin Bacon is the only one with a Bacon number of 0. Actors
who appeared in a movie with Bacon have a Bacon number of 1, and so on.
The numbers can also be aggregated to form the Erdős-Bacon number. To obtain an Erdős-Bacon number, you either have to be an acting scientist or an actor who wrote a scientific paper. According to Wikipedia, the smallest existing sum of both numbers is three^{1}.
The Erdős-Bacon-Sabbath number seems to be the
craziest extension of this number game.
Natalie Portman, for example, has an Erdős-Bacon-Sabbath number of 11.
The Zlatan Number
Zlatan’s career stats are impressive. He has played for most top-tier clubs in Europe.
So who better than him to qualify as the Erdős or Bacon of football?
I scraped all available squads from top tier clubs since 2001 from footballsquads^{2}. If you are interested in the R code,
I will gladly provide it.
The data we are working with looks as follows.
## Observations: 208,774
## Variables: 8
## $ unique_player "Martin Bernachia_100377", "Ruben Cordoba_060...
## $ name "Martin Bernachia", "Ruben Cordoba", "Silvio Dua...
## $ pos "G", "D", "D", "D", "M", "D", "M", "M", "F", "M"...
## $ date_of_birth "100377", "060572", "080678", "120975", ...
## $ unique_club "almagro_20042005_arg", "almagro_20042005_arg"...
## $ club "almagro", "almagro", "almagro", "almagro", "alm...
## $ season "20042005", "20042005", "20042005", "2004200...
## $ country "arg", "arg", "arg", "arg", "arg", "arg", "arg",...
The columns unique_player and unique_club were created to separate players with the same name^{3} and to distinguish between squads of clubs for different seasons.
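The construction of these two columns is not shown in the post. Judging only from the sample values above (for example "Martin Bernachia_100377" and "almagro_20042005_arg"), a minimal sketch could look as follows. The toy rows, the data frame name squads, and the choice of "_" as separator are assumptions made for illustration, not the original scraping code.

```r
# Toy squad list; all rows are made up for illustration.
squads <- data.frame(
  name          = c("Player A", "Player A", "Player B"),
  date_of_birth = c("010180", "020285", "030390"),
  club          = c("club_x", "club_y", "club_x"),
  season        = c("20042005", "20042005", "20052006"),
  country       = c("arg", "arg", "arg"),
  stringsAsFactors = FALSE
)

# Assumed rule: name plus date of birth identifies a player.
squads$unique_player <- paste(squads$name, squads$date_of_birth, sep = "_")

# Assumed rule: club plus season plus country identifies one squad.
squads$unique_club <- paste(squads$club, squads$season, squads$country, sep = "_")

print(squads[, c("unique_player", "unique_club")])
```

In this toy data, the two players called "Player A" end up with different identifiers because their birthdays differ, and club_x gets a separate identifier for each season.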
From this data frame, we create a bipartite graph. The set of nodes contains all unique players and
all unique clubs. A player and a club are linked if the player was in the squad in the given season.
library(tidyverse)
library(igraph)

# node list: players (type TRUE) and club-seasons (type FALSE)
nodes <- tibble(name = c(df$unique_player, df$unique_club),
                type = rep(c(TRUE, FALSE), each = nrow(df))) %>% distinct()
# bipartite network
g <- graph_from_data_frame(select(df, unique_player, unique_club),
                           directed = FALSE, vertices = nodes)
# bipartite projection on players
pl <- bipartite_projection(g, which = "true", multiplicity = FALSE)
# keep only the biggest component of the graph
comps <- components(pl)
pl <- induced_subgraph(pl, which(comps$membership == which.max(comps$csize)))
The graph pl now contains only the unique players, and two players are connected if they were ever in the same squad, that is, at the same club in the same season.
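To see what the projection and the component filter do, here is a small sketch on made-up data. The player and club labels (P1 to P4, club_a_s1 and so on) and all toy_ object names are hypothetical. Only the igraph calls mirror the ones used in the post.

```r
library(igraph)

# Toy squad memberships: one row per (player, club-season) pair.
toy <- data.frame(
  unique_player = c("P1", "P2", "P2", "P3", "P4"),
  unique_club   = c("club_a_s1", "club_a_s1", "club_b_s2", "club_b_s2", "club_c_s1"),
  stringsAsFactors = FALSE
)

# Node list: players get type TRUE, club-seasons get type FALSE.
toy_nodes <- unique(data.frame(
  name = c(toy$unique_player, toy$unique_club),
  type = rep(c(TRUE, FALSE), each = nrow(toy)),
  stringsAsFactors = FALSE
))

# Bipartite graph, then projection onto the players.
toy_g  <- graph_from_data_frame(toy, directed = FALSE, vertices = toy_nodes)
toy_pl <- bipartite_projection(toy_g, which = "true", multiplicity = FALSE)

# Keep only the largest connected component.
toy_comps <- components(toy_pl)
biggest   <- which.max(toy_comps$csize)
toy_pl    <- induced_subgraph(toy_pl, which(toy_comps$membership == biggest))

print(V(toy_pl)$name)
print(ecount(toy_pl))
```

P1 and P2 shared a squad, as did P2 and P3, so the projection links those pairs. P4 never shared a squad with anyone and is dropped together with its single-vertex component.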
In order to compute the Zlatan Number, we need to compute the (geodesic) distances in the graph.
Very briefly, if A is connected to B and B to C, but A is not connected to C, then A and C are at distance two from each other.
The distance translates directly into the Zlatan number. All players directly connected with him
have a Zlatan number of one. Players at distance two (and thus with a Zlatan number of two) have played with someone who played
with Zlatan, and so on.
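# vertex id of Zlatan in the player graph
zlatan <- which(V(pl)$name == "Zlatan Ibrahimovic_031081")
# shortest-path distance from Zlatan to every player
zlatan_number <- distances(pl, zlatan)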
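The A, B, C example from the explanation above can be checked on a three-vertex toy graph. This is only an illustration of how distances() behaves. The graph and the names toy_path and d are made up.

```r
library(igraph)

# A is connected to B, B is connected to C, but A is not connected to C.
toy_path <- graph_from_literal(A - B, B - C)

# Geodesic distances from A to every vertex (a 1 x 3 matrix).
d <- distances(toy_path, v = "A")

print(d)
# A is at distance 0 from itself, 1 from B and 2 from C.
```

If A were Zlatan, B would have a Zlatan number of one and C a Zlatan number of two.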
Let’s look at the distribution.
tibble(zlatan_no=c(unname(zlatan_number))) %>%
ggplot(aes(x=zlatan_no))+geom_bar()+
hrbrthemes::theme_ipsum_rc()+
labs(x="Zlatan Number",y="Count",title="Distribution of the Zlatan number")
Needless to say, there is only one player with a Zlatan number of zero. No one is
like Zlatan!
328 players had the honor of playing on the same team as Zlatan.
The maximum observed Zlatan number is 4 for 4624 players.
The vast majority (46791) of players have a Zlatan number of 3.
The mean Zlatan number is 2.84. So we might well call
it the Three Degrees of Zlatan.
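The post does not show how these summary figures were computed. One plausible way to get counts per Zlatan number, the maximum, and the mean is sketched below on a made-up vector. toy_numbers is hypothetical and unrelated to the real data. Whether the reported mean includes Zlatan himself (the single zero) is not stated in the post.

```r
# Made-up Zlatan numbers for eleven imaginary players.
toy_numbers <- c(0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4)

# Number of players per Zlatan number.
counts <- table(toy_numbers)
print(counts)

# Largest observed value and the average over all players.
toy_max  <- max(toy_numbers)
toy_mean <- mean(toy_numbers)
print(toy_max)
print(round(toy_mean, 2))
```

Applied to the real result, the same calls would take c(unname(zlatan_number)) instead of toy_numbers.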

^{1} Mathematician Daniel Kleitman

^{2} “Source of Material is http://www.footballsquads.com. Material: © FootballSquads.com, 1999 – 2018, All Rights Reserved”

^{3} I hope there are no two players with the same name AND birthday!