Six Degrees of Zlatan Ibrahimovic

September 27, 2018
By

(This article was first published on schochastics, and kindly contributed to R-bloggers)

This post is based on the Six Degrees of Kevin Bacon which itself is an adoption of the Erdős number in math. Readers familiar with the concepts can skip the following paragraph and go directly to the calculation of the Zlatan number.

I have done this before on my old blog,
but I felt like redoing the analysis.

What is an Erdős or Bacon number?

Paul Erdős wrote an incredible amount of scientific papers (around 1500) with around 500 collaborators.
Due to his prolific output, the Erdős Number was created. An Erdős number describes a
person’s degree of separation from Erdős himself, based on their collaboration with him,
or with collaborators of Erdős. Erdős is the only one with an Erdős number of zero, while
his immediate collaborators have an Erdős number of one, their collaborators have Erdős number at most two.

The concept of the Bacon number is basically the same. Similar to Erdős, Kevin Bacon has
quite a prolific screen career such that many actors can be linked by co-occurrences
to Kevin Bacon. Kevin Bacon is the only one with a Bacon number of 0. Actors
occurring in a movie with Bacon have a Bacon number of 1 and so on.

The numbers can also be aggregated to form the Erdős-Bacon number. To obtain an Erdős-Bacon number, you either have to be an acting scientist or an actor who wrote a scientific paper. According to Wikipedia the smallest existing sum of both numbers is three1.

The Erdős-Bacon-Sabbath number seems to be the
craziest extension of this number game.
Natalie Portman, for example, has an Erdős-Bacon-Sabbath number of 11.

The Zlatan Number

Zlatan’s career stats are impressive. He has played for most top tier clubs in Europe.
As such, who else than him would qualify as the Erdös or Bacon of football?

I scraped all available squads from top tier clubs since 2001 from footballsquads2. If you are interested in the R code,
I will gladly provide it.

The data we are working with looks as follows.

## Observations: 208,774
## Variables: 8
## $ unique_player  "Martin Bernachia_10-03-77", "Ruben Cordoba_06-0...
## $ name           "Martin Bernachia", "Ruben Cordoba", "Silvio Dua...
## $ pos            "G", "D", "D", "D", "M", "D", "M", "M", "F", "M"...
## $ date_of_birth  "10-03-77", "06-05-72", "08-06-78", "12-09-75", ...
## $ unique_club    "almagro_2004-2005_arg", "almagro_2004-2005_arg"...
## $ club           "almagro", "almagro", "almagro", "almagro", "alm...
## $ season         "2004-2005", "2004-2005", "2004-2005", "2004-200...
## $ country        "arg", "arg", "arg", "arg", "arg", "arg", "arg",...

The columns unique_player and unique_club were created to separate players with the same name3 and distinguish between squads of clubs for different seasons.

From this data frame, we create a bipartite graph. The set of nodes contains all unique players and
all unique clubs. A player and a club are linked, if the player was in the squad in the given season.

nodes <- tibble(name=c(df$unique_player,df$unique_club),
                type=rep(c(T,F),each=nrow(df))) %>% distinct() 

#bipartite network
g <- graph_from_data_frame(select(df,unique_player,unique_club),
                           directed=F,vertices=nodes)

#bipartite projection on players
pl <- bipartite_projection(g,which="true",multiplicity = F)

#get biggest component of graph
comps <- components(pl)
pl <- induced_subgraph(pl,comps$membership==1)

The graph pl now contains only the unique players and two players are connected, if they ever played at the same club.
In order to compute the Zlatan Number, we need to compute the (geodesic) distances in the graph.
Very briefly, if A is connected to B and B to C, but A not to C, then A and C are at distance two of each other.
The distance can be directly translated into the Zlatan number. All players directly connected with him
have a Zlatan number of one. Players at distance two (thus Zlatan number of two) have played with someone, who played
with Zlatan, and so on.

zlatan <- which(V(pl)$name=="Zlatan Ibrahimovic_03-10-81")
zlatan_number <- distances(pl,zlatan)

Let’s look at the distribution.

tibble(zlatan_no=c(unname(zlatan_number))) %>%
  ggplot(aes(x=zlatan_no))+geom_bar()+
  hrbrthemes::theme_ipsum_rc()+
  labs(x="Zlatan Number",y="Count",title="Distribution of the Zlatan number")

Needless to say, there is only one player with a Zlatan number of zero. Noone is
like Zlatan!

328 players had the honor to play with Zlatan for the same team.
The maximum observed Zlatan number is 4 for 4624 players.
The vast majority (46791) of players have a Zlatan number of 3.
The mean Zlatan number is 2.84. So we might well call
it the Three Degrees of Zlatan.


  1. Mathematician Daniel Kleitman

  2. “Source of Material is http://www.footballsquads.com. Material: © FootballSquads.com, 1999 – 2018, All Rights Reserved”

  3. I hope there are no to players with the same name AND birthday!

To leave a comment for the author, please follow the link and comment on their blog: schochastics.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)