Six Degrees of Zlatan Ibrahimovic

[This article was first published on schochastics, and kindly contributed to R-bloggers.]

This post is based on the Six Degrees of Kevin Bacon, which itself is an adaptation of the Erdős number in mathematics. Readers familiar with these concepts can skip the next section and go directly to the calculation of the Zlatan number.

I have done this before on my old blog, but I felt like redoing the analysis.

What is an Erdős or Bacon number?

Paul Erdős wrote an incredible number of scientific papers (around 1500) with around 500 collaborators. His prolific output inspired the Erdős number, which describes a person’s degree of separation from Erdős via co-authorship, i.e. the length of the shortest chain of joint papers linking that person to him. Erdős is the only one with an Erdős number of zero, his immediate collaborators have an Erdős number of one, and their collaborators have an Erdős number of at most two.
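Since the Erdős number is a shortest-chain length, breadth-first search computes it directly. A minimal base-R sketch on a made-up collaboration list (the people `Erdos`, `A` to `E` and the function `collab_number()` are hypothetical, chosen only for illustration):

```r
# Hypothetical co-authorship lists (who wrote a paper with whom)
coauthors <- list(
  Erdos = c("A", "B"),
  A     = c("Erdos", "C"),
  B     = c("Erdos", "C"),
  C     = c("A", "B", "D"),
  D     = c("C"),
  E     = character(0)   # never collaborated with anyone in the list
)

# Breadth-first search from `root`: the first time a person is reached,
# the number of steps taken is the length of the shortest co-authorship chain
collab_number <- function(adj, root) {
  dist <- setNames(rep(Inf, length(adj)), names(adj))
  dist[root] <- 0
  queue <- root
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    for (nb in adj[[current]]) {
      if (is.infinite(dist[nb])) {   # not visited yet
        dist[nb] <- dist[current] + 1
        queue <- c(queue, nb)
      }
    }
  }
  dist
}

collab_number(coauthors, "Erdos")
# Erdos 0, A 1, B 1, C 2, D 3, E Inf (no chain to Erdos)
```

The same idea, applied to shared film casts instead of shared papers, gives the Bacon number described next.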

The concept of the Bacon number is basically the same. Like Erdős, Kevin Bacon has had such a prolific screen career that many actors can be linked to him through shared film appearances. Kevin Bacon is the only one with a Bacon number of 0, actors who appeared in a movie with Bacon have a Bacon number of 1, and so on.

The two numbers can also be added up to form the Erdős-Bacon number. To have an Erdős-Bacon number, you have to be either an acting scientist or an actor who wrote a scientific paper. According to Wikipedia, the smallest existing sum of both numbers is three[1].

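The Erdős-Bacon-Sabbath number, which additionally counts the degree of separation from the band Black Sabbath through musical collaborations, seems to be the craziest extension of this number game. Natalie Portman, for example, has an Erdős-Bacon-Sabbath number of 11.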

The Zlatan Number

Zlatan’s career stats are impressive, and he has played for many of Europe’s top-tier clubs. So who else but him would qualify as the Erdős or Bacon of football?

I scraped all available squads of top-tier clubs since 2001 from FootballSquads[2]. If you are interested in the R code, I will gladly provide it.

The data we are working with looks as follows.

## Observations: 208,774
## Variables: 8
## $ unique_player <chr> "Martin Bernachia_10-03-77", "Ruben Cordoba_06-0...
## $ name          <chr> "Martin Bernachia", "Ruben Cordoba", "Silvio Dua...
## $ pos           <chr> "G", "D", "D", "D", "M", "D", "M", "M", "F", "M"...
## $ date_of_birth <chr> "10-03-77", "06-05-72", "08-06-78", "12-09-75", ...
## $ unique_club   <chr> "almagro_2004-2005_arg", "almagro_2004-2005_arg"...
## $ club          <chr> "almagro", "almagro", "almagro", "almagro", "alm...
## $ season        <chr> "2004-2005", "2004-2005", "2004-2005", "2004-200...
## $ country       <chr> "arg", "arg", "arg", "arg", "arg", "arg", "arg",...

The columns unique_player and unique_club were created to tell apart players who share the same name[3] (by appending the date of birth) and to distinguish between squads of the same club in different seasons (by appending the season and country).
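How exactly the keys were built is not shown in the post. Judging from the values in the output above, a plausible base-R sketch is to paste the columns together with underscores (the toy data frame `squads` is hypothetical; its values are copied from the first two rows shown above):

```r
# Hypothetical toy rows mirroring the scraped columns
squads <- data.frame(
  name          = c("Martin Bernachia", "Ruben Cordoba"),
  date_of_birth = c("10-03-77", "06-05-72"),
  club          = c("almagro", "almagro"),
  season        = c("2004-2005", "2004-2005"),
  country       = c("arg", "arg"),
  stringsAsFactors = FALSE
)

# Assumed construction: name + birthday identifies a player,
# club + season + country identifies one squad
squads$unique_player <- paste(squads$name, squads$date_of_birth, sep = "_")
squads$unique_club   <- paste(squads$club, squads$season, squads$country, sep = "_")

squads$unique_player  # "Martin Bernachia_10-03-77" "Ruben Cordoba_06-05-72"
squads$unique_club    # "almagro_2004-2005_arg" "almagro_2004-2005_arg"
```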

From this data frame, we create a bipartite graph whose nodes are all unique players and all unique club-seasons. A player and a club-season are linked if the player was in that club’s squad in that season.

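library(dplyr)
library(igraph)

#node list: all player keys (type TRUE) and all club-season keys (type FALSE)
nodes <- tibble(name = c(df$unique_player, df$unique_club),
                type = rep(c(TRUE, FALSE), each = nrow(df))) %>%
  distinct()

#bipartite network: a player is linked to every squad (club-season) he was in
g <- graph_from_data_frame(select(df, unique_player, unique_club),
                           directed = FALSE, vertices = nodes)

#bipartite projection on players: two players are linked if they shared a squad
pl <- bipartite_projection(g, which = "true", multiplicity = FALSE)

#get biggest component of graph
#(component ids are not sorted by size, so select the largest one explicitly)
comps <- components(pl)
pl <- induced_subgraph(pl, which(comps$membership == which.max(comps$csize)))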
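To see what the projection and the component step do, here is the same pipeline on a hypothetical toy squad list (players `P1` to `P5` and three made-up club-seasons), using base R data frames instead of tibbles. Two players end up linked exactly when they share a club-season node, and the separate two-player squad is dropped by the largest-component step:

```r
library(igraph)

# Hypothetical squads: P4 and P5 form their own isolated squad
toy <- data.frame(
  unique_player = c("P4", "P5", "P1", "P2", "P2", "P3"),
  unique_club   = c("clubC_2002", "clubC_2002", "clubA_2001", "clubA_2001",
                    "clubB_2002", "clubB_2002"),
  stringsAsFactors = FALSE
)

# Node list with the logical `type` attribute that marks players as TRUE
nodes_toy <- unique(data.frame(
  name = c(toy$unique_player, toy$unique_club),
  type = rep(c(TRUE, FALSE), each = nrow(toy)),
  stringsAsFactors = FALSE
))

g_toy  <- graph_from_data_frame(toy, directed = FALSE, vertices = nodes_toy)
pl_toy <- bipartite_projection(g_toy, which = "true", multiplicity = FALSE)

as_edgelist(pl_toy)  # three squad-mate pairs: P4-P5, P1-P2, P2-P3

# Keep the largest component, chosen by size rather than by component id
comps_toy <- components(pl_toy)
big <- induced_subgraph(pl_toy,
                        which(comps_toy$membership == which.max(comps_toy$csize)))
sort(V(big)$name)    # "P1" "P2" "P3"
```

Note that P1 and P3 are not linked even though both played with P2: they never shared a squad, so they end up at distance two.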

The graph pl now contains only the unique players, and two players are connected if they were ever in the same squad, i.e. at the same club in the same season. In order to compute the Zlatan number, we need the geodesic (shortest-path) distances in this graph. Very briefly, if A is connected to B and B to C, but A is not connected to C, then A and C are at distance two from each other. The distance translates directly into the Zlatan number: all players directly connected to him have a Zlatan number of one, players at distance two have played with someone who played with Zlatan, and so on.
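The A, B, C example can be checked with igraph’s `distances()` on a hypothetical toy graph. The isolated player `E` also shows why the analysis keeps only the largest component: a player with no path to the source gets an infinite distance.

```r
library(igraph)

# Hypothetical graph: A played with B, B played with C, A never played with C,
# and E never shared a squad with anyone in the graph
g_abc <- graph_from_literal(A - B, B - C, E)

# Geodesic (shortest-path) distances from A to every player
d <- distances(g_abc, v = "A")
d  # A: 0, B: 1, C: 2, E: Inf (no path)
```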

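#vertex id of Zlatan (name and birthday form his unique key)
zlatan <- which(V(pl)$name == "Zlatan Ibrahimovic_03-10-81")
#shortest-path distance from Zlatan to every player in the component
zlatan_number <- distances(pl, zlatan)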

Let’s look at the distribution.

library(ggplot2)

tibble(zlatan_no = as.vector(zlatan_number)) %>%
  ggplot(aes(x = zlatan_no)) + geom_bar() +
  hrbrthemes::theme_ipsum_rc() +
  labs(x = "Zlatan Number", y = "Count", title = "Distribution of the Zlatan number")

Needless to say, there is only one player with a Zlatan number of zero. No one is like Zlatan!

328 players had the honor of playing alongside Zlatan in the same squad. The maximum observed Zlatan number is 4, shared by 4624 players. The vast majority (46791) of players have a Zlatan number of 3. The mean Zlatan number is 2.84, so we might well call it the Three Degrees of Zlatan.
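Figures like these come from tabulating the distance vector. A sketch of that summary step on a hypothetical five-player graph standing in for `pl` (whether the mean of 2.84 above includes Zlatan’s own zero is not stated; this sketch includes it):

```r
library(igraph)

# Hypothetical squad-mate graph standing in for the real `pl`
pl_small <- graph_from_literal(Zlatan - A, Zlatan - B, A - C, C - D)
zl <- distances(pl_small, v = "Zlatan")

table(zl)     # players per Zlatan number: 0 -> 1, 1 -> 2, 2 -> 1, 3 -> 1
max(zl)       # largest observed Zlatan number: 3
sum(zl == 1)  # direct team-mates of Zlatan: 2
mean(zl)      # mean Zlatan number, Zlatan's own 0 included: 1.4
```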


  1. Mathematician Daniel Kleitman

  2. “Source of Material is http://www.footballsquads.com. Material: © FootballSquads.com, 1999 - 2018, All Rights Reserved”

  3. I hope there are no two players with the same name AND birthday!
