# Six Degrees of Zlatan Ibrahimovic

[This article was first published on **schochastics**, and kindly contributed to R-bloggers.]


This post is based on the Six Degrees of Kevin Bacon, which itself is an adaptation of the Erdős number in math. Readers familiar with the concepts can skip the following section and go directly to the calculation of the Zlatan number.

*I have done this before on my old blog, but I felt like redoing the analysis.*

# What is an Erdős or Bacon number?

Paul Erdős wrote an incredible number of scientific papers (around 1500) with around 500 collaborators. Due to his prolific output, the Erdős number was created. An Erdős number describes a person’s degree of separation from Erdős himself, based on their collaboration with him or with collaborators of Erdős. Erdős is the only one with an Erdős number of zero, his immediate collaborators have an Erdős number of one, and their collaborators have an Erdős number of at most two.

The concept of the Bacon number is basically the same. Like Erdős, Kevin Bacon has had such a prolific screen career that many actors can be linked to him by co-appearances. Kevin Bacon is the only one with a Bacon number of 0. Actors who appeared in a movie with Bacon have a Bacon number of 1, and so on.

The numbers can also be aggregated to form the Erdős-Bacon number. To obtain an Erdős-Bacon number, you either have to be an acting scientist or an actor who wrote a scientific paper. According to Wikipedia the smallest existing sum of both numbers is three^{1}.

The Erdős-Bacon-Sabbath number seems to be the craziest extension of this number game. Natalie Portman, for example, has an Erdős-Bacon-Sabbath number of 11.

# The Zlatan Number

Zlatan’s career stats are impressive. He has played for most top-tier clubs in Europe. So who better than him to be the Erdős or Bacon of football?

I scraped all available squads of top-tier clubs since 2001 from footballsquads^{2}. If you are interested in the R code, I will gladly provide it.

The data we are working with looks as follows.

```
## Observations: 208,774
## Variables: 8
## $ unique_player "Martin Bernachia_10-03-77", "Ruben Cordoba_06-0...
## $ name          "Martin Bernachia", "Ruben Cordoba", "Silvio Dua...
## $ pos           "G", "D", "D", "D", "M", "D", "M", "M", "F", "M"...
## $ date_of_birth "10-03-77", "06-05-72", "08-06-78", "12-09-75", ...
## $ unique_club   "almagro_2004-2005_arg", "almagro_2004-2005_arg"...
## $ club          "almagro", "almagro", "almagro", "almagro", "alm...
## $ season        "2004-2005", "2004-2005", "2004-2005", "2004-200...
## $ country       "arg", "arg", "arg", "arg", "arg", "arg", "arg",...
```

The columns `unique_player` and `unique_club` were created to separate players with the same name^{3} and to distinguish between squads of the same club in different seasons.
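Judging from the values shown above, the two key columns look like plain concatenations of existing columns. The following is only a sketch of how they could be built, not the original scraping code. The data frame `toy` is a hypothetical stand-in that reuses the first two rows of the output above.

```r
# Hypothetical stand-in for the scraped data (first two rows shown above).
toy <- data.frame(
  name          = c("Martin Bernachia", "Ruben Cordoba"),
  date_of_birth = c("10-03-77", "06-05-72"),
  club          = c("almagro", "almagro"),
  season        = c("2004-2005", "2004-2005"),
  country       = c("arg", "arg"),
  stringsAsFactors = FALSE
)

# Assumed construction: name + date of birth identifies a player,
# club + season + country identifies a squad.
toy$unique_player <- paste(toy$name, toy$date_of_birth, sep = "_")
toy$unique_club   <- paste(toy$club, toy$season, toy$country, sep = "_")

toy[, c("unique_player", "unique_club")]
```

Including the season in `unique_club` matters later on: it means two players only count as teammates if they were at the club in the same season.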

From this data frame, we create a bipartite graph. The set of nodes contains all unique players and all unique clubs. A player and a club are linked if the player was in the squad in the given season.

```
library(tidyverse)
library(igraph)

# one node per unique player (type TRUE) and per unique squad (type FALSE)
nodes <- tibble(name = c(df$unique_player, df$unique_club),
                type = rep(c(TRUE, FALSE), each = nrow(df))) %>%
  distinct()
# bipartite network
g <- graph_from_data_frame(select(df, unique_player, unique_club),
                           directed = FALSE, vertices = nodes)
# bipartite projection on players
pl <- bipartite_projection(g, which = "true", multiplicity = FALSE)
# keep only the biggest component of the graph
comps <- components(pl)
pl <- induced_subgraph(pl, comps$membership == which.max(comps$csize))
```

The graph `pl` now contains only the unique players, and two players are connected if they were ever in the same squad, that is, at the same club in the same season.
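To see what the projection does, here is a minimal sketch on a made-up data set. The players `A`, `B`, `C` and the squad labels are hypothetical. `A` and `B` share one squad, `B` and `C` share another, so the projected graph should link `A`-`B` and `B`-`C`, but not `A`-`C`.

```r
library(igraph)

# Hypothetical squads: A and B share a squad, B and C share another one.
toy <- data.frame(
  unique_player = c("A", "B", "B", "C"),
  unique_club   = c("club1_2001", "club1_2001", "club2_2002", "club2_2002"),
  stringsAsFactors = FALSE
)

# One row per node; players get type TRUE, squads get type FALSE.
toy_nodes <- unique(data.frame(
  name = c(toy$unique_player, toy$unique_club),
  type = rep(c(TRUE, FALSE), each = nrow(toy)),
  stringsAsFactors = FALSE
))

toy_g  <- graph_from_data_frame(toy, directed = FALSE, vertices = toy_nodes)
toy_pl <- bipartite_projection(toy_g, which = "true", multiplicity = FALSE)

# Players as vertices; an edge means "shared at least one squad".
as_edgelist(toy_pl)
```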

In order to compute the Zlatan number, we need to compute the (geodesic) distances in the graph, that is, the lengths of shortest paths. Very briefly, if A is connected to B and B to C, but A not to C, then A and C are at distance two from each other. The distance translates directly into the Zlatan number. All players directly connected to him have a Zlatan number of one. Players at distance two (and thus with a Zlatan number of two) have played with someone who played with Zlatan, and so on.
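The A, B, C example can be checked on a tiny graph. This is only an illustration with a hypothetical chain `A - B - C - D`, with `A` playing the role of Zlatan.

```r
library(igraph)

# Hypothetical co-player graph: A-B, B-C, C-D (a simple chain).
toy_chain <- graph_from_literal(A - B, B - C, C - D)

# Geodesic distances from A to every vertex, as a named vector.
d <- distances(toy_chain, v = "A")[1, ]
d
```

`A` is at distance zero from itself, `B` at distance one, `C` at distance two, and `D` at distance three.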

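```
# index of Zlatan's vertex in the player graph
zlatan <- which(V(pl)$name == "Zlatan Ibrahimovic_03-10-81")
# shortest-path distances from Zlatan to every player (a 1 x n matrix)
zlatan_number <- distances(pl, zlatan)
```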

Let’s look at the distribution.

```
tibble(zlatan_no = c(unname(zlatan_number))) %>%
  ggplot(aes(x = zlatan_no)) +
  geom_bar() +
  hrbrthemes::theme_ipsum_rc() +
  labs(x = "Zlatan Number", y = "Count", title = "Distribution of the Zlatan number")
```

Needless to say, there is only one player with a Zlatan number of zero. No one is like Zlatan!

328 players had the honor of playing on the same team as Zlatan. The maximum observed Zlatan number is 4, which 4624 players have. The vast majority of players (46791) have a Zlatan number of 3. The mean Zlatan number is 2.84, so we might well call it the *Three Degrees of Zlatan*.
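Summary figures of this kind can be read off the distance vector with `table()`, `max()` and `mean()`. The sketch below uses a small hypothetical graph with `Z` as the reference player, not the real player network. It also assumes that the reference player is included in the mean, which the figures above do not settle.

```r
library(igraph)

# Hypothetical stand-in for the player graph, with Z as the reference player.
toy_net    <- graph_from_literal(Z - A, Z - B, A - C, C - D)
toy_number <- distances(toy_net, v = "Z")[1, ]

counts  <- table(toy_number)   # number of players per distance
largest <- max(toy_number)     # maximum observed number
average <- mean(toy_number)    # mean, including Z itself (an assumption)

counts
```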

^{1} Mathematician Daniel Kleitman.

^{2} “Source of Material is http://www.footballsquads.com. Material: © FootballSquads.com, 1999 – 2018, All Rights Reserved”

^{3} I hope there are no two players with the same name AND birthday!
