Preferential attachment for network

September 15, 2012

(This article was first published on StaTEAstics., and kindly contributed to R-bloggers)

I am currently taking the networked life course on offered by Professor Michael Kearns from the University of Pennsylvania.  I have been took several courses including machine learning, natural language processing since the platform was launched late last year. I have to give my biggest praise to Andrew Ng and his team for creating such platform and enable knowledge sharing to those who have no access to it.

Nevertheless, I want to talk about and demonstrate a type of simulated network model called preferential attachment introduced in the course.

The idea of this particular network is that individuals are more likely to be connected to others who already has a large connection (degree), this is typical for social network and many real life situation. The more connection one has, the more exposed the individual is to the rest of the world.

This model is able to mimic the small diameter, high cluster and heavy-tailed degree distribution of typical large networks.

For brevity, we will just focus on the heavy-tail property and power-law decay of the degree distribution of the network. Where a large number of individuals has small number of connection while a small number of individuals have a much much greater number of connections.

 The following simulation simulates a preferential attachment network and plot the histogram and the log-log relationship.  The simulation is based on the same example that N number of individuals chooses from 1000 night clubs.

  N = 1000
  for(i in 1:100){
    m = 100 * i
    rlt = rep(1, N)
    for(i in 1:m){
      selection = sample(x = 1:N, size = 1, prob = rlt)
      rlt[selection] = rlt[selection] + 1
    rlt = rlt – 1
        par(mfrow = c(1, 2))
    bin = hist(rlt, breaks =  m/10,
      main = paste(“Histogram of preferential network\n with “, m,
        ” individuals choosing\nbetween 1000 clubs”, sep = “”),
      xlab = “Number of Individuals”, ylab = “Frequency”)
    plot(log(bin$breaks[-1]), log(bin$counts),
         main = “log-log histogram”, xlab = “”, ylab = “”)
    lm.df = data.frame(lbin = log(bin$breaks[-1]), lcounts = log(bin$counts))
    lm.df = lm.df[lm.df$lcounts != -Inf & lm.df$lcounts != 0 , ]
    fit = lm(lcounts ~lbin, data = lm.df)
    abline(coef = coef(fit))

From the simulation, we can see the histogram becomes more and more heavy-tailed as we increase the number of individuals/vertex in the network, but I remain sceptical about the linear relationship in the log-log plot which is an indicator of the power law decay.

To leave a comment for the author, please follow the link and comment on their blog: StaTEAstics.. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.


Mango solutions

RStudio homepage

Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training


CRC R books series

Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)