**Small World**, and kindly contributed to R-bloggers)

## Random Walks

Let’s consider the graph below. And let’s assume that we drop a piece of information on the red vertex. Then we’d like to know a couple of things. For example, where does it spread first? How far does it spread? Will the vertices, close to the red vertex obtain more information then those far away? And finally, will the information continue to go back and forth forever or will we reach a stable distribution eventually?

**A**and multiply it with the vector containing the initial distribution of random walkers

**x**:

**x’ = Ax**. That’s for one hop – the second hop can be computed with

**x” = Ax’**and so on. Below, you see the distribution of random walkers after different lengths of random walks.

## Lazy Random Walks

The idea of lazy random walks is that we allow the random walkers to remain on a vertex with probability 1/2. Hence, our formula becomes **x’ = 1/2(A + I)*x**. In this formula, **I** is the identity matrix and **A** is the original matrix of transition probabilities. In the animation below, the thickness of an edge corresponds to the geometric mean of the amount of information of its adjacent vertices. The color of the edge is determined by the “information current”: The difference in the amount of information between the adjacent vertices. In other words: The thicker the edge, the more information flows along the edge. Edges with an adjacent vertex that has a high degree tend to be colored red. These are the vertices which contain a significantly larger part of information than the rest of the network.

As we can see, the distribution converges. And not only that: It also becomes less and less apparent where the information was originating from. In fact, for the vertices which are well connected, the distribution of information (or random walkers – whatever you want to call it) approximates their degree distribution. This means that high-degree vertices will contain proportionally more information than low-degree vertices.

But what if we actually wanted the vertex that contains the information initially to play an important role? One way to model that is to not stall on any vertex, but let the random walkers jump back to this specific vertex with a given probability (i.e. the teleport probability, alpha).

## Personalized PageRank

It turns out that this is exactly what “Personalized PageRank” is all about. It models the distribution of rank, given that the distance random walkers (the paper calls them random surfers) can travel from their source (the source is often referred to as “seed”) is determined by alpha. In fact the expected walk-length is 1/alpha. The formula now becomes **x’ = (1-alpha)*Ax + alpha*E**. Here, **alpha** is a constant between 0 and 1 and **E** is the vector containing the source of information – i.e. in our case it is all zero, except for the red vertex where our information starts to spread.

In this animation, alpha is fixed to 1/2 in order to being able to allow for comparison with the lazy random walks. This is pretty high, though – with the result that our information indeed remains close to the seed vertex. In many cases we will want the random walkers to travel farther. Below, is an animation for alpha = 0.1

## Conclusion

**leave a comment**for the author, please follow the link and comment on their blog:

**Small World**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...