# Stress based graph layouts

September 12, 2018
By

(This article was first published on schochastics, and kindly contributed to R-bloggers)

I academically grew up among graph drawers, that is, computer scientists and mathematicians
interested in deriving two-dimensional depictions of graphs. One may despicably call
it pixel science, yet a lot of hard theoretical work is put into producing pretty graph
layouts. Although I am not at all an expert in this field, I have learned a thing or two
about that subject. As such, I have always been surprised why one of the (potentially) best
algorithms is not implemented in R. This post is about my humble try to change this.

If you read this and say: Hey! there is already a package for that! please do let me know.

#used libraries
library(tidyverse) # for data wrangling
library(igraph)    # for network data structures and tools
library(ggraph)    # for prettier network visualizations
library(igraphdata)    # some network data
library(patchwork) # combine ggplot objects 

# Graph layouts in igraph

The R package igraph comes with a lot of inbuilt layout algorithms. Just type layout_
in Rstudio and you will get overwhelmed by the possibilities. As a minor side note:
If you ever struggle with anything in igraph, consult the excellent tutorial from Katherine Ognyanova.

I usually have mixed feelings about using R to draw my networks and mostly resort to dedicated
software such as visone. Mostly, because I feel that the
algorithms in igraph tend to not be nice, even with the layout_nicely() function.

Consider a typical benchmark graph for graph drawing, which can be downloaded here.

el <- read_delim("power-1138-bus.mtx",delim=" ",col_names = F)
g <- graph_from_data_frame(el,directed=F)
g <- igraph::simplify(g)

Let’s see what igraph thinks a nice layout looks like.

par(mar=c(0,0,0,0))
plot(g,layout=layout_nicely,vertex.size=0.5,vertex.label=NA)

I know, “beauty lies in the eyes of the beholder”, but I personally do not think that this is particularly
nice. Below, you see a collection of layouts, produced by different algorithms.

par(mfrow=c(2,2),mar=c(0,0,0,0))

plot(g,layout=layout_with_drl,vertex.size=0.5,vertex.label=NA)
plot(g,layout=layout_with_lgl,vertex.size=0.5,vertex.label=NA)
plot(g,layout=layout_with_fr,vertex.size=0.5,vertex.label=NA)
plot(g,layout=layout_with_mds,vertex.size=0.5,vertex.label=NA)

Notice the big differences. Personally, I would prefer the layout_with_lgl (top right).
Below is a bigger version drawn with ggraph.

ggraph(g,layout="lgl")+
geom_node_point(col="black",size=0.3)+
theme_graph()

You will notice that this layout looks different than above.
This is due to the fact, that the algorithm underlying layout_with_lgl is non-deterministic, meaning
that it produces different pictures in consecutive runs. In fact, most of the other
layout algorithm have this (annoying?) feature. More than once I have found myself
layouting the network over and over again until I was satisfied.

# Stress majorization

The first thing I learned from my graph drawing peers was to minimize stress. Not necessarily
in the sense of work (which doesn’t work anyway while being a PhD student), but for
graph layouting. Stress majorization is actually an optimization strategy used in multidimensional scaling where the goal is to minimize the so-called stress function defined as
$\sigma(X)=\sum_{i$

where $$w_{ij} \geq 0$$ is a weight between a pair of points $$(i,j)$$ , $$d_{ij}$$ is
the geodesic distance between $$i$$ and $$j$$ and $$\delta _{ij}$$ is the euclidean distance
of coordinates $$X_i$$ and $$X_j$$. By minimizing stress, we thus seek to find cartesian coordinates
for each node so that the euclidean distance is as close as possible to the geodesic distance.
If you are interested in more technical details, please see the original contribution by
Gansner et al..

# Implementation with Rcpp and the smglr package

I implemented stress majorization with Rcpp. While the code is not that involved, it still is a bit lengthy.
I created a very rudimentary R package containing the stress majorization graph layout algorithm, which
is available via github.

# devtools::install_github("schochastics/smglr")
library(smglr)

So what does our benchmark network look like using stress majorization?

l <- stress_majorization(g)

ggraph(g,layout="manual",node.positions=data.frame(x=l[,1],y=l[,2]))+
geom_node_point(col="black",size=0.3)+
theme_graph()

In my opinion, this looks definitely better than any of the layouts before.

# More examples

Here are two more examples to convince you of stress based layouts (always the right one).

# preferential attachment
pa <- sample_pa(1000,1,1,directed = F)

ggraph(pa)+
geom_node_point(col="black",size=0.3)+
theme_graph() -> p1

l <- stress_majorization(pa)
ggraph(pa,layout="manual",node.positions=data.frame(x=l[,1],y=l[,2]))+
geom_node_point(col="black",size=0.3)+
theme_graph()-> p2

p1+p2

# yeast protein interactions from igraphdata (only biggest component)
data(yeast)
comps <- components(yeast)
bcomp <- which.max(comps$csize) yeast <- induced_subgraph(yeast,comps$membership==bcomp)

ggraph(yeast)+
geom_node_point(col="black",size=0.3)+
theme_graph() -> p1

l <- stress_majorization(yeast)
ggraph(yeast,layout="manual",node.positions=data.frame(x=l[,1],y=l[,2]))+
geom_node_point(col="black",size=0.3)+
theme_graph()-> p2

p1+p2

# Caveats

Stress majorization produces nice layouts, is deterministic and easy to implement.
The downside is, that it is rather slow for large networks (I also partially blame my
implementation for that). But there is also a way out of that problem. Former colleagues of
mine published a sparse stress model
which allows stress based layouting for really large graphs. The java code can be found on
github. Also, keep an eye out for
an R package called visone3 which will, among other things, also allow for stress based layouts.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...