Exponential random graph models with R

[This article was first published on R/Notes, and kindly contributed to R-bloggers.]

This note documents a small but growing microverse of R packages on CRAN that estimate various forms of exponential random graph models (ERGMs), a modelling strategy akin to logistic regression for dyadic data.
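To make the logistic regression analogy concrete, here is a minimal base-R sketch, which is not part of the original note. It assumes a simulated undirected network and a hypothetical node attribute called group, and the coefficients used to simulate ties are arbitrary. When an ERGM contains only terms that do not depend on other ties, it reduces to a logistic regression on dyads like this one.

```r
# Minimal sketch: a dyad-independent ERGM is a logistic regression on dyads.
set.seed(1)
n <- 30
group <- sample(c("a", "b"), n, replace = TRUE)  # hypothetical node attribute

# One row per unordered pair of nodes (undirected network, no self-loops)
dyads <- as.data.frame(t(combn(n, 2)))
names(dyads) <- c("i", "j")
dyads$same_group <- as.integer(group[dyads$i] == group[dyads$j])

# Simulate ties with homophily: log-odds of -2, plus 1.5 within groups
p <- plogis(-2 + 1.5 * dyads$same_group)
dyads$tie <- rbinom(nrow(dyads), 1, p)

# The intercept plays the role of an 'edges' term,
# same_group that of a 'nodematch' term
fit <- glm(tie ~ same_group, family = binomial, data = dyads)
coef(fit)
```

Terms that do depend on other ties, such as triangle counts, break this equivalence, which is why ERGMs need their own estimation machinery.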

The starting point: ergm

The gravitational centre of the ERGM microverse is the ergm package, by Handcock et al. The package is part of the statnet suite of software packages, and is well documented through articles primarily published in Social Networks (for the theoretical explanation of how ERGMs operate) and in the Journal of Statistical Software (for the R implementation of the models).
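As a point of reference, a minimal ergm session might look like the sketch below. It uses the Florentine marriage network that ships with the package; the choice of terms (edges and nodecov on the wealth attribute) is only an illustrative assumption, not a recommended specification.

```r
library(ergm)      # also attaches the network package

data(florentine)   # loads the flomarriage and flobusiness networks

# Ties modelled from overall density (edges) and family wealth (nodecov)
fit <- ergm(flomarriage ~ edges + nodecov("wealth"))
summary(fit)
```

Both terms are dyad-independent, so this particular model is estimated quickly; adding structural terms is what triggers the slower simulation-based estimation.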

Cosma Shalizi has compiled a nicely organised list of references on ERGMs, which includes the JSS special issue that introduced me to the topic. As Shalizi notes, another highly recommended reading is the classic “Birds of a Feather” paper published in Demography, which introduces ERGMs through an excellent empirical example that clearly explains how homophily works.

As far as ERGM-related blog posts go, the best read I have stumbled upon so far is Alex Hanna’s “Lessons on exponential random graph modeling”, which is based on an accessible and fun example. However, blogs are not the best knowledge source on ERGMs: to get precise answers to precise modelling questions, users should turn to the statnet mailing-list.

Generalizing the models

The “dependent variable” of an ERGM is a binary network. When the network is weighted, the modelling strategy needs to be amended in several ways, since both the reference distribution and the model terms have to change to take edge values into account.

There are currently two packages to generalize ERGMs: the ergm.count package, by Pavel N. Krivitsky and others, which is well documented in a Sunbelt tutorial, and the very recent GERGM package, by Matthew D. Denny and others, which is well illustrated in the README of its GitHub repository.
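To give an idea of what changes with valued ties, here is a hedged sketch based on ergm.count. It assumes the zach dataset (Zachary's karate club) distributed with that package, whose edge attribute contexts is a count; the terms sum and nonzero are chosen for illustration only. Note the two additions to the usual call: a response naming the edge attribute, and a reference distribution.

```r
library(ergm)
library(ergm.count)  # adds count reference distributions such as Poisson

data(zach)           # valued network; edge attribute "contexts" is a count

# Observed statistics: sum of edge values and number of non-zero dyads
stats <- summary(zach ~ sum + nonzero, response = "contexts")
stats

# Same terms, fitted against a Poisson reference distribution
set.seed(1)
fit <- ergm(zach ~ sum + nonzero,
            response = "contexts", reference = ~Poisson)
coef(fit)
```

Valued ERGMs of this kind are always fitted by MCMC, so even this small example takes noticeably longer than a dyad-independent binary model.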

Extending the model terms

The ergm package implements an amazingly long list of ERGM terms, but that is not the end of the story: additional terms can be added through the ergm.userterms package, such as these two terms to model local cyclic and transitive triples, or these four terms to model graphlets.

The possibility to extend ERGM terms calls for an additional remark: the term/parameter space of ERGMs is absolutely huge and is quickly expanding. If you are used to building regression models from a limited number of variables and a few sensible interactions between them, get ready for a totally different modelling experience.

In particular, some of the terms that control for structural effects in ERGMs are highly sensitive to their internal parameters, such as the decay parameter of geometrically-weighted distribution terms. As a consequence, it takes ages to calibrate an ERGM, often by testing a range of candidate parameter values in small increments. The literature contains many illustrations of that strategy (here’s one; see the footnote on page 16).
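The calibration strategy described above can be sketched as a simple grid search. The example below is only an assumption about how one might set it up: it refits the same model on the Florentine marriage network with the gwesp decay held fixed at a few values, and compares the fits by AIC. The grid is deliberately coarse, and both the model and the selection criterion are illustrative choices.

```r
library(ergm)
data(florentine)

set.seed(42)                      # MCMC estimation is stochastic
decays <- c(0.1, 0.4, 0.7, 1.0)   # deliberately coarse grid

# Refit the same model with the decay held fixed at each grid value
aics <- sapply(decays, function(d) {
  fit <- ergm(flomarriage ~ edges + gwesp(d, fixed = TRUE))
  AIC(fit)
})

data.frame(decay = decays, AIC = aics)
```

On a 16-node network each fit takes seconds; on a larger network, every extra grid point costs one full MCMC estimation, which is where the "ages" come from.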

Extending the Bayesian logic

Another related remark is that the estimation time of ERGMs has nothing in common with that of regression models with little or no hierarchical structure. If the network(s) that you want to feed to your model(s) contain(s) many nodes, think “days” instead of “seconds” when planning execution time.

The explanation is that ERGMs rely on MCMC estimation, which can take a very long time to converge, and the user cannot determine in advance exactly how long. This feature of ERGMs severely constrains their computational tractability, with runtimes of several hours or even days when estimating ERGMs on networks with, say, 200+ nodes.
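To see the MCMC machinery at work on a network small enough to finish in seconds, one can time a fit and inspect its chains, as in the sketch below. The model (edges plus triangle on the Florentine marriage network) and the control values are illustrative assumptions rather than recommended settings.

```r
library(ergm)
data(florentine)

# The triangle term makes ties interdependent, so ergm() relies on MCMC
timing <- system.time(
  fit <- ergm(
    flomarriage ~ edges + triangle,
    control = control.ergm(
      seed            = 123,    # reproducible MCMC draws
      MCMC.burnin     = 10000,  # proposals discarded before sampling
      MCMC.interval   = 1000,   # proposals between retained draws
      MCMC.samplesize = 2000    # retained draws per estimation iteration
    )
  )
)

timing["elapsed"]       # wall-clock seconds for this tiny network
mcmc.diagnostics(fit)   # trace plots and convergence diagnostics
```

Burn-in, interval and sample size are the main levers on runtime, and poorly mixing chains in the diagnostics usually mean that they (or the model itself) need to be revisited.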

For a complete Bayesian framework to use with ERGMs, users can turn to the Bergm package, which Alberto Caimo and Nial Friel have carefully documented in three separate papers. Their work adds sampling from the posterior distribution (and much more) to the ERGM logic, in order to turn it into a fully Bayesian modelling strategy.

Extending the model strategies

There are many more ways to extend ERGMs through R packages:

I intend to better document these last strategies, as well as the other ones presented in this note, as soon as I find the time to learn more about them. For now, I will close this note by citing a forthcoming review article that will undoubtedly mention ERGMs, “Navigating the Range of Statistical Tools for Inferential Network Analysis”, by Skyler Cranmer and others, which is to be published in the American Journal of Political Science.
