ERGM Tutorial
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.
This was originally for a talk to the UC Davis Network Science group on using statnet
to manage, visualize, and model networks with a focus on exponential random graph models (ERGM). I have cleaned it up a little so that it hopefully stands on its own. If anything is unclear, feel free to leave questions in the comments.
Introduction
Motivation: Why ERGM?
To predict ties

As function of individual covariates, e.g. Are girls more popular than boys?

As function of network structures, e.g. If Adam is friends with Bill, and Bill is friends with Carl, what can we say about the chances of Adam and Carl being friends?
To handle nonindependence of observations
Suppose we want to predict a set of dichotomous ties, such as which countries will be at war, and our predictors are actor attributes, such as type of government and difference in defense budgets.
What’s wrong with estimating a logistic regression from an n x k matrix like this?
regime 1  regime 2  budget diff  war? 

democracy  democracy  $5e7  0 
democracy  theocracy  $1e9  0 
democracy  dictatorship  $2e9  1 
Even if you’re not explicitly interested in how other ties affect the likelihood of a tie, network effects imply a correlation of residuals, so ignoring them will produce biased estimates. Here is a hopefully intuitive example: If the U.S. and U.K. are both at war with Iraq, that fact must be considered for an unbiased estimate of the odds of the U.S. and U.K. being at war with each other.
Cranmer and Desmarais (2011) used ERGM to challenge conventional internationalrelations wisdom on democracies going to war:
To disentangle and quantify exogenous and endogenous effects

Simultaneously model effect of network structures, actor attributes, and relational attributes

Get estimates and uncertainty for each effect
To simulate networks
ERGMs are generative: Given a set of sufficient statistics on network structures and covariates of interest, we can generate networks that are consistent with any set of parameters on those statistics.
ERGM Output

Much like a logit (see above table). Coefficients are the change in the (logodds) likelihood of a tie for a unit change in a predictor.

Predictors are networklevel statistics that represent Markovian processes, so we can think about their changes locally. E.g. in the above table, the estimate of 0.0005 on distance says that the logodds an edge decreases by 0.0005 for every unit of distance between the nodes. The network statistic that represents that effect is the sum of distances between nodes that have an edge between them.
Underlying machinery

For details from a physicist’s perspective see PierreAndre’s talk (pdf)

Define jointlikelihood of all ties and assume observed set is expectation

Choose set of (\(k\)) sufficient statistics (\(\Gamma\))
 Use MCMC to find parameter values (\(\theta\)) for statistics that maximize the likelihood of the set of observed ties (\(Y_m\))
 MCMC provides confidence intervals
 Maximize \(\theta\):
 From parameter estimates, can estimate the probability of any edge:
That not only allows us to calculate the probability of tie for a particular dyad, but but also to aggregate tieprobabilities arbitrarily by calculating probabilities for every dyad in the network and then summarizing the probabilities by whatever is of interest.
Using statnet
Network Data Management
Load statnet suite of packages, includes network, sna, ergm, and more:
Load a sample dataset. Vertices are monks in a monastery, and edges are selfreported liking between the monks. Typing the name of the network object provides some descriptive info.
What attributes do nodes in this network have?
Accessing a nodal attribute’s values:
For most attribute methods, you can mix and match levels: vertex
, edge
, and network
, and what you want to do: get
, set
, delete
, list
. E.g.
More info on functions to view and change attributes of networks, vertices, and edges: ?attribute.methods
.
A convenient shortcut: n %v% 'group'
is identical to get.vertex.attribute(n, 'group')
(and retrieves the values of the group attribute of nodes. %e%
and %n%
function the same way for edges and network, respectively. These shortcuts can be used for both assignment and retrieval.
Plotting
statnet
has pretty decent plotting facilities, at least on par with igraph
, I think. The full set of options is available at ?plot.network
. The following code demonstrates a few of my common specifications: put names on labels (by default, the “vertex.names” vertex attribute), size nodes to their indegree, color by group (another vertex attribute), and shape nodes by whether they were in cloisterville (4 sides are squares, 50 sides are basically circles). pad
protects the labels from getting clipped.
ERGM
Estimation and Interpretation
Estimate the simplest model, one with only a term for tie density (akin to an intercept term in a glm):
Because that is a dyadicindependent model (the likelihood of a tie doesn’t depend on any other), ergm solves the logistic regression instead of resorting to MCMC.
Note that the edges term represents exactly the density of the network (in logodds). That is, the probability of any tie (aka the density of the network) is the inverselogit of the coefficient on edges:
Now let’s make things more interesting and estimate a term for reciprocity of ties. That is, given an i > j tie, what is the change in logodds likelihood of a j > i tie? The coefficient estimate on mutual
tells us exactly that:
Whoa, something different happened there! MCMC happened. ergm
went off and did a bunch of simulations to find approximate MLE coefficients. Let’s interpret them. The baseline probability of a tie now is
But if the reciprocal tie is present, then the log odds of the tie is 2.32x greater, which we can translate into probability using the logistic function:
Much more likely: 64% chance, compared to the baseline of 15%.
Before we start writing up our submission to Science though, we need to check two things: 1) that the MCMC routine behaved well (that our estimates are likely good approximations of the MLEs), and 2) that our model fits the data well. statnet
has functions to do both those things.
Checking MCMC chains
We use the mcmc.diagnostics
function to get info on the MCMC chains, which by default are presented both graphically and through summary statistics. The statistics can be quite useful, but for simplicity here I’m going to silence them and focus on the trace plots of the chains.
We look for the same things as in any MCMC estimation: wellmixed, stationary chains. These look great – the chains thoroughly explore the parameter space and don’t wander over the course of the simulation. If your chains wander, you might A) have an illspecified model, and/or B) be able to improve things by increasing the length of the MCMC routine or changing other parameters, which you can control via the control
argument to ergm
. Here’s a (silly) example to show how to change the MCMC parameters and what bad chains look like:
See ?control.ergm
for the many customizable details of estimation.
Examining model fit
Now that we can trust our model estimates, let’s see if they make a good fit to the data. We use the gof
(goodnessoffit) function for this purpose. gof
simulates networks from the ERGM estimates and, for some set of network statistics, compares the distribution in the simulated networks to the observed values.
The current gof
implementation has two useful modalities, one checks goodnessoffit against the statistics included in the model (in aggregate), for which the text output is usually sufficient. Note that a pvalue closer to one is better: This is the difference between the observed networks and simulations from the model.
The other gof
modality checks goodnessoffit against some standard summary statistics – by default: degree, edgewise shared partners, and path length – decomposed to the components of the distributions. Plotting these is often quite informative. The black lines are the observed distributions and the boxplots reflect networks simulated from the model.
To change which statistics are included, specify them as a model formula to the GOF
argument to the gof
function. E.g. gof(m2, GOF = ~ triadcensus + odegree + idegree)
. The list of supported statistics is available in the help file for gof
.
Simulating networks
Another nice feature of ERGMs is that they are generative. Given a set of coefficient values, we can simulate networks that are near the maximum likelihood realization of sufficient statistics. This can be useful for examining fit, among other things, and is easy using the S3 method for simulate
for an ergm
object. In addition to checking model fit, you can change parameter values, constrain the network in various ways, etc. See ?simulate.ergm
for details.
Simulate four networks that are consistent with our model and plot them, as we plotted the observed network above:
Visualizing those simulated networks makes it clear we’re not getting the monks into their groups. We can accomplish that by including a term for group homophily. This kind of iterative estimation and checking of fit is part of the art of ERGMing (and should be for model fitting in general, I think).
For a complete list of readytouse statistics, see ?ergm.terms
. If you don’t see what you want, you can roll your own using the ergm.userterms
package. To do so requires a bit of C, but really isn’t too bad (assuming the statistic you want calculate is simple).
To add a term for homophily within Sampson’s groups we use the term nodematch
, which takes at least one argument (the nodal attribute), and provides the change in the likelihood of a tie if the two nodes match on that attribute. Note that you can estimate a differential homophily effect; that is, the change in tie likelihood for two nodes being in the same group can vary by group, by specifying the diff = TRUE
argument to nodematch
.
Before we estimate the model, a handy trick: Remember that ERGM works by calculating networklevel statistics. You can get the values of those statistics using the S3 method for summary
for an ERGM formula:
So of the 88 ties in the network, 28 of them are reciprocal, and 63 of them are between monks within a group. So we should expect a strong positive coefficient for the grouphomophily term. Let’s see:
Indeed. The logodds of a withingroup tie are 2x greater than an acrossgroup tie. We can exponentiate to find the change in the odds, exp(coef(m3)[3])
= 7.63. The change in the odds is true independent of the other attributes of the tie (e.g. whether or not it is reciprocal). The probability of a tie, however, is nonlinear: it depends on the value of other statistics, so to calculate a change in probability you must choose a value for every other statistic in the model, then you can use the inverselogit to find the difference in probability across your effect of interest. E.g. Let’s look at the probability of nonreciprocal ties within and acrossgroups:
Probability of a nonreciprocal, acrossgroup tie:
Probability of a nonreciprocal, withingroup tie:
Let’s take a look at the goodness of fit of that model:
We’re not capturing the variance in indegree – the most popular monks in our simulated networks are not as popular as in the data. Twostars can be used to represent popularity (because the more edges on a node, the more two stars an additional edge will create):
Let’s simulate some networks from that model and see what they look like:
Model comparison
Is the fit getting better? Looks like it, but hard to say, and there is danger of overfitting here as elsewhere. Can use formal model comparison as with other models:
Model degeneracy and geometericallyweighted terms
Model degeneracy is a major problem for ERGMs. Degeneracy refers to a case where the MLE for the specified sufficient statistics produce graphs that are either complete, or empty, or have all edges concentrated in a small region of the graph, or otherwise produce networks that are not of interest. Handcock’s 2003 Assessing Degeneracy in Statistical Models of Social Networks (pdf) is an excellent early treatment of the issue. It is a sign of an illspecified model, but unfortunately we often want estimates for theoretically justified reasons that we cannot get due to degeneracy issues. The quintessential such term is for triangles: How does the likelihood of a friendship change if two people already have a friend in common? For this small of a network we can estimate that directly:
In this case, having a shared friend makes a monk less likely to report liking another monk (\(\theta_{triangle} < 0\)), after the other effects are accounted for. That is very rare for networks with positivelyvalenced ties. It is also rare that nondegenerate estimates are possible for models with triangles, particularly for graphs larger than a few dozen nodes.
That may not be as much of a substantive problem as it seems. The implication of a triangles term is that the likelihood of tie changes proportionately to the number of shared friends two people have. That is, if having one shared friend makes a tie 25% more likely, having six shared friends makes a tie 150% more likely. Perhaps we should discount each additional tie. We can do that with the geometricallyweighted edgewise shared partners (gwesp) term. It takes a parameter, \(\alpha\) that controls how much to discount 2nd, 3rd, etc. shared partners. ergm
will estimate a value for \(\alpha\) by default, but this is generally not a good idea; instead fix it via the fixed
argument to gwesp
. The closer \(\alpha\) is to zero, the more dramatic the discounting applied to subsequent shared partners.
Similarly, geometrically weighted degree (gwdegree) estimates the change in tie likelihood given the degree of the nodes involved, but with marginally decreasing weighting as degree increases. Like triangles, twostars often produces degeneracy for all but the smallest graphs; gwdegree can circumvent this problem. It takes a parameter related to gwesp’s \(\alpha\), decay, which should also generally be fixed. The closer decay is to zero, the more gwdegree considers low degree nodes relative to high degree nodes. For undirected networks, the term is gwdegree
; for directed networks, in and outdegree are modeled separately via gwidegree
, and gwodegree
, respectively. These terms can be useful for modeling a popularity effect, but they are often, perhaps unfortunately, used simply to aid model convergence.
The way the gwdegree
terms are constructed, negative estimates reflect an increased likelihood on ties to higherdegree nodes. Appropriate use and interpretation of the gw terms is tricky, and I am working on a paper with more detailed guidance, which I will be presenting at PolNet 2016. For now, Hunter, 2007 is an excellent reference, and the statnet listserve is quite friendly and has a rich archive of discussions online.
I hope this has been helpful. Feel free to leave questions below.
Rbloggers.com offers daily email updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/datascience job.
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.