Making a hive plot of HIV transmission network — tutorial

[This article was first published on Pirate Science » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.


Hiveplots are the über-cool graph visualization created by the prolific Martin Krzywinski and ported to R by the equally talented Bryan Hanson. Martin has created a network visualization tool which displays the underlying structure of a network and ends the hairball syndrome.

Enough hype. Let’s get started.

The fundamental step is to think about what characterizes your network. Each node will lie on an axis and have a distance from the center point. If you can’t find a meaningful axis and distance for your data, don’t hiveplot it.

I am looking at patterns of HIV transmission. The data comes from the RESINA project, which combines patient demographics and HIV genetic sequences. Assuming that infections which are genetically very close reflect a transmission link (not necessarily direct- the transmission could have gone through a person who is not in the data. Also, we cannot detect direction), one can estimate an underlying transmission network. Our previous investigation suggested one transmission cluster contained 429 members. Let’s hiveplot it.

One question of interest is links between different transmission groups. This suggests the axis: men having sex with men (MSM), heterosexual contact (HET), intravenous drug use (IDU), and unknown. We are also interested in transmission times, which provides our distance. For convenience, we use the sequencing date, though we note that a phylogenetic estimate would be more accurate (but that is another tutorial).

Now the hard part is over, and the coding begins. The data currently exists as an igraph graph object. You can grab it from Figshare: hivnet.Rdata. Making a hive plot boils down to translating (igraph) node and edge attributes into a hive plot data object. This is a list with 5 named elements: two data frames (nodes, edges) describing the graph and three helper variables (2 or 3-d plot, axis colors, and a description/comment line).

The visible edges in the full transmission network

The first plot of the graph was confusing in that within group and between group transmissions were obscured by the total number of edges. I decided to make two side by side plots, one for homogeneous and one for heterogeneous transmission. This required classifying the edges. I also created two separate edge data frames, which was conceptually easier than masking out subsets.

This brings out another issue. Hiveplots only show edges between neighboring axis. This is intentional, it prevents cluttering the hive (the whole point is to avoid the hairball, right?).

There’s no easy way out in 2D except to confess and move on. What’s missing? The 42 IDU–HET edges. Probably not a big deal. The 448 Unknown–MSM edges are a bigger deal. They are 40% of the heterogeneous edges in the graph. If we actually knew the transmission route in the “unknown transmission route” group, I would change the layout of the 2D graph to show these edges.

A screen capture of the 3d plot rotated to show the Unknown–MSM edges (purple curves).

But hiveplots offer all kinds of 3D goodness, via the rgl library. A quick google will reveal numerous tutorials for making rgl movies, opening more possibilities than can be explored in one post. Suffice it to say that I made a 3d plot, rotated it until it showed the Unknown–MSM edges, and saved it as a png.



Here is the code for making the plots.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
require(igraph)        # the HIV network is stored as an igraph object
require(HiveR)         # Hive plots
require(RColorBrewer)  # to make color pallets
require(grid)          # multi-panel plotting
require(rgl)           # 3d plotting
 
load("hivnet.Rdata")
tcodes <- c('Unknown','IDU','MSM','HET')  ## transmission groups
 
## Define color palette
ecolor <- brewer.pal(4,'Accent')
 
## Utility function which assembles
## components into a HPD object
mkHPD <- function(node.df,edge.df,axis.cols,
                  type='2D',desc=''){
  res <- list(nodes=node.df,edges=edge.df,type=type,
              desc=desc,axis.cols=axis.cols)
  class(res) <- "HivePlotData"
  chkHPD(res)
  res
}
 
## make the node dataframe
ndf <- data.frame(id=1:vcount(hivgr), lab=V(hivgr)$name,
                  axis=as.integer(V(hivgr)$tgroup), radius=V(hivgr)$distance,
                  size=1, color=ecolor[V(hivgr)$tgroup],stringsAsFactors=FALSE)
 
## determine which edges link nodes of the same transmission type.
edgelist <-get.edgelist(hivgr,names=FALSE)
etrans <- cbind(V(hivgr)$tgroup[edgelist[,1]],  ## test if edges are to same
                V(hivgr)$tgroup[edgelist[,2]])  ## or different trans. group
ematch <- etrans[,1]==etrans[,2]
 
## use this to define some edge colors
## for homogenous transmission:
enodetype <- V(hivgr)$tgroup[edgelist[,1]]
## heterogeneous transmission types-- written over three lines for legibility
transtype <- apply(etrans,1,function(x){paste(sort(x),collapse="")})
transtype <- transtype[!ematch]
transtype <- as.integer(factor(transtype))
transcolors <- brewer.pal(max(transtype),'Accent')[c(5,2,1,4,6,3)] ## I prefer this color order
 
 
## homogeneous edges -- connect same trans group
edfhom <- data.frame(id1=as.integer(edgelist[ematch,1]),
                     id2=as.integer(edgelist[ematch,2]),
                     weight=1,color=as.character(ecolor[1]),
                     stringsAsFactors=FALSE)
## heterogeneous edges -- connect different trans group
edfhet <- data.frame(id1=as.integer(edgelist[!ematch,1]),
                     id2=as.integer(edgelist[!ematch,2]),
                     weight=1,color=as.character(transcolors[transtype]),
                     stringsAsFactors=FALSE)
 
## plot it
vplayout <- function(x,y){viewport(layout.pos.row=x,layout.pos.col=y)}
 
graphics.off()
grid.newpage()
pushViewport(viewport(layout=grid.layout(1,2)))
pushViewport(vplayout(1,1))
plotHive(mkHPD(ndf,edfhom,axis.cols=brewer.pal(4,'Set3')),
               axLabs=tcodes,axLab.pos=c(400,700,400,700),np=FALSE)
popViewport(2)
pushViewport(vplayout(1,2))
plotHive(mkHPD(ndf,edfhet,axis.cols=brewer.pal(4,'Set3')),
               axLabs=tcodes,axLab.pos=c(400,700,400,700),np=FALSE)
popViewport(2)
 
plot3dHive(mkHPD(ndf,edfhet,axis.cols=ecolor,type="3D"),
          axLab.gpar=gpar(col="white",fontsize=30),
          axLabs=tcodes,axLab.pos=c(250,400),np=FALSE)
 
## rotate at will, then
## rgl.snapshot("hiv_unk_msm.png")

The figures reveal some interesting patterns in the data. Caveat: given that the data is a somewhat randomized version of only one putative transmission cluster in one HIV patient sample, the observations may not generalize to other populations. Also remember that 40% connections are between Unknown and MSM. This also makes it difficult to draw firm conclusions.

The first thing to notice is that transmission seem to leapfrog. In both plots, on all axes, the transmission is not to the next node forward in time, but several steps forward. The IDUs and HETs are also interesting in that both groups rarely transmit among themselves. This network appears to be driven by the MSM community.

Here, the axes accurately reflect the proportion of each transmission group.

The original plot also hides a few facts. Since all branches are of equal length, the relative proportions are not at all obvious. We can easily change this, however, by altering the distance parameter in the node data frame. Instead spacing nodes according to date, we give them unit spacing. (ordered by date, to provide some continuity to the first plots). This is done by changing the radius in the node data frame to the rank of the distance:

1
2
3
4
5
6
7
8
9
for(i in 1:4){  ## I know for loops are not very R-like, but neither is
  V(hivgr)$rank[V(hivgr)$tgroup==i] <-    ## an assignment within a sapply
    rank(V(hivgr)$distance[V(hivgr)$tgroup==i])}
 
 
ndf <- data.frame(id=1:vcount(hivgr), lab=V(hivgr)$name,
                  axis=as.integer(V(hivgr)$tgroup),
                  radius=V(hivgr)$rank,
                  size=1, color=ecolor[V(hivgr)$tgroup],stringsAsFactors=FALSE)

and otherwise proceeding as above.

If your plots don't look quite like mine, it is because I plot to pngs, not the screen. Adjusting the size of the png changes the display size of the graph components. Try wrapping the plot calls in

1
2
3
png('HIVhiveplot_full.png',width=3000,height=1500)
## PLOT HERE
dev.off()


Many thanks to Bryan, who provided substantial and immediate email support to this project.

To leave a comment for the author, please follow the link and comment on their blog: Pirate Science » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)