Network visualization – part 5: Cytoscape (an update) – RCy3

[This article was first published on Fun with R, and kindly contributed to R-bloggers.]

A few years ago I wrote a series of blog posts about network visualization in R (1, 2, 3, and 4), as a means of keeping organized notes for myself on how to do it, but also (hopefully) to help others create their own plots efficiently. Since then, some tools have changed and some new tools have appeared, so I decided it is time to update my online notes.

So let’s start with Cytoscape.

I am a big fan of plotting networks in Cytoscape directly from R, and I have to admit that I kept a copy of an old Cytoscape version just to be able to use CytoscapeRPC and RCytoscape. While some other network visualization tools may provide somewhat fancier network plots, the ability to see changes in the visualization after running each command was the feature that made a difference for me and made me use Cytoscape more than other similar tools.

Starting with version 3, Cytoscape does not support the CytoscapeRPC plugin. Instead, the recommended way to communicate with the new Cytoscape from R (and other scripting languages) is through the cyREST API. If you’re interested in network visualization with cyREST, I recommend checking the cyREST R GitHub page, which provides some code examples and the required utility files.

If you have a new version of Cytoscape but don’t want to think about details of how R and Cytoscape communicate, then the RCy3 package is the tool you want. Although the name is different, this package is actually a replacement for the old RCytoscape package and if you look at its reference manual, you will see that the majority of functions from the RCytoscape package are available under the same name in RCy3. Given the similarity between these two packages, I decided to use my old R script (plotNetworksRcytoscape.R) but with the RCy3 functions instead of those from RCytoscape, and with the new Cytoscape (v. 3.4.0; cyREST version: 3.3.4).

Probably the first change that you’ll notice when working with RCy3 is that, unlike with the CytoscapeRPC plugin, you don’t need to go to Cytoscape and start cyREST manually: it starts automatically when you start Cytoscape. This feature may resolve some of the “why is this suddenly not working?” questions we occasionally asked ourselves.
Another novelty is that RCy3 does not require users to call the redraw function after a new visualization property is applied, so changes in the network plot appear as each command runs.
If you liked to use images as nodes, you may be disappointed, because RCy3 no longer provides an option to do it directly from R; instead, the images need to be loaded separately into Cytoscape’s Image Manager, and RCy3 can then be used to assign them to nodes by specifying their positions in the Image Manager.

I’ll discuss the other changes as I go, but let’s first review the example I was using. The example is based on a weighted network of characters’ co-appearances in Victor Hugo’s novel “Les Miserables” (from D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley, Reading, MA, 1993). The network consists of 77 nodes, corresponding to characters, and 254 weighted edges, whose weights correspond to the number of times the two characters co-appear in the same chapter of the book. I used four properties to characterize this network (for the sole purpose of making the visualization more interesting): the network nodes were characterized with two properties, degree and betweenness centrality, and the network edges were characterized with two properties, weight and Dice similarity (for more details about these properties, see the Network Visualization part 1 blog post).

For network manipulation and calculation of network properties, I used the igraph package.
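To make the four properties concrete, here is a minimal sketch on a toy network (hypothetical data, not the Les Miserables network). It assumes a recent igraph, where `similarity(..., method = "dice")` is the current spelling of `similarity.dice`; the betweenness scaling mirrors the one used in the full script further down.

```r
# Toy network (hypothetical): a star with centre B and leaves A, C and D
toy <- igraph::graph_from_data_frame(
  data.frame(from = c("A", "B", "B"),
             to   = c("B", "C", "D"),
             stringsAsFactors = FALSE),
  directed = FALSE
)

# Degree: number of neighbours of each node
deg <- igraph::degree(toy)

# Betweenness, scaled by the number of node pairs that exclude the node itself
n   <- igraph::vcount(toy)
bet <- igraph::betweenness(toy, directed = FALSE) / (((n - 1) * (n - 2)) / 2)

# Dice similarity: 2 * (shared neighbours) / (sum of the two neighbourhood sizes)
dice <- igraph::similarity(toy, method = "dice")
dimnames(dice) <- list(igraph::V(toy)$name, igraph::V(toy)$name)

print(deg)
print(bet)
print(round(dice, 2))
```

In this toy case the centre node B lies on every shortest path between the leaves, while any two leaves share their only neighbour, which is why B dominates the node properties and the leaf pairs dominate the similarity matrix.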

The Les Miserables network (LesMiserables.txt) is given in a three-column edge list format (column 1 = character 1, column 2 = character 2, column 3 = number of co-appearances between characters 1 and 2), so I used the graph.data.frame command to create a network from the data frame and the simplify command to ensure that all edges in the network are unique and that there are no self loops. After the network was created, node and edge properties were calculated and assigned to the corresponding nodes and edges with the set.vertex.attribute and set.edge.attribute commands, respectively. Once this is done, we can proceed with the RCy3 part.
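The steps in this paragraph can be sketched on a small, made-up edge list. The data frame `edges` and the helper `pair_key` are hypothetical names introduced only for illustration, and the sketch uses `graph_from_data_frame` and `set_edge_attr`, the current igraph names for `graph.data.frame` and `set.edge.attribute`. Because the edge order in the graph need not match the row order of the data frame, the weights are matched to edges by their endpoints rather than by position.

```r
# Hypothetical three-column edge list with a duplicated pair (A-B, B-A) and a self loop (C-C)
edges <- data.frame(
  V1 = c("A", "A", "B", "C", "B"),
  V2 = c("B", "C", "C", "C", "A"),
  V3 = c(3, 1, 2, 5, 3),
  stringsAsFactors = FALSE
)

# Build an undirected graph from the first two columns, then drop duplicates and loops
g <- igraph::graph_from_data_frame(edges[, c("V1", "V2")], directed = FALSE)
g <- igraph::simplify(g)

# Order-independent key for an undirected pair of node names
pair_key <- function(a, b) paste(pmin(a, b), pmax(a, b), sep = "|")

# Endpoints of every edge, in the graph's own edge order
edge_ends <- igraph::ends(g, igraph::E(g))

# For each edge, find the first matching row of the edge list and take its weight
idx <- match(pair_key(edge_ends[, 1], edge_ends[, 2]),
             pair_key(edges$V1, edges$V2))
g <- igraph::set_edge_attr(g, "weight", value = edges$V3[idx])

print(igraph::as_data_frame(g, what = "edges"))
```

Matching by endpoints costs a few extra lines, but it avoids silently attaching a weight to the wrong edge when the two orderings differ.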

RCy3 requires that networks are in the form of graphNEL objects, so we first need to convert our network (let’s call it gD) from the igraph to the graphNEL format:
gD.cyt <- igraph::as_graphnel(gD)

We can check if the node and edge attributes were passed as follows:
graph::nodeData(gD.cyt, igraph::V(gD)$name, 'degree')
graph::nodeData(gD.cyt, igraph::V(gD)$name, 'betweenness')
graph::edgeData(gD.cyt, as.character(dataSet.ext$V1), as.character(dataSet.ext$V2), 'weight')
graph::edgeData(gD.cyt, as.character(dataSet.ext$V1), as.character(dataSet.ext$V2), 'similarity')

The attributes should be there.
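For readers who want to try the conversion and the checks without the full data set, here is a small self-contained sketch. It assumes that both igraph and the Bioconductor package graph are installed; the toy network and the object names `g` and `g.nel` are hypothetical.

```r
# Toy igraph network (hypothetical) with one edge attribute and one node attribute
g <- igraph::graph_from_data_frame(
  data.frame(from = c("A", "B"),
             to = c("B", "C"),
             weight = c(2, 5),
             stringsAsFactors = FALSE),
  directed = FALSE
)
g <- igraph::set_vertex_attr(g, "degree", value = igraph::degree(g))

# Convert to graphNEL (this needs the Bioconductor 'graph' package)
g.nel <- igraph::as_graphnel(g)

# Attribute names that the graphNEL object knows about
print(names(graph::nodeDataDefaults(g.nel)))
print(names(graph::edgeDataDefaults(g.nel)))

# Attribute values for selected nodes and for one edge
print(graph::nodeData(g.nel, c("A", "B", "C"), "degree"))
print(graph::edgeData(g.nel, from = "A", to = "B", attr = "weight"))
```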

In the RCytoscape workflow, creating node and edge attributes in the igraph object and then converting the igraph object to a graphNEL object ensured that the attribute values were passed directly from igraph to graphNEL to Cytoscape. In RCy3, however, this approach does not work: the attributes are passed from igraph to graphNEL, but an additional step is required to send them to Cytoscape (otherwise the attribute names will be listed in Cytoscape, but the values will not be there).

RCy3 provides two sets of functions for working with node and edge attributes: setNodeAttributes/setEdgeAttributes, which transfer the specified node/edge attribute for all nodes/edges, and setNodeAttributesDirect/setEdgeAttributesDirect, which transfer the node/edge attribute for the specified nodes/edges. Currently, only the second set works for networks that are created from data frames (see the RCy3 GitHub issues page for more details).

It seems that the current version of RCy3 is very sensitive to the way the network is created. I also noticed a difference between plots with the same layout depending on whether the network was created from a data frame via an igraph object or from a data frame using the RCy3 function cyPlot (see Figure 1). This is definitely something to keep in mind (and check) when deciding which format to use.

Figure 1: Variations in the layout based on the network source - a: network created by transferring network data from an igraph object (initial run); b: network created using the cyPlot function.

Once we have ensured that the node and edge attributes are set, we can decide on the layout. RCy3 layouts are somewhat different from those available in RCytoscape, so be sure to run the following commands to get the list of available layouts:
cy <- RCy3::CytoscapeConnection()
hlp <- RCy3::getLayoutNames(cy)
as well as:
RCy3::getLayoutPropertyNames(cy, hlp[10])
to get the list of layout properties (in this case, for layout number 10, the "fruchterman-rheingold" layout).

Once we have decided on a layout (let's say #10), we can set it as follows:
RCy3::setLayoutProperties (gDCW, hlp[10], list (gravity_multiplier = 'similarity', nIterations = 1000))
RCy3::layoutNetwork(gDCW, hlp[10])
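Since the position of a layout in the list may differ between Cytoscape installations, it can be safer to look the layout up by name instead of hard-coding the index 10. The sketch below is plain R and does not need a running Cytoscape; the vector `hlp` here is a hypothetical stand-in for the result of getLayoutNames, and its contents and order are made up.

```r
# Hypothetical stand-in for the vector returned by RCy3::getLayoutNames(cy);
# the real contents and order depend on the Cytoscape installation
hlp <- c("attribute-circle", "circular", "fruchterman-rheingold", "grid")

# Look the layout up by name rather than by position
layout.name <- "fruchterman-rheingold"
layout.idx  <- match(layout.name, hlp)

# Fail early with a readable message if the layout is not offered
if (is.na(layout.idx)) {
  stop("Layout '", layout.name, "' is not available in this Cytoscape session")
}

print(hlp[layout.idx])
```

With the real list, `hlp[layout.idx]` could then be passed wherever `hlp[10]` is used above.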

Finally, we can define default visualization parameters (background color, edge line color, width, node shape, etc.) and node and edge rules. These commands remained the same as in the RCytoscape version. However, there still seem to be some issues with plotting. While I got the "Successfully set rule" message for all node and edge rules I used, the requested visualizations were applied only to nodes, while the edge rules either did not apply or changed the edge color to white. I am not sure what the cause was, as the examples provided with the package worked fine, so hopefully this issue will be solved soon.

Here is the full code:

############################################################################################
############################################################################################
# Plotting networks in R - an example how to plot a network and 
# customize its appearance in Cytoscape directly from R using 
# the RCy3 package
############################################################################################
############################################################################################
# Clear workspace 
# rm(list = ls())
############################################################################################

# Read a data set. 
# Data format: dataframe with 3 variables; variables 1 & 2 correspond to interactions; variable 3 is weight of interaction
dataSet <- read.table("lesmis.txt", header = FALSE, sep = "\t")

# Create a graph. Use simplify to ensure that there are no duplicated edges or self loops
gD <- igraph::simplify(igraph::graph.data.frame(dataSet, directed=FALSE))

# Print number of nodes and edges
# igraph::vcount(gD)
# igraph::ecount(gD)

############################################################################################
# Calculate some node properties and node similarities that will be used to illustrate 
# different plotting abilities

# Calculate degree for all nodes
degAll <- igraph::degree(gD, v = igraph::V(gD), mode = "all")

# Calculate betweenness for all nodes
betAll <- igraph::betweenness(gD, v = igraph::V(gD), directed = FALSE) / (((igraph::vcount(gD) - 1) * (igraph::vcount(gD)-2)) / 2)
betAll.norm <- (betAll - min(betAll))/(max(betAll) - min(betAll))
rm(betAll)

#Calculate Dice similarities between all pairs of nodes
dsAll <- igraph::similarity.dice(gD, vids = igraph::V(gD), mode = "all")

############################################################################################
# Add new node and edge attributes based on the calculated node properties/similarities

gD <- igraph::set.vertex.attribute(gD, "degree", index = igraph::V(gD), value = degAll)
gD <- igraph::set.vertex.attribute(gD, "betweenness", index = igraph::V(gD), value = betAll.norm)

# Check the attributes
# summary(gD)

F1 <- function(x) {data.frame(V4 = dsAll[which(igraph::V(gD)$name == as.character(x$V1)), which(igraph::V(gD)$name == as.character(x$V2))])}
dataSet.ext <- plyr::ddply(dataSet, .variables=c("V1", "V2", "V3"), function(x) data.frame(F1(x)))

gD <- igraph::set.edge.attribute(gD, "weight", index = igraph::E(gD), value = 0)
gD <- igraph::set.edge.attribute(gD, "similarity", index = igraph::E(gD), value = 0)

# The order of interactions in dataSet.ext is not the same as it is in dataSet or as it is in the edge list
# and for that reason these values cannot be assigned directly

# Assign the values one interaction at a time, selecting each edge by its two end nodes
for (i in seq_len(nrow(dataSet.ext)))
{
  igraph::E(gD)[as.character(dataSet.ext$V1[i]) %--% as.character(dataSet.ext$V2[i])]$weight <- as.numeric(dataSet.ext$V3[i])
  igraph::E(gD)[as.character(dataSet.ext$V1[i]) %--% as.character(dataSet.ext$V2[i])]$similarity <- as.numeric(dataSet.ext$V4[i])
}

# Check the attributes
# summary(gD)

rm(dataSet,dsAll, i, F1)

############################################################################################
# Now, let's do Cytoscape plots

# First, we need to transform our network from the igraph to graphnel format
gD.cyt <- igraph::as_graphnel(gD)

# Check if attributes have been passed
# graph::nodeData(gD.cyt, igraph::V(gD)$name, 'degree')
# graph::nodeData(gD.cyt, igraph::V(gD)$name, 'betweenness')
# graph::edgeData(gD.cyt, as.character(dataSet.ext$V1), as.character(dataSet.ext$V2), 'weight')
# graph::edgeData(gD.cyt, as.character(dataSet.ext$V1), as.character(dataSet.ext$V2), 'similarity')

# We have to create attributes for graphNEL
# We'll keep the same names as before
# In RCytoscape, this would ensure that the values of attributes are passed directly from igraph.
# However, this does not work with RCy3 right now (not sure if it is a bug or a feature has changed).
# Thus, we need to send the attributes to Cytoscape explicitly

gD.cyt <- RCy3::initNodeAttribute(gD.cyt, 'degree', 'numeric', 0) 
gD.cyt <- RCy3::initNodeAttribute(gD.cyt, 'betweenness', 'numeric', 0) 
gD.cyt <- RCy3::initEdgeAttribute (gD.cyt, "weight", 'integer', 0)
gD.cyt <- RCy3::initEdgeAttribute (gD.cyt, "similarity", 'numeric', 0)

# Next, we will create a new graph window in cytoscape
gDCW <- RCy3::CytoscapeWindow("Les Miserables", graph = gD.cyt, overwriteWindow = TRUE)

# We can display the graph with the default color/size scheme
RCy3::displayGraph(gDCW)

# Now let's send/load node and edge attributes into Cytoscape

##########
# This should theoretically work, but there are some problems with attributes when networks
# are created from data frames (see https://github.com/tmuetze/Bioconductor_RCy3_the_new_RCytoscape/issues/25)
# I'll keep this code uncommented, but right now, it doesn't do anything

# setNodeAttributes should transfer the named node attribute, for all nodes,
# from the R graph (found in obj@graph) to Cytoscape.
attribute.names <- RCy3::noa.names(gDCW@graph)

# Print list of attribute names to see if they are ok
# attribute.names 

# All nodes should already be in
RCy3::sendNodes(gDCW)

for (attribute.name in attribute.names){
  RCy3::setNodeAttributes(gDCW, attribute.name)
}

attribute.names <- RCy3::eda.names(gDCW@graph)

# All edges should already be in
RCy3::sendEdges(gDCW)

for (attribute.name in attribute.names){
  RCy3::setEdgeAttributes(gDCW, attribute.name)
}

RCy3::displayGraph(gDCW)

##########
# The alternative, setting attributes directly, works fine,
# so we will use it now (although it seems kind of repetitive)

RCy3::setNodeAttributesDirect(gDCW, 'degree', 'numeric', igraph::V(gD)$name, igraph::V(gD)$degree)
RCy3::setNodeAttributesDirect(gDCW, 'betweenness', 'numeric', igraph::V(gD)$name, igraph::V(gD)$betweenness)
RCy3::setEdgeAttributesDirect(gDCW, 'weight', 'integer', as.character (RCy3::cy2.edge.names (gDCW@graph)), graph::edgeData(gD.cyt, as.character(dataSet.ext$V1), as.character(dataSet.ext$V2), 'weight'))
RCy3::setEdgeAttributesDirect(gDCW, 'similarity', 'numeric', as.character (RCy3::cy2.edge.names (gDCW@graph)), graph::edgeData(gD.cyt, as.character(dataSet.ext$V1), as.character(dataSet.ext$V2), 'similarity'))

##########
# Now let's decide on a layout

# If you also want to choose a layout from R, a list of available layouts can be accessed as follows:
cy <- RCy3::CytoscapeConnection()
hlp <- RCy3::getLayoutNames(cy)

# We'll select the "fruchterman-rheingold" layout. This layout is the layout number 10 
# To see properties for the given layout, use:
# RCy3::getLayoutPropertyNames(cy, hlp[10])
# We can choose any property we want and provide them as a list
RCy3::setLayoutProperties (gDCW, hlp[10], list (gravity_multiplier = 'similarity', nIterations = 1000))
RCy3::layoutNetwork(gDCW, hlp[10])

# I've noticed that if I change property to attraction_multiplier
# RCy3::setLayoutProperties (gDCW, hlp[10], list (attraction_multiplier = 'similarity', nIterations = 1000))
# RCy3::layoutNetwork(gDCW, hlp[10])
# And then go back to the original one
# RCy3::setLayoutProperties (gDCW, hlp[10], list (gravity_multiplier = 'similarity', nIterations = 1000))
# RCy3::layoutNetwork(gDCW, hlp[10])
# The layout won't go back to the original one. I am not sure if this is a bug or not

##########
# Finally, we can define rules for nodes:
RCy3::setNodeColorRule(gDCW, 'degree', c(min(degAll), mean(degAll), max(degAll)), c('#F5DEB3', '#FFA500', '#FF7F50', '#FF4500', '#FF0000'), mode = 'interpolate')
RCy3::setNodeSizeRule(gDCW, 'betweenness', c(min(betAll.norm), mean(betAll.norm), max(betAll.norm)), c(30, 45, 60, 80, 100), mode = 'interpolate')

# And edges:
RCy3::setEdgeLineWidthRule(gDCW, 'weight', dataSet.ext$V3, dataSet.ext$V3)
RCy3::setEdgeColorRule(gDCW, 'weight', c(min(as.numeric(dataSet.ext$V3)), mean(as.numeric(dataSet.ext$V3)), max(as.numeric(dataSet.ext$V3))), c('#FFFF00', '#00FFFF', '#00FF7F', '#228B22', '#006400'), mode='interpolate')

# While I get the "Successfully set rule" message for both of the edge rules, the view in Cytoscape did not
# change according to the rules - the setEdgeLineWidthRule command did not make any changes and the
# setEdgeColorRule command made all edges white.
# One of the solved GitHub issues suggests first setting all rule-based functions and then the direct ones,
# but it didn't work here (https://github.com/tmuetze/Bioconductor_RCy3_the_new_RCytoscape/issues/21 and
# https://github.com/tmuetze/Bioconductor_RCy3_the_new_RCytoscape/issues/20)

# We will define our own default color/size schema after we defined node and edge rules, due to
# possible issues when using rules
RCy3::setDefaultBackgroundColor(gDCW, '#D3D3D3')
RCy3::setDefaultEdgeColor(gDCW, '#CDC9C9')
RCy3::setDefaultEdgeLineWidth(gDCW, 4)
RCy3::setDefaultNodeBorderColor(gDCW, '#000000')
RCy3::setDefaultNodeBorderWidth(gDCW, 3)
RCy3::setDefaultNodeShape(gDCW, 'ellipse')
RCy3::setDefaultNodeColor(gDCW, '#87CEFA')
RCy3::setDefaultNodeSize(gDCW, 60)
RCy3::setDefaultNodeFontSize(gDCW, 20)
RCy3::setDefaultNodeLabelColor(gDCW, '#000000')

# Running these commands will set color to all edges back to black and set their width to 4,
# ignoring the rules specified above
############################################################################################

sessionInfo()


# R version 3.3.1 (2016-06-21)
# Platform: x86_64-redhat-linux-gnu (64-bit)
# Running under: Fedora 23 (Workstation Edition)
#
# locale:
# [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
# [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
# [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
# [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
# [9] LC_ADDRESS=C               LC_TELEPHONE=C            
# [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods  
# [7] base     
# 
# loaded via a namespace (and not attached):
# [1] httr_1.2.1          R6_2.1.2            plyr_1.8.4         
# [4] magrittr_1.5        parallel_3.3.1      tools_3.3.1        
# [7] igraph_1.0.1        RCurl_1.95-4.8      curl_1.1           
# [10] Rcpp_0.12.6         RJSONIO_1.3-0       BiocGenerics_0.18.0
# [13] RCy3_1.2.0          bitops_1.0-6        stats4_3.3.1       
# [16] graph_1.50.0 
#
#####
# Cytoscape version: 3.4.0
# Java version: 1.8.0_101
# cyREST version: 3.3.4

############################################################################################

Two more things before I finish.

The RCy3 package is available on Bioconductor and GitHub. Since the package still seems to have some issues, if you plan to use it and have an issue, it is probably a good idea to keep an eye on both, as the GitHub version contains fixes for the reported issues before the Bioconductor version (official release) does.

As I mentioned earlier, some of the RCy3 functions seem to be very sensitive to the way the network is created. The Gist associated with this post also contains code for a network created using the cyPlot function (the RCy3_example2.R file, which uses RCy3 from Bioconductor, and the RCy3_example3.R file, which uses RCy3 from GitHub; you will also need to source the cyPlot_mod.R file, which contains some bug fixes; see the comments in the RCy3_example code).
