Network visualization – part 5: Cytoscape (an update) – RCy3
Posted on August 7, 2016 by Vessy in R bloggers
[This article was first published on Fun with R, and kindly contributed to R-bloggers].
A few years ago I wrote a series of blog posts about network visualization in R (1, 2, 3, and 4) as a means of keeping organized notes for myself, but also of (hopefully) helping others create their own plots efficiently. Since then, some tools have changed and new ones have appeared, so I decided it was time to update my online notes.
So let’s start with Cytoscape.
I am a big fan of plotting networks in Cytoscape directly from R, and I have to admit that I kept a copy of an old Cytoscape version just to be able to use CytoscapeRPC and RCytoscape. While some other network visualization tools may produce somewhat fancier plots, being able to see the visualization change after each command made a real difference for me, and it is the reason I used Cytoscape more than other similar tools.
Starting with version 3, Cytoscape no longer supports the CytoscapeRPC plugin. Instead, the recommended way to communicate with the new Cytoscape from R (and other scripting languages) is through the cyREST API. If you're interested in network visualization with cyREST, I recommend checking the cyREST R GitHub page, which provides code examples and the required utility files.
If you have a new version of Cytoscape but don’t want to think about details of how R and Cytoscape communicate, then the RCy3 package is the tool you want. Although the name is different, this package is actually a replacement for the old RCytoscape package and if you look at its reference manual, you will see that the majority of functions from the RCytoscape package are available under the same name in RCy3. Given the similarity between these two packages, I decided to use my old R script (plotNetworksRcytoscape.R) but with the RCy3 functions instead of those from RCytoscape, and with the new Cytoscape (v. 3.4.0; cyREST version: 3.3.4).
Probably the first change you’ll notice when working with RCy3 is that, unlike with the CytoscapeRPC plugin, you don’t need to go to Cytoscape and start cyREST manually: it starts automatically when you start Cytoscape. This feature may answer some of the “why is this suddenly not working?” questions we occasionally asked ourselves.
Another novelty is that RCy3 does not require users to call the redraw function after a new visualization property has been applied, so changes in the network plot become visible as each command runs.
If you like using images as nodes, you may be disappointed: RCy3 no longer provides a way to do this directly from R. Instead, the images need to be loaded separately into Cytoscape’s Image Manager, after which RCy3 can assign them to nodes by specifying their positions in the Image Manager.
I’ll discuss the other changes as I go, but let’s first review the example I was using. The example is based on a weighted network of characters’ co-appearances in Victor Hugo’s novel “Les Miserables” (from D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley, Reading, MA, 1993). It consists of 77 nodes, corresponding to characters, and 254 edges weighted by the number of times the two characters co-appear in the same chapter of the book. I used four properties to characterize this network (for the sole purpose of making the visualization more interesting): the nodes were characterized by degree and betweenness centrality, and the edges by weight and Dice similarity (for more details about these properties, see the Network Visualization part 1 post).
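For reference, the Dice similarity of two nodes is usually defined (and this is the definition behind igraph's similarity(method = "dice")) as twice the number of neighbors they share divided by the sum of their degrees. For an edge between nodes i and j, with N(i) the set of neighbors of i and k_i its degree:

$$
D_{ij} = \frac{2\,\lvert N(i) \cap N(j) \rvert}{k_i + k_j}
$$

For instance, if i has neighbors {j, m} and j has neighbors {i, m}, they share one neighbor and D_ij = 2 · 1 / (2 + 2) = 0.5.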
For network manipulation and calculation of network properties, I used the igraph package.
The Les Miserables network (LesMiserables.txt) is given in a three-column edge list format (column 1 = character 1, column 2 = character 2, column 3 = number of co-appearances between characters 1 and 2), so I used the graph.data.frame command to create a network from the data frame and the simplify command to ensure that all edges in the network are unique and that there are no self loops. After the network was created, node and edge properties were calculated and assigned to the corresponding nodes and edges with the set.vertex.attribute and set.edge.attribute commands, respectively. Once this is done, we can proceed with the RCy3 part.
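A minimal sketch of this step, assuming a small hypothetical edge list in place of LesMiserables.txt (the character pairs and counts below are made up for illustration, and the third column is named weight). It uses graph_from_data_frame(), set_vertex_attr() and set_edge_attr(), the current igraph names of graph.data.frame(), set.vertex.attribute() and set.edge.attribute(). Summing the weights of duplicated edges and computing unweighted betweenness are my assumptions, not necessarily what the original script does.

```r
library(igraph)

# Hypothetical stand-in for LesMiserables.txt: a three-column edge list
# (character 1, character 2, number of co-appearances). It deliberately
# contains one duplicated pair (in reverse order) and one self loop.
dataSet <- data.frame(
  V1 = c("Valjean", "Valjean", "Javert",  "Valjean", "Cosette", "Javert"),
  V2 = c("Javert",  "Cosette", "Valjean", "Valjean", "Marius",  "Cosette"),
  weight = c(17, 31, 3, 2, 21, 5),
  stringsAsFactors = FALSE
)

# Columns after the first two become edge attributes (here "weight")
gD <- graph_from_data_frame(dataSet, directed = FALSE)

# Drop self loops and merge duplicated edges; summing the weights of
# merged edges is an assumption made for this sketch
gD <- simplify(gD, remove.multiple = TRUE, remove.loops = TRUE,
               edge.attr.comb = list(weight = "sum", "ignore"))

# Node properties: degree and unweighted betweenness centrality
gD <- set_vertex_attr(gD, "degree", value = degree(gD))
gD <- set_vertex_attr(gD, "betweenness", value = betweenness(gD, weights = NA))

# Edge property: Dice similarity between the two endpoints of each edge
dice <- similarity(gD, method = "dice")
endpoints <- ends(gD, E(gD), names = FALSE)  # two-column matrix of vertex ids
gD <- set_edge_attr(gD, "similarity", value = dice[endpoints])
```

With the real file, dataSet would come from something like read.table("LesMiserables.txt"), with the third column renamed to weight.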
RCy3 requires networks to be graphNEL objects, so we first need to convert our network (let’s call it gD) from the igraph to the graphNEL format:
gD.cyt <- igraph::as_graphnel(gD)
We can check whether the node and edge attributes were passed as follows:
graph::nodeData(gD.cyt, igraph::V(gD)$name, 'degree')
graph::nodeData(gD.cyt, igraph::V(gD)$name, 'betweenness')
graph::edgeData(gD.cyt, as.character(dataSet.ext$V1), as.character(dataSet.ext$V2), 'weight')
graph::edgeData(gD.cyt, as.character(dataSet.ext$V1), as.character(dataSet.ext$V2), 'similarity')
The attributes should be there.
In the RCytoscape workflow, creating node and edge attributes in the igraph object and then converting it to a graphNEL object ensured that the attribute values were passed directly from igraph to graphNEL to Cytoscape. In RCy3, however, this approach does not work: the attributes are passed from igraph to graphNEL, but an additional step is required to send them to Cytoscape (the attribute names will be listed, but their values will be missing).
RCy3 provides two sets of functions for working with node and edge attributes: setNodeAttributes/setEdgeAttributes, which transfer the specified node/edge attribute for all nodes/edges, and setNodeAttributesDirect/setEdgeAttributesDirect, which transfer the node/edge attribute for the specified nodes/edges. Currently, only the latter work for networks that are created from data frames (see the RCy3 GitHub issues page for more details).
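A sketch of how the *Direct functions could be fed from the igraph object gD, one attribute at a time. The two helpers, cytoscapeEdgeNames() and pushAttributes(), are hypothetical. The argument order (object, attribute name, attribute type, node or edge names, values) follows the RCytoscape-era signature, and both the "numeric" type string and the "source (unspecified) target" edge-name format are assumptions to check against the RCy3 reference manual and Cytoscape's edge table.

```r
library(igraph)

# Hypothetical helper: build Cytoscape-style edge names "source (interaction) target".
# The interaction label "unspecified" is an assumption about how RCy3 names
# edges of a graphNEL without an edgeType attribute.
cytoscapeEdgeNames <- function(g, interaction = "unspecified") {
  el <- ends(g, E(g), names = TRUE)
  paste0(el[, 1], " (", interaction, ") ", el[, 2])
}

# Hypothetical wrapper around the *Direct functions (needs a running Cytoscape).
# Argument order and the "numeric" type are assumed from RCytoscape.
pushAttributes <- function(cw, g) {
  nodes <- V(g)$name
  RCy3::setNodeAttributesDirect(cw, "degree", "numeric", nodes, V(g)$degree)
  RCy3::setNodeAttributesDirect(cw, "betweenness", "numeric", nodes, V(g)$betweenness)
  edges <- cytoscapeEdgeNames(g)
  RCy3::setEdgeAttributesDirect(cw, "weight", "numeric", edges, E(g)$weight)
  RCy3::setEdgeAttributesDirect(cw, "similarity", "numeric", edges, E(g)$similarity)
  invisible(cw)
}
```

With Cytoscape running, it would be called as pushAttributes(gDCW, gD), where gDCW is the window object passed to setLayoutProperties() in this post. For undirected networks, check which endpoint Cytoscape treats as the source before relying on the generated edge names.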
The current version of RCy3 seems to be very sensitive to the way the network is created. I also noticed differences between plots with the same layout for networks created from a data frame via an igraph object and networks created from a data frame directly with the RCy3 function cyPlot (see Figure 1). This is definitely something to keep in mind (and check) when deciding which format to use.
Figure 1: Variations in the layout based on the network source - a: network created by transferring network data from an igraph object (initial run); b: network created using the cyPlot function.
Once we have ensured that the node and edge attributes are set, we can decide on the layout. RCy3 layouts are somewhat different from those available in RCytoscape, so be sure to run the following commands to get the list of available layouts:
cy <- RCy3::CytoscapeConnection()
hlp <- RCy3::getLayoutNames(cy)
as well as:
getLayoutPropertyNames(cy, hlp[10])
to get the list of layout properties (in this case, for layout number 10, the "Fruchterman-Rheingold" layout).
Once we have decided on a layout (let's say #10), we can set its properties and apply it:
RCy3::setLayoutProperties(gDCW, hlp[10], list(gravity_multiplier = 'similarity', nIterations = 1000))
RCy3::layoutNetwork(gDCW, hlp[10])
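Picking the layout by its position (hlp[10]) depends on the order in which Cytoscape lists its layouts, which may change between versions. A small hypothetical helper, pickLayout(), selects it by name instead. The layout list below is illustrative, and "fruchterman-rheingold" is an assumption about how Cytoscape 3 labels that layout.

```r
# Hypothetical helper: return the first layout name matching a pattern,
# so the script does not depend on the layout's position in the list
pickLayout <- function(layoutNames, pattern) {
  hit <- grep(pattern, layoutNames, ignore.case = TRUE, value = TRUE)
  if (length(hit) == 0) stop("no layout matches '", pattern, "'")
  hit[1]
}

# Illustrative list; with Cytoscape running it would come from
# RCy3::getLayoutNames(RCy3::CytoscapeConnection())
layouts <- c("grid", "force-directed", "fruchterman-rheingold", "kamada-kawai")
fr <- pickLayout(layouts, "fruchterman")

# Then, as in the post (not run here):
# RCy3::setLayoutProperties(gDCW, fr, list(nIterations = 1000))
# RCy3::layoutNetwork(gDCW, fr)
```

Failing loudly when no layout matches is deliberate: a silently wrong index would just apply a different layout.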
Finally, we can define the default visualization parameters (background color, edge line color and width, node shape, etc.) and the node and edge rules. These commands remain the same as in RCytoscape. However, there still seem to be some issues with plotting: while I got the "Successfully set rule" message for every node and edge rule I used, the requested visualizations were applied only to nodes; the edge rules either did not apply or changed the edge color to white. I am not sure what caused this, as the examples provided with the package worked fine, so hopefully this issue will be solved soon.
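Node rules in interpolate mode map attribute values at a few control points onto colors. As a sketch with hypothetical degree values, the helper controlPoints() (hypothetical, not part of RCy3) takes the minimum, median and maximum of an attribute. The commented call assumes that setNodeColorRule() kept its RCytoscape signature (object, attribute name, control points, colors, mode) in RCy3.

```r
# Hypothetical helper: minimum, median and maximum of a numeric node
# attribute, used as control points for a color gradient
controlPoints <- function(values) {
  unname(c(min(values), stats::median(values), max(values)))
}

degreeValues <- c(2, 2, 3, 1)  # hypothetical node degrees
cp <- controlPoints(degreeValues)

# With Cytoscape running (not run here; signature assumed from RCytoscape):
# RCy3::setNodeColorRule(gDCW, "degree", cp,
#                        c("#FFFFFF", "#FFFF00", "#FF0000"),
#                        mode = "interpolate")
```

If the median coincides with the minimum or maximum (for example, when many nodes share the same value), the control points are no longer distinct, so check them before passing them to a rule.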
The full code is available in the Gist associated with this post.
Two more things before I finish.
The RCy3 package is available on Bioconductor and GitHub. Since the package still seems to have some issues, if you plan to use it and run into a problem, it is probably a good idea to keep an eye on both: fixes for reported issues appear in the GitHub version before they reach the Bioconductor version (the official release).
As I mentioned earlier, some of the RCy3 functions seem to be very sensitive to the way the network is created. The Gist associated with this post also contains code for a network created using the cyPlot function (the RCy3_example2.R file uses RCy3 from Bioconductor and the RCy3_example3.R file uses RCy3 from GitHub; you will also need to source the cyPlot_mod.R file, which contains some bug fixes; see the comments in the RCy3_example code).