In a pet project, I created a network plot in R to represent mutations and show how combining them improved or worsened a mutation. This post documents how I approached the whole problem.
First let’s look at the input data.
An Excel sheet with a column of mutations and a column of Half Life Improvement Factors (HIF) is enough as input.
|Mutation|HIF|
|A1B B2C C3D|3|
Since my inputs were in xlsx format, I used the XLConnect package to read from and write to them.
I had to write some code to clean up the data. For example, the mutations were sometimes separated by ‘+’ instead of a single space. An input file might also contain a lot of irrelevant information, or duplicates.
Many small functions were needed to clean the data, depending on the input file.
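A minimal sketch of this kind of cleanup in base R. The data frame `raw`, its column names `Mutation` and `HIF`, and the helper `clean_mutations` are assumptions made for illustration, not the project's actual code:

```r
# Hypothetical raw input; the column names are assumed.
raw <- data.frame(
  Mutation = c("A1B+B2C+C3D", "A1B  B2C C3D", " A1B ", "C3D Z25A"),
  HIF = c(3, 3, 1.5, 7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: normalise separators, then drop duplicate mutations.
clean_mutations <- function(df) {
  m <- gsub("+", " ", df$Mutation, fixed = TRUE)  # '+' separators become spaces
  m <- gsub("\\s+", " ", m)                       # collapse runs of whitespace
  df$Mutation <- trimws(m)                        # drop leading/trailing spaces
  df[!duplicated(df$Mutation), , drop = FALSE]    # keep the first record per mutation
}

cleaned <- clean_mutations(raw)
```

Here the first two rows collapse into a single `A1B B2C C3D` record once the separators are normalised.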
Then I had to create “nodes” from the list of mutations. For this the code involved getting unique records of “mutations” and I also added a bit of code to count the number of substitutions in the mutation. Now the table would look something like:
|Mutation|HIF|NSub|
|A1B B2C C3D|3|3|
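One way the node table could be built, sketched in base R. The input data frame `cleaned` is hypothetical, and the substitution count assumes that substitutions within a mutation are separated by single spaces:

```r
# Hypothetical cleaned input with one repeated record.
cleaned <- data.frame(
  Mutation = c("A1B", "B2C", "A1B B2C C3D", "C3D Z25A", "A1B"),
  HIF = c(1.5, 2, 3, 7, 1.5),
  stringsAsFactors = FALSE
)

# One row per distinct mutation.
nodes <- cleaned[!duplicated(cleaned$Mutation), ]
rownames(nodes) <- NULL

# Number of substitutions = number of space-separated tokens.
nodes$NSub <- lengths(strsplit(nodes$Mutation, " ", fixed = TRUE))
```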
Now that we have the nodes, we need to make their “edges” or “links”.
To build them, I sorted the data by number of substitutions, then looped over the substitution counts and, within each, over the mutations, making a connection wherever the mutations matched.
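A rough sketch of the edge-building idea. The matching rule used here is an assumption: a mutation is linked to another when all of its substitutions appear in the other, larger one. The column names `SourceMutation` and `TargetMutation` are hypothetical:

```r
# Hypothetical node table.
nodes <- data.frame(
  Mutation = c("A1B", "B2C", "A1B B2C C3D", "C3D Z25A"),
  NSub = c(1L, 1L, 3L, 2L),
  stringsAsFactors = FALSE
)

# Sort by number of substitutions, then split each mutation into its parts.
nodes <- nodes[order(nodes$NSub), ]
rownames(nodes) <- NULL
parts <- strsplit(nodes$Mutation, " ", fixed = TRUE)

src <- character(0)
tgt <- character(0)
for (i in seq_len(nrow(nodes))) {
  for (j in seq_len(nrow(nodes))) {
    # Assumed rule: i links to j if j has more substitutions and contains all of i's.
    if (nodes$NSub[i] < nodes$NSub[j] && all(parts[[i]] %in% parts[[j]])) {
      src <- c(src, nodes$Mutation[i])
      tgt <- c(tgt, nodes$Mutation[j])
    }
  }
}

edges <- data.frame(SourceMutation = src, TargetMutation = tgt,
                    stringsAsFactors = FALSE)
```

With this rule, `A1B` and `B2C` each link to `A1B B2C C3D`, while `C3D Z25A` stays unlinked because `Z25A` does not appear in any larger mutation.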
I had also decided to use the “networkD3” package in R, so I needed to convert each mutation to a number and define the edges by “source” and “target”, also as numbers.
networkD3 is based on D3.js, which is JavaScript, so the numbering has to start from 0.
Our nodes would now look like:
|ID|Mutation|HIF|NSub|
|2|A1B B2C C3D|3|3|
And the edges would look like:
|Source|ID|Source mutation|Target mutation|HIF|NSub|
|0|2|A1B|A1B B2C C3D|3|3|
|3|3|C3D Z25A|C3D Z25A|7|2|
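The conversion to zero-based numbers can be sketched with `match()`. The data frames below are hypothetical. The output column names `Source` and `ID` follow the `forceNetwork` call used later, while `SourceMutation` and `TargetMutation` are assumed names:

```r
# Hypothetical nodes and name-based edges.
nodes <- data.frame(
  Mutation = c("A1B", "B2C", "A1B B2C C3D", "C3D Z25A"),
  stringsAsFactors = FALSE
)
edges_by_name <- data.frame(
  SourceMutation = c("A1B", "B2C"),
  TargetMutation = c("A1B B2C C3D", "A1B B2C C3D"),
  stringsAsFactors = FALSE
)

# R counts rows from 1, so subtract 1 to get zero-based IDs.
nodes$ID <- seq_len(nrow(nodes)) - 1L

# Look up each mutation's row position in the node table, then shift to zero-based.
links <- data.frame(
  Source = match(edges_by_name$SourceMutation, nodes$Mutation) - 1L,
  ID     = match(edges_by_name$TargetMutation, nodes$Mutation) - 1L
)
```

The order of rows in the node table matters: a link value of 2 refers to the third row of `nodes`, whatever ID is written next to it.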
You may want to save these in an Excel workbook, with sheets named Node and Edges respectively.
Plotting the graph
As mentioned earlier, I used the networkD3 package, specifically its forceNetwork function. It offers many more options for visual effects than the alternatives, which is why I used it in my project. networkD3 provides other types of visualization as well, all based on D3.js.
fn <- forceNetwork(
  Links = links, Nodes = nodes,
  Source = "Source", Target = "ID", Value = "NSub",
  NodeID = "Mutation", Nodesize = "HIF", Group = "group",
  zoom = TRUE, bounded = FALSE, legend = TRUE,
  opacity = 0.8, fontSize = 16,
  width = 1600, height = 1200
)
The output was then saved as an HTML file for sharing with end users.
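Saving can be done with networkD3's `saveNetwork` function. The sketch below uses a tiny made-up network and a temporary file name, both of which are assumptions for illustration only:

```r
library(networkD3)

# Tiny hypothetical network, only so there is something to save.
nodes <- data.frame(
  Mutation = c("A1B", "B2C", "A1B B2C C3D"),
  HIF = c(1.5, 2, 3),
  group = 1
)
links <- data.frame(Source = c(0, 1), ID = c(2, 2), NSub = c(3, 3))

fn <- forceNetwork(
  Links = links, Nodes = nodes,
  Source = "Source", Target = "ID", Value = "NSub",
  NodeID = "Mutation", Nodesize = "HIF", Group = "group"
)

# Write the widget to an HTML file (hypothetical file name).
out_file <- file.path(tempdir(), "mutation_network.html")
saveNetwork(fn, file = out_file, selfcontained = FALSE)
```

With `selfcontained = FALSE` the HTML file comes with a folder of supporting files. Setting `selfcontained = TRUE` bundles everything into a single file, which is easier to share but in some setups relies on pandoc being available.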
Customizing the results
Next I needed customized features in the visualization. I found this link giving ideas for a few and, using it as inspiration, added a search box among other things.
sudo R CMD INSTALL /usr/local/lib/R/site-library/networkD3
The code for adding the search:
library(htmltools)  # provides browsable, tagList, tags and HTML

fn <- forceNetwork(
  Links = links, Nodes = nodes,
  Source = "Source", Target = "ID", Value = "NSub",
  NodeID = "Mutation", Nodesize = "HIF", Group = "group",
  zoom = TRUE, bounded = FALSE, legend = TRUE,
  opacity = 0.8, fontSize = 16,
  width = 1600, height = 1200
)

# Wrap the plot together with a jQuery UI stylesheet and custom HTML
browsable(
  tagList(
    tags$head(
      tags$link(
        href = "http://code.jquery.com/ui/1.11.0/themes/smoothness/jquery-ui.css",
        rel = "stylesheet"
      )
    ),
    HTML('HIF'),
    fn
  )
)
Also included is HTML code for an information box that opens when a node is clicked. The file now begins to look like:
Single-clicking a node opens a box with information. Double-clicking or searching for a node highlights it and its immediate neighbors.
Typical users would be protein designers, who could then see how the substitutions have been working and in which direction to make further substitutions to get the molecule they desire.
There is a lot more that can be done to improve this, but for now, this helps.