The network plot of Mutations

[This article was first published on My Data Science Journey, and kindly contributed to R-bloggers.]

In a pet project, I created a network plot in R to represent mutations and to show how combining substitutions improved or worsened a mutation. In this post I have tried to document how I approached the whole problem.


First, let’s look at the input data.
An Excel sheet with a column of mutations and a column of half-life improvement factors (HIF) is enough as input.

Mutation       HIF
A1B B2C        6
A1B B2C C3D    3
C3D Z25A       7

Since my inputs were in xlsx format, I used the XLConnect package to read from and write to them.

I had to write some code to clean up the data. For example, sometimes the mutations were separated by ‘+’ instead of a single space. An input file might also contain a lot of irrelevant information, or duplicates.
Many small functions were needed to clean the data, depending on the input file.
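
As an illustration of the kind of cleaning described above, here is a minimal base R sketch. The data frame, its column names (Mutation, HIF) and the helper name clean_mutations are made up for this example and are not the original project code.

```r
# Hypothetical raw input; values and column names are illustrative only.
raw <- data.frame(
  Mutation = c("A1B+B2C", "A1B  B2C C3D", " C3D Z25A ", "A1B B2C"),
  HIF      = c(6, 3, 7, 6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: normalise separators and drop duplicate mutations.
clean_mutations <- function(df) {
  m <- gsub("+", " ", df$Mutation, fixed = TRUE)      # '+' separators become spaces
  m <- gsub("\\s+", " ", m)                           # collapse repeated whitespace
  df$Mutation <- trimws(m)                            # strip leading/trailing spaces
  df <- df[!duplicated(df$Mutation), , drop = FALSE]  # keep the first record per mutation
  rownames(df) <- NULL
  df
}

cleaned <- clean_mutations(raw)
print(cleaned)
```

Real input files will likely need further rules of their own, which is why this ended up as several small functions rather than one.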

Creating Nodes

Then I had to create “nodes” from the list of mutations. The code for this took the unique “mutation” records, and I also added a bit of code to count the number of substitutions (NSub) in each mutation. The table would now look something like:

Mutation       HIF  NSub
A1B B2C        6    2
A1B B2C C3D    3    3
C3D Z25A       7    2
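
The node-building step can be sketched in base R as follows. This assumes the mutations are already cleaned and separated by single spaces; the column names Mutation, HIF and NSub are assumptions chosen to match the arguments passed to forceNetwork later in this post.

```r
# Cleaned input, including one duplicate record (illustrative data).
cleaned <- data.frame(
  Mutation = c("A1B B2C", "A1B B2C C3D", "C3D Z25A", "A1B B2C"),
  HIF      = c(6, 3, 7, 6),
  stringsAsFactors = FALSE
)

# Unique records become the nodes.
nodes <- unique(cleaned)
rownames(nodes) <- NULL

# Count substitutions: one per space-separated token in the mutation string.
nodes$NSub <- lengths(strsplit(nodes$Mutation, " ", fixed = TRUE))

print(nodes)
```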

Creating Edges

Now that we have the nodes, we need to make their “edges” or “links”.
I sorted the data by number of substitutions, then looped over the substitution counts and, within each count, over the mutations, making a connection wherever the mutations matched.
I had also decided to use the “networkD3” package in R, so I needed to convert each mutation to a number, and to define the edges by a “source” and a “target”, also as numbers.

networkD3 is based on D3.js. Because D3.js is JavaScript, where indexing is zero-based, the numbering must start from 0.

Our nodes would now look like:

ID  Mutation       HIF  NSub
1   A1B B2C        6    2
2   A1B B2C C3D    3    3
3   C3D Z25A       7    2

And the edges would look like:

Source  ID  Source mutation  Mutation       HIF  NSub
0       1   A1B              A1B B2C        6    2
0       2   A1B              A1B B2C C3D    3    3
3       3   C3D Z25A         C3D Z25A       7    2
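
One possible reading of the edge-building step is sketched below in base R. It makes a strong assumption that is not confirmed by the original code: a node is linked to every node whose substitutions are a strict superset of its own. The single-substitution node "A1B" is included because the edge table refers to it as a source. The sketch does not reproduce the row that links a node to itself.

```r
nodes <- data.frame(
  Mutation = c("A1B", "A1B B2C", "A1B B2C C3D", "C3D Z25A"),
  stringsAsFactors = FALSE
)
nodes$NSub <- lengths(strsplit(nodes$Mutation, " ", fixed = TRUE))

# Sort by number of substitutions, then assign zero-based IDs for D3.js.
nodes <- nodes[order(nodes$NSub), , drop = FALSE]
rownames(nodes) <- NULL
nodes$ID <- seq_len(nrow(nodes)) - 1L

# Split each mutation into its individual substitutions.
subs <- strsplit(nodes$Mutation, " ", fixed = TRUE)

# Candidate (parent, child) pairs; keep those where the parent's
# substitutions all appear in a child that has more substitutions.
pairs <- expand.grid(parent = seq_len(nrow(nodes)), child = seq_len(nrow(nodes)))
is_edge <- mapply(function(p, ch) {
  length(subs[[p]]) < length(subs[[ch]]) && all(subs[[p]] %in% subs[[ch]])
}, pairs$parent, pairs$child)

links <- data.frame(
  Source = nodes$ID[pairs$parent[is_edge]],  # zero-based source node
  ID     = nodes$ID[pairs$child[is_edge]],   # zero-based target node
  NSub   = nodes$NSub[pairs$child[is_edge]]  # substitutions in the target
)

print(nodes)
print(links)
```

Checking all pairs is quadratic in the number of nodes, which is fine for a few hundred mutations but would need a smarter lookup for much larger inputs.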

You may want to save these in an Excel workbook, with sheets named Node and Edges respectively.

Plotting the graph

As mentioned earlier, I used the networkD3 package, specifically its forceNetwork function. It has many more options for effects, which is why I used it in my project. The other visualization types available in networkD3 are also based on D3.js.
 library(networkD3)

 fn <- forceNetwork(Links = links, Nodes = nodes,
            Source = "Source", Target = "ID", Value = "NSub",
            NodeID = "Mutation",
            Nodesize = "HIF", Group = "group",
            zoom = TRUE, bounded = FALSE, legend = TRUE,
            opacity = 0.8,
            fontSize = 16,
            width = 1600, height = 1200)

The output was then saved as an HTML file for sharing with end users.
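
A minimal sketch of the saving step, using networkD3's saveNetwork function. The tiny data set, the HIF value for "A1B", the group values and the output file name are all made up for illustration; only the forceNetwork arguments mirror the call shown earlier.

```r
library(networkD3)

# Illustrative nodes; row order defines the zero-based node index.
nodes <- data.frame(
  Mutation = c("A1B", "A1B B2C", "A1B B2C C3D"),
  HIF      = c(1, 6, 3),   # HIF for "A1B" is a made-up value
  group    = c(1, 2, 3),   # hypothetical grouping
  stringsAsFactors = FALSE
)
links <- data.frame(
  Source = c(0, 0, 1),
  ID     = c(1, 2, 2),
  NSub   = c(2, 3, 3)
)

fn <- forceNetwork(Links = links, Nodes = nodes,
                   Source = "Source", Target = "ID", Value = "NSub",
                   NodeID = "Mutation", Nodesize = "HIF", Group = "group",
                   zoom = TRUE, legend = TRUE, opacity = 0.8)

# selfcontained = FALSE writes the HTML plus a folder of JS/CSS next to it.
# selfcontained = TRUE would give a single file, but requires pandoc.
out_file <- file.path(tempdir(), "mutation_network.html")
saveNetwork(fn, file = out_file, selfcontained = FALSE)
```

For sharing with end users a single self-contained file is usually more convenient, provided pandoc is available on the machine that generates it.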

Customizing the results

Then I started needing customized features in the visualization. I found a link with ideas for a few of them and, using it as inspiration, added a search box among other things.

One can use htmlwidgets::onRender to add the JavaScript code, but what I did instead was to find the package files directly at /usr/local/lib/R/site-library/networkD3/htmlwidgets/ and edit them with sudo. To reinstall the package, I used the command:

 sudo R CMD INSTALL /usr/local/lib/R/site-library/networkD3  
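
For comparison, the onRender route could look roughly like the sketch below, which avoids editing the installed package. It is not what I did in this project. The data are made up, and the JavaScript assumes that networkD3 draws each node as a g element with class "node" containing a circle, and that d3 is available as a global on the rendered page.

```r
library(networkD3)
library(htmlwidgets)

# Illustrative data; HIF and group values are hypothetical.
nodes <- data.frame(
  Mutation = c("A1B", "A1B B2C", "A1B B2C C3D"),
  HIF      = c(1, 6, 3),
  group    = c(1, 2, 3),
  stringsAsFactors = FALSE
)
links <- data.frame(Source = c(0, 0, 1), ID = c(1, 2, 2), NSub = c(2, 3, 3))

fn <- forceNetwork(Links = links, Nodes = nodes,
                   Source = "Source", Target = "ID", Value = "NSub",
                   NodeID = "Mutation", Nodesize = "HIF", Group = "group",
                   zoom = TRUE, opacity = 0.8)

# Attach JavaScript that runs after the widget has rendered:
# outline a node in red when it is double-clicked.
fn_custom <- onRender(fn, '
  function(el, x) {
    d3.select(el).selectAll(".node").on("dblclick", function(d) {
      d3.select(this).select("circle").style("stroke", "red");
    });
  }
')
```

The advantage of this route is that the customization travels with the R script and survives package upgrades.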

The HTML code for the search box was added in the R code itself, using the browsable function from htmltools. I got help with this part from a question I asked on Stack Overflow.

The code for adding the search:

 library(networkD3)
 library(htmltools)

 fn <- forceNetwork(Links = links, Nodes = nodes,
        Source = "Source", Target = "ID", Value = "NSub",
        NodeID = "Mutation",
        Nodesize = "HIF", Group = "group",
        zoom = TRUE, bounded = FALSE, legend = TRUE,
        opacity = 0.8,
        fontSize = 16,
        width = 1600, height = 1200)

 # Assumed wrapper: browsable() with tagList() from htmltools, so that the
 # extra HTML and the widget render together on one page.
 browsable(tagList(
   HTML(
     '<script src=""></script>
     <script src=""></script>
     <style type="text/css">
      #modal {
        background: white;
        border: 1px black solid;
        box-shadow: 10px 10px 5px #888888;
        display: none;
      }
      #content {
        max-height: 400px;
        overflow: auto;
      }
      #modalClose {
        position: absolute;
        top: -0px;
        right: -0px;
        z-index: 1;
      }
     </style>
     <script type="text/javascript">
      function closeButton() {
        d3.select("#modal").style("display", "none");
      }
     </script>
     <div class="ui-widget">
      <input id="search">
      <button type="button">Search</button>
      <select id="hif-comp">
       <option value="lt">&lt;</option>
       <option value="gt">&gt;</option>
      </select>
      <input id="hif">
      <button type="button" id="smartSearch">SmartSearch</button>
     </div>
     <div id="modal">
      <div id="content"></div>
      <button id="modalClose" onclick="closeButton();">X</button>
     </div>'
   ),
   fn
 ))

Also included is html code for an information box that opens when a node is clicked. The file now begins to look like:

Single-clicking a node opens a box with information; double-clicking or searching for a node highlights it and its immediate neighbors.

A typical user would be a protein designer, who could then see how the substitutions have been working and in which direction to make further substitutions to get the molecule they want.

There is a lot more that can be done to improve this, but for now, this helps.
