DFIR Redefined Part 3: visNetwork for Network Data
In keeping with pending presentations for the Secure Iowa Conference and (ISC)2 Security Congress, I’m continuing the DFIR Redefined: Deeper Functionality for Investigators with R series (see Part 1 and Part 2). Incident responders and investigators, faced with an inundation of data and ever-evolving threat vectors, require skills enhancements and analytics optimization. DFIR Redefined is intended to explore such opportunities to create efficiencies and help the blue team cause. visNetwork represents another fine example of visualizing datasets in a manner that analysts can naturally gravitate towards.
Inspired by the STATWORX writeup on visNetwork, I immediately envisioned using it for malicious IP network activity.
Imagine the following scenario.
You’re the security lead for a midsize financial services firm operating six total sites. The network design is inadequate; while there are six unique sites, the topology is mesh-like. The intentional design serves two purposes, one positive and one deeply problematic. While collaboration and node cooperation are inherent, so too is the ease with which malware can propagate rapidly across the whole topology. You, as the security punching bag, have dealt with a number of malware incidents prior, but now you’re facing a real cluster. Emotet is in the house. Emotet, malware originally designed as a banking Trojan aimed at stealing financial data, has evolved to become a major threat. As of 2018, new versions of the Emotet Trojan include the ability to install other malware on infected machines, including other Trojans and ransomware. More succinctly, Emotet, per US-CERT NCAS alert TA18-201A, includes worm-like features that result in rapidly spreading network-wide infections, which are difficult to combat. This is exactly where you find yourself in your incident response, and you need to rapidly identify impacted nodes, contain, and mitigate.
You have data.
Your asset inventory is current, as it should be, and your network topology, albeit suboptimal, is well documented. You have logs. Via network flow aggregation you have good raw data regarding which nodes are communicating with each other, and to what extent (volume, frequency); that extent is referred to as width in the CSVs to be ingested. Raw data is nice, a must-have, but here exists a golden opportunity for network visualization…of your network. You have the data required to compile a list of nodes and a list of edges to incorporate directly into a visNetwork visualization, which should help you more rapidly identify command and control (C2) nodes, and others that are falling to the outbreak.
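As a rough sketch of what those two lists might look like: visNetwork expects a nodes data frame with an id column and an edges data frame with from and to columns, while label and width are optional columns it also understands. The addresses and counts below are made up for illustration and are not taken from the CSVs on GitHub.

```r
# Hypothetical nodes: one row per host. visNetwork requires `id`;
# `label` is the text drawn next to each node.
nodes <- data.frame(
  id    = c("10.0.0.1", "10.0.0.2", "10.0.0.3"),
  label = c("10.0.0.1", "10.0.0.2", "10.0.0.3"),
  stringsAsFactors = FALSE
)

# Hypothetical edges: one row per communicating pair. `from` and `to`
# must match values in nodes$id; `width` scales the drawn edge.
edges <- data.frame(
  from  = c("10.0.0.1", "10.0.0.1", "10.0.0.2"),
  to    = c("10.0.0.2", "10.0.0.3", "10.0.0.3"),
  width = c(8, 2, 1),
  stringsAsFactors = FALSE
)

# Sanity check before plotting: every edge endpoint should be a known node.
unknown <- setdiff(c(edges$from, edges$to), nodes$id)
if (length(unknown) > 0) {
  warning("Edges reference unknown nodes: ", paste(unknown, collapse = ", "))
}
```

Running a check like this against your own inventory before plotting is a cheap way to catch hosts that show up in flow data but not in the asset list.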
Again, thanks to Niklas Junker at STATWORX for the stimulus here. This is a complete and unadulterated reuse of his code and excellent writeup. The complete R script as well as the nodes and edges CSVs are posted on my GitHub for your own use and experimentation. A walkthrough in snippets follows:
# Remove all the objects from the workspace (clear the chaff), and set the working directory
rm(list = ls())
setwd("c:/coding/R/visNetwork")

# Load the required packages
library(dplyr)
library(visNetwork)
library(geomnet)
library(igraph)

# Data Preparation

# Load nodes data from CSV
nodeData <- read.csv("nodes.csv", header = TRUE)
nodes <- as.data.frame(nodeData)

# Load edges from CSV
edgeData <- read.csv("edges.csv", header = TRUE)
edges <- as.data.frame(edgeData)

# Create graph for Louvain Community Detection (LCD)
# https://arxiv.org/pdf/0803.0476.pdf
graph <- graph_from_data_frame(edges, directed = FALSE)

# Louvain Community Detection (LCD)
cluster <- cluster_louvain(graph)
cluster_df <- data.frame(as.list(membership(cluster)))
cluster_df <- as.data.frame(t(cluster_df))
cluster_df$label <- rownames(cluster_df)

# Create group column
nodes <- left_join(nodes, cluster_df, by = "label")
colnames(nodes)[3] <- "group"

# Visualize data with visNetwork
visNetwork(nodes, edges)
Take note of the reference to Louvain Community Detection (LCD): it’s the algorithm behind the igraph cluster_louvain() call in the snippet, and the framing paper, Fast unfolding of communities in large networks, is worth reading.
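To get a feel for what LCD does before pointing it at real flow data, here is a minimal sketch on a toy graph: two triangles joined by a single link. The node names and the toy_ variables are invented for illustration. It also builds the label-to-group lookup straight from the named membership vector, which keeps node names exactly as they appear in the graph; that may matter when names are IP addresses or numeric ids, since data.frame() rewrites column names that are not syntactically valid.

```r
library(igraph)

# Toy undirected graph: two triangles (a-b-c and d-e-f) joined by one link (c-d).
toy_edges <- data.frame(
  from = c("a", "a", "b", "d", "d", "e", "c"),
  to   = c("b", "c", "c", "e", "f", "f", "d"),
  stringsAsFactors = FALSE
)
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)

# Louvain groups nodes into communities by greedily increasing modularity.
toy_cluster <- cluster_louvain(toy_graph)

# membership() returns a named vector: node name -> community number.
m <- membership(toy_cluster)
toy_groups <- data.frame(
  label = names(m),
  group = as.integer(m),
  stringsAsFactors = FALSE
)
print(toy_groups)
```

On a graph this small you would expect each triangle to land in its own community; on real traffic the communities are only a starting hypothesis about which hosts belong together.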
The result is beautiful, as seen in Figure 1.
When you render this for yourself you’ll note that you can drag a node aside when it’s hiding the label of another node. While that is dynamic in part, the real action ensues when you customize your network view with additional functions, as we’ll see in the next snippet.
Above all else, consider how the above-mentioned width drives specific behavior in the graph. The more a given node communicates with another, the wider the edge between them is drawn. This leads us to possible conclusions in the example. Referring to Figure 2, a zoomed view into Figure 1, it is reasonable to assume that three nodes in particular may be operating as C2 in the Emotet outbreak: 172.17.12.22, 172.17.12.30, and 192.168.22.46.
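The same eyeball judgment can be backed by a quick tally. The sketch below sums width per node across every edge it touches and lists the heaviest talkers first. It assumes an edge list in the from/to/width shape described earlier; the addresses and values are invented, not taken from the Emotet example.

```r
# Hypothetical edge list in the from/to/width shape used for visNetwork.
edges <- data.frame(
  from  = c("10.0.0.5", "10.0.0.5", "10.0.0.5", "10.0.0.7", "10.0.0.8"),
  to    = c("10.0.0.7", "10.0.0.8", "10.0.0.9", "10.0.0.8", "10.0.0.9"),
  width = c(9, 7, 8, 1, 2),
  stringsAsFactors = FALSE
)

# An undirected edge counts toward both endpoints, so stack the two
# endpoint columns and sum width per node.
per_node <- data.frame(
  node  = c(edges$from, edges$to),
  width = c(edges$width, edges$width),
  stringsAsFactors = FALSE
)
totals <- aggregate(width ~ node, data = per_node, FUN = sum)

# Heaviest talkers first; these are candidates for a closer look, not a verdict.
totals <- totals[order(-totals$width), ]
print(head(totals, 3))
```

A node that dominates this list and also sits at the center of the wide edges in the graph is a reasonable first stop for containment.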
With additional functionality as mentioned above, you can create even more dynamic views. Code follows:
visNetwork(nodes, edges, width = "100%") %>%
  visIgraphLayout() %>%
  visNodes(
    shape = "dot",
    color = list(
      background = "#0085AF",
      border = "#013848",
      highlight = "#FF8000"
    ),
    shadow = list(enabled = TRUE, size = 10)
  ) %>%
  visEdges(
    shadow = FALSE,
    color = list(color = "#0085AF", highlight = "#C62F4B")
  ) %>%
  visOptions(
    highlightNearest = list(enabled = TRUE, degree = 1, hover = TRUE),
    selectedBy = "group"
  ) %>%
  visLayout(randomSeed = 11)
As noted, use the likes of visNodes, visEdges, visOptions, visLayout or visIgraphLayout to enhance the visualization as seen in Figure 2.
Most importantly, note that visOptions does two things here: highlightNearest highlights a selected node along with its immediate neighbors, and selectedBy = "group" adds the ability to select by group. The logical groupings in this example represent each of the six financial services locations, and the Emotet-impacted nodes on their networks. The resulting Select by group provides highlighted focus on a particular site’s network. If you’re deploying incident responders in person, or implementing remote mitigation, such views create efficiencies and improve time-to-mitigate (TTM). A focus on group 4 (site 4) highlights two of the above-mentioned C2 nodes.
To apply this practice, you’d need to devise nuanced flow reporting on node-to-node communications, inclusive of counts over a given period. You could tailor it by specific protocols and traffic types depending on the question you’re trying to answer in the data. More related experiments to come in Part 4 of the DFIR Redefined series.
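As a starting point, flow reporting of that kind might be reduced to nodes and edges along the following lines. This is only a sketch: the flows data frame, its column names (src, dst, proto), and the tcp/445 filter are hypothetical stand-ins for whatever your flow collector exports and whatever question you are asking.

```r
# Hypothetical flow records: one row per observed connection in the period.
# Column names are illustrative, not tied to a specific flow tool.
flows <- data.frame(
  src   = c("10.0.0.5", "10.0.0.5", "10.0.0.5", "10.0.0.7", "10.0.0.5"),
  dst   = c("10.0.0.7", "10.0.0.7", "10.0.0.8", "10.0.0.8", "10.0.0.7"),
  proto = c("tcp/445", "tcp/445", "tcp/445", "tcp/443", "tcp/443"),
  stringsAsFactors = FALSE
)

# Optionally narrow to one traffic type of interest (an assumed example).
subset_flows <- flows[flows$proto == "tcp/445", ]

# Count connections per src/dst pair; the count becomes the edge width.
subset_flows$n <- 1
flow_edges <- aggregate(n ~ src + dst, data = subset_flows, FUN = sum)
colnames(flow_edges) <- c("from", "to", "width")

# Node list: every address that appears on either end of an edge.
ids <- sort(unique(c(flow_edges$from, flow_edges$to)))
flow_nodes <- data.frame(id = ids, label = ids, stringsAsFactors = FALSE)

print(flow_edges)
```

The resulting flow_nodes and flow_edges have the same shape as the CSVs used above, so they could be passed to visNetwork() in the same way. Raw counts can vary widely, so you may want to rescale width before plotting.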
Cheers…until next time.