In this month’s we are going to look at data analysis and visualization of social networks using R programming.
Friendster Networks Mapping
Friendster was a yesteryear social media network, something akin to Facebook. I’ve never used it but it is one of those easily available datasets where you have a list of users and all their connections. So it is easy to create a viz and look at whose networks are strong and whose are weak, or even the bridge between multiple networks.
The dataset and code files are added on the Projects Page here , under “social network viz”.
For this analysis, we will be using the following library packages:
- Load the datafiles. The list of users is given in the file named “nodes” as each user is a node in the graph. The connection list is given in the file named “edges” as a 1-to-1 mapping. So if user Miranda has 10 friends, there would be 10 records for Miranda in the “edges” file, one for each friend. The friendster datafile has been anonymized, so there are numbers (id) rather than names.
- Convert the dataframes into a very specific format. We do some prepwork so that we can directly use the graph visualization functions.
- Create a graph object. This will also help to create clusters. Since the dataset is anonymized it might seem irrelevant, but imagine this in your own social network. You might have one cluster of friends who are from your school, another bunch from your office, one set who are cousins and family members and some random folks. Creating a graph object allows us to look at where those clusters lie automatically.
- Visualize using functions specific to graph objects. The first function is visNetwork() which generates an interactive color coded cluster graph. When you click on any of th nodes (colored circles), it will highlight all the connections radiating from the node. (In the image below, I have highlighted node for user 17.
- You can also use the same function with a bunch of different parameters, as shown below:
visNetwork(nodeset, links2, width = "100%") %>% visIgraphLayout() %>% visNodes( shape = "dot", color = list( background = "#0085AF", border = "#013848", highlight = "#FF8000" ), shadow = list(enabled = TRUE, size = 10) ) %>% visEdges( shadow = FALSE, color = list(color = "#0085AF", highlight = "#C62F4B") ) %>% visOptions(highlightNearest = list(enabled = T, degree = 1, hover = T)) %>% visLayout(randomSeed = 11)
In the image below you can see the 3 colored clusters and the central (light blue) node. The connections in blue are the ones that do not have a lot of direct connections. The yellow and red clusters are tigher, indicating they have internal connections with each other. (similar to a bunch of classmates who all know each other)
That’s it. Again the code is available on the Projects Page.
Feel free to play around with the code. One extensions of this idea would be to download Facebook or LinkedIn data (premium account needed) and create similar visualizations.
Or if you have a list of airports and routes, you could create something like this as a flight network map, to know the minimum number of hops between 2 destinations and alternative routes.
You could also do a counter to see which nodes have the most number of friends and increase the size of the circle. This would make it easier to view which nodes are the most well-connected.
Of course, do not be over-mesmerized by the data. In real-life, the strength of the relationship also matters. This is hard to quantify or collect, even though its easy to depict once you have the data in hand/ For example, I have a 1000 connections who I’ve met at conferences or random events. If I needed a job, most may not really be useful. But my friend Sarah has only 300 but super-loyal friends who literally found her a job in 2 days when she had to move back to her hometown to take care of a sick parent.
With that thought, do take a look at the code and have fun coding!