As part of my ongoing research simulating network structure using graph motifs, I have been collecting novel data sets to test and benchmark the method. Since I am a political scientist studying conflict, it was suggested that I collect a co-authorship network within this sub-discipline. Such a network is useful for several reasons. First, it is a census of all possible ties, i.e., there are no missing or hidden edges among nodes because the network “is what it is” based on authorships. This makes testing and benchmarking much easier to interpret. In addition, while co-authorship networks are common in other disciplines, to my knowledge no such data exist for political science, or for conflict studies more specifically. Finally, it is just plain fun to have data wherein your colleagues are the units!
Previously, collecting such data was an arduous, hand-coded task. I have nothing against hand-coding, but luckily it is not needed here. The Social Science Research Network (SSRN) is an outstanding resource for working papers across all of the social sciences, including political science. Of particular interest to me is the Conflict Studies eJournal, which aggregates and organizes new papers in this area. The SSRN also has a convenient coding schema for both articles and authors, which is all that is needed to quickly generate a co-authorship network from their database of articles.
Above is a visualization of this network data generated in Gephi and hosted at Microsoft’s zoom.it (go full-screen to really explore). The nodes are colored by type, with authors in red and articles in blue. They are also sized by their relative PageRank, which is slightly less helpful in this case because there are so many disconnected components. Interestingly, there are several duplicate entries in the SSRN database, such that an author-article relation may be present up to 14 times. I am afraid this is due to some error within the SSRN data, but to mitigate it I have collapsed those relationships and created a weighted graph, which is what was used to generate the above visualization. The edge thickness reflects these weights, though the vast majority of edges have a weight of one.
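The collapsing step can be sketched roughly as follows. This is a minimal illustration in Python rather than the R code used for the actual data, and the author and abstract IDs are made-up placeholders, not real SSRN records. The function name `collapse_duplicates` is likewise hypothetical.

```python
from collections import Counter

# Hypothetical (author_id, abstract_id) pairs. The IDs are invented for
# illustration and do not correspond to real SSRN records.
raw_edges = [
    ("author_101", "abstract_9001"),
    ("author_101", "abstract_9001"),  # duplicate entry
    ("author_101", "abstract_9001"),  # duplicate entry
    ("author_102", "abstract_9001"),
    ("author_102", "abstract_9002"),
]


def collapse_duplicates(edges):
    """Collapse repeated author-article pairs into one weighted edge each."""
    counts = Counter(edges)
    return [(author, article, weight)
            for (author, article), weight in counts.items()]


weighted_edges = collapse_duplicates(raw_edges)
for author, article, weight in weighted_edges:
    print(author, article, weight)
```

Each distinct author-article pair survives exactly once, and its weight records how many times it appeared in the raw data, so no information about the duplication is lost.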
Depending on your perspective, you might view this illustration as the “Conflict Studies Supernova,” or perhaps the “Moment of Conception for Conflict Studies”; either way, the visualization provides evidence of the scale and density of co-authorship relationships in this sub-discipline.
Some quick facts about the network: there are 5,234 nodes and 4,240 edges. It contains 1,406 weakly connected components, the largest of which contains 1,157 nodes (pictured at the center). It is extremely sparse, with a density well below 1%, and even isolating the main component only yields a density of 0.001. Given this sparsity, the average degree is actually less than one, at about 0.8 ties per node. Clearly there is room for much more collaboration among conflict studies scholars within political science!
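As a back-of-the-envelope check, the whole-network figures can be reproduced from the node and edge counts alone. The sketch below is in Python rather than R, and it assumes the graph is treated as directed with no self-loops, so that density is edges over n(n-1) and average degree is edges per node. The helper names are hypothetical.

```python
def directed_density(n_nodes, n_edges):
    """Share of possible directed ties (excluding self-loops) that exist."""
    return n_edges / (n_nodes * (n_nodes - 1))


def mean_degree(n_nodes, n_edges):
    """Edges per node, i.e. the mean in-degree (or out-degree)."""
    return n_edges / n_nodes


# Counts reported above for the full network.
n_nodes, n_edges = 5234, 4240

density = directed_density(n_nodes, n_edges)
avg_degree = mean_degree(n_nodes, n_edges)

print(f"density     = {density:.6f}")
print(f"mean degree = {avg_degree:.2f}")
```

Under these assumptions the density comes out far below 1% and the mean degree at roughly 0.8, in line with the numbers quoted above.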
The network was generated in R, primarily using the XML and igraph packages, with a bit of data slicing in plyr. You can download the data and code from my GitHub repository (code sharing, FTW!), and you are welcome to play around with it as you like. The node labels in the data correspond to author and abstract IDs, which you can look up at the SSRN website. Keep in mind that the data online only represent the network as of last night (2010-11-09), and any papers added today or later will not be present. Of course, you are welcome to re-run the code to update the data for your purposes.
I’ll be spending the next several weeks with this and other data, so I hope you have as much fun with it as I plan to!