Patterns in the Ivy II: Beyond the Giant Component
Last week’s post on the metal collaboration network focused largely on the “giant component,” the largest subgraph in a network in which every actor has at least one path to every other actor. In large networks, even sparse ones, giant components typically emerge and include the majority of actors in the network. Focusing on the giant component follows conventional practice in the analysis of small-world networks, but worthwhile information can perhaps also be inferred from the actors outside of it.
One key finding from the previous post was that the giant components produced by the simulations were consistently larger than the giant component observed in the data: they contained between 1885 and 1954 bands in 95% of the simulations, while the observed giant component had just 1447 bands. While this discrepancy likely results from the nonrandom clustering observed in the data, there may exist other explanations. Here, I explore three possibilities for why the giant component did not subsume some of the components: general tie probability, data issues, and exclusivity.
General tie probability. Giant components emerge as a function of tie probability. As the number of edges increases relative to the number of nodes, the probability of observing a giant component increases. Once a network features a giant component, that component grows as more ties form, subsuming more actors. In the presence of skewed degree distributions, where relatively few actors hold a disproportionate number of connections, the giant component grows faster: less popular actors are more prone to form ties to popular actors, and these popular actors are likely already within the giant component.
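The link between edge density and the giant component can be illustrated with a small simulation. The sketch below is not the simulation from the previous post; it assumes the simplest possible model, in which edges are placed uniformly at random between nodes, and the function name `largest_component_share` and the network size of 2000 nodes are hypothetical choices made for illustration.

```python
import random


def largest_component_share(n_nodes, n_edges, seed=0):
    """Share of nodes in the largest component of a random graph.

    Edges are drawn uniformly at random (duplicates are possible), and
    components are tracked with a union-find structure.
    """
    rng = random.Random(seed)
    parent = list(range(n_nodes))
    size = [1] * n_nodes

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _ in range(n_edges):
        a, b = rng.sample(range(n_nodes), 2)
        ra, rb = find(a), find(b)
        if ra != rb:
            # Attach the smaller tree to the larger one.
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]

    return max(size[find(x)] for x in range(n_nodes)) / n_nodes


if __name__ == "__main__":
    n = 2000  # hypothetical network size
    for mean_degree in (0.5, 1.0, 1.5, 2.0, 3.0):
        m = int(mean_degree * n / 2)
        share = largest_component_share(n, m)
        print(f"mean degree {mean_degree}: largest component share {share:.3f}")
```

Under this uniform model, the largest component stays small while the mean degree is well below one and covers most of the network once the mean degree is well above one. The model does not reproduce skewed degree distributions, so it says nothing about how much faster the giant component grows in their presence.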
If we’re to conceptualize collaboration as a dynamic process, tie formation comes with age and experience within the network. Younger (or short-lived) musicians and bands, as well as bands that have not experienced lineup changes, have had fewer opportunities to create edges. For instance, the young band Crooked X was active from 2007 to 2011 and was successful enough to meet Wikipedia’s notability guidelines, yet it shares ties with no other band in the dataset; it dissolved largely because two members quit to finish high school. Given the age of the musicians and the relative recency of the band’s breakup, Crooked X has not had the opportunity to form ties to other bands.
Data issues. Despite the many benefits of crowdsourced data, including accessibility and scope, such data are no panacea. To illustrate, let’s consider some of the bands in the largest sixteen non-giant components, which account for 2% of the bands and 4% of the musicians in the network.
| Bands | Musicians | Band names |
|---|---|---|
| 3 | 46 | Darkseed; Lacrimas Profundere; Equilibrium |
| 8 | 35 | Eighteen Visions; I Killed the Prom Queen; Bleeding Through; Throwdown; Bring Me The Horizon; Death by Stereo; Bury Your Dead; Adamantium |
| 4 | 36 | Bloodrock; We Came as Romans; Abandon All Ships; Woe, Is Me |
| 4 | 32 | Greeley Estates; Alesana; In Fear and Faith; Eyes Set to Kill |
| 6 | 28 | Petra; Stryper; Bloodgood; Sindizzy; King James; Whitecross |
| 5 | 24 | Skyclad; Sabbat; The Beyond; The More I See; Therapy? |
| 5 | 23 | Blood Stain Child; Sadie; Girugämesh; -OZ-; Phantasmagoria |
| 1 | 27 | Earth, Wind & Fire |
| 4 | 23 | Nickelback; 3 Doors Down; Puddle of Mudd; Operator |
| 3 | 24 | Chiodos; Scary Kids Scaring Kids; Underminded |
| 3 | 23 | The Dillinger Escape Plan; Stolen Babies; Coheed and Cambria |
| 5 | 21 | Soil; Drowning Pool; Staind; AM Conspiracy; Oceano |
| 4 | 22 | Motionless In White; Falling in Reverse; Escape The Fate; Blessthefall |
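A listing like the one above can be derived from a list of band-musician memberships by finding the connected components of the two-mode network and setting the largest one aside. The following is a minimal sketch of that idea, not the code used for this post; the function name `components_from_memberships` and the toy data (bands "Band A" to "Band F", musicians "m1" to "m9") are hypothetical placeholders rather than records from the Freebase data.

```python
from collections import defaultdict


def components_from_memberships(memberships):
    """Group bands and musicians into connected components.

    memberships: iterable of (band, musician) pairs.
    Returns a list of (bands, musicians) set pairs, sorted so that the
    component with the most bands comes first.
    """
    neighbors = defaultdict(set)
    for band, musician in memberships:
        # Tag each node so a band and a musician sharing a name stay distinct.
        neighbors[("band", band)].add(("musician", musician))
        neighbors[("musician", musician)].add(("band", band))

    seen = set()
    components = []
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        bands, musicians = set(), set()
        # Depth-first traversal of everything reachable from `start`.
        while stack:
            node = stack.pop()
            kind, name = node
            (bands if kind == "band" else musicians).add(name)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append((bands, musicians))

    components.sort(key=lambda c: (len(c[0]), len(c[1])), reverse=True)
    return components


# Hypothetical toy data: three components of different sizes.
toy = [
    ("Band A", "m1"), ("Band A", "m2"),
    ("Band B", "m2"), ("Band B", "m3"),
    ("Band C", "m3"), ("Band C", "m4"),
    ("Band D", "m5"), ("Band E", "m5"), ("Band E", "m6"),
    ("Band F", "m7"), ("Band F", "m8"), ("Band F", "m9"),
]

if __name__ == "__main__":
    giant, *others = components_from_memberships(toy)
    for bands, musicians in others:
        print(len(bands), len(musicians), "; ".join(sorted(bands)))
```

Here the "giant" component is simply the one with the most bands; in a real analysis one might rank components by total node count instead.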
One key issue arising from the data is that some of these bands are either clearly or arguably not metal. Bear in mind that the data came from Freebase.com, which collected much of its data from MusicBrainz. Some users, no doubt, apply genre labels like “heavy metal” more inclusively than others. I know of no one who would describe the well-known funk outfit Earth, Wind & Fire as “metal,” though “funk metal” is a relatively large subgenre and one MusicBrainz contributor may have used that subgenre to describe the band. While Earth, Wind & Fire should not be included in a dataset on metal, the fact that the band had 27 members and none of them had ties to any other band should indicate that their exclusion from the giant component is a meaningful one. Similarly, the Exploited has had 28 members over its history with no connections to the bands in the data, and despite sometimes playing in a style described as “crossover thrash,” most regard the band as more punk than metal. Lastly, many of the bands here fall closer to a genre called “post-hardcore” (e.g., Alesana, In Fear and Faith, Eyes Set to Kill, Chiodos, Scary Kids Scaring Kids, Underminded), whose musical tradition lies closer to hardcore or punk than to metal.
With respect to relationships between individuals and organizations, most errors are of omission rather than inclusion. Though descriptions of sound can be riddled with errors, rarely would an informant say that one musician played for a band that she or he never did, but frequently informants will fail to report less memorable collaborations. While this issue diminishes through the collective recollection process of crowdsourced data, the format is not immune. Take for instance the Dillinger Escape Plan and Tanner Wayne, as plotted below.
We see that the Dillinger Escape Plan has had fourteen musicians associated with it, yet only two ties reported to other metal acts in the data. The Dillinger Escape Plan certainly sounds metal, they’ve been active since 1997, they’re known within the metal scene, and they’ve had many lineup changes, so why aren’t they included within the giant component? The Freebase data had omitted a tie between the Dillinger Escape Plan and singer Mike Patton, who provided vocals for their EP Irony is a Dead Scene. As mentioned last week, Patton has been in Faith No More, Fantômas, and Tomahawk, and is part of the giant component of metal. Also, it seems rather strange that one drummer, Tanner Wayne, could single-handedly bridge three bands yet not play for any band in the giant component. Indeed, in addition to Chiodos, Scary Kids Scaring Kids, and Underminded, Wayne has also filled in for Christian metalcore outfit Underoath, a band tied within the giant component and associated with Norma Jean, This Runs Through, Season’s End, and Maylene and the Sons of Disaster.
The presence of components such as these could serve diagnostic purposes. First, if musicians belong to 1.307 bands on average, the odds that a band includes at least one musician who has played in other bands increase quickly with the number of members. Bands with many members, none of whom plays for other bands in the network, likely fall outside the universe of cases and should be removed from the data. Second, small components with multiple bands may be separated from the giant component because of omitted relationships. For the case at hand, the obvious solution would be to supplement the Freebase data with additional sources like Allmusic or DBpedia to fill in errors of omission. Smaller components would still exist after adding edges where appropriate; however, it is important to consider each component’s appropriateness for inclusion in the dataset as well as its ability to form ties to other actors.
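To make the first diagnostic concrete, here is a rough sketch under stated assumptions. A mean of 1.307 bands per musician implies that at most 30.7% of musicians play in more than one band, since each such musician adds at least one band to the average; the actual share is not reported here, so the value of 0.2 below is purely illustrative. The sketch also assumes that members bridge to other bands independently of one another, which is a strong simplification. The function names and the `min_members` threshold are hypothetical.

```python
def prob_any_bridging_member(n_members, p_multi_band):
    """Probability that at least one of n_members plays in another band,
    assuming each does so independently with probability p_multi_band."""
    return 1 - (1 - p_multi_band) ** n_members


def flag_isolated_large_bands(memberships, min_members=10):
    """Bands with at least min_members musicians, none of whom appears
    in any other band in the data.

    memberships: iterable of (band, musician) pairs.
    """
    bands_of = {}
    members_of = {}
    for band, musician in memberships:
        bands_of.setdefault(musician, set()).add(band)
        members_of.setdefault(band, set()).add(musician)
    return sorted(
        band
        for band, members in members_of.items()
        if len(members) >= min_members
        and all(len(bands_of[m]) == 1 for m in members)
    )


if __name__ == "__main__":
    p = 0.2  # illustrative assumption, not an estimate from the data
    for n in (1, 5, 10, 27):
        print(n, round(prob_any_bridging_member(n, p), 4))
```

Under these assumptions the probability climbs toward one as membership grows, so a band with dozens of members and no outside ties is a natural candidate for a closer look. Applied to the full membership list, `flag_isolated_large_bands` would return such candidates for manual review rather than automatic removal.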
Exclusivity. In addition to basic tie probability and data-related issues, smaller components may reflect either highly specific tie formation practices or associational barriers. Consider the following three components:
Each of these components reflects particular regional, stylistic, or ideological constraints. Blood Stain Child, Sadie, Girugämesh, -OZ-, and Phantasmagoria are all Japanese metal acts. Unlike the US, the UK, and the Scandinavian countries, Japan hasn’t had a particularly long history of metal. All five of these bands formed after 2000 and began touring outside of Japan in the late aughts and early 2010s. Though their connections seem appropriate, we should exercise some caution in interpreting their common members, as these musicians go only by their first names in the data. In addition to being Japanese, these bands also follow the style of visual kei, where bands adopt a look “characterized by the use of varying levels of make-up, elaborate hair styles and flamboyant costumes, often, but not always, coupled with androgynous aesthetics.” Sadly, this aesthetic may have proven to be a barrier to collaboration outside of Japan.
The second component in the above graph plots groups that have played metalcore, an amalgam of metal and hardcore punk. More specifically, five of the eight bands are from Orange County, California; all eight include at least one self-identified straight edge member, and in five cases the band itself identifies as straight edge. Individuals and bands who identify as straight edge espouse an ideology opposed to alcohol, drugs, and (frequently) casual sex, occasionally aligning with vegetarian and vegan lifestyles. Historically, this ideology came from the hardcore music scene rather than from metal. In addition to common interests in hardcore and geographical considerations, ideological interests such as support for a straight edge philosophy could further prevent collaborations.
Lastly, like straight edge ideologies, some bands form to express particular religious viewpoints including paganism, Krishnaism, Satanism, and, of course, Christianity. These ideologies often correspond to particular subgenres and instrumentation: paganism to folk metal, Krishnaism to hardcore and metalcore, and Satanism to black metal. Christian metal, on the other hand, is amorphous in terms of its instrumentation, appropriating subgenres into nomenclatures such as “Christian death metal” and “unblack metal.” Additionally, given that a significant portion of metal acts use lyrics and imagery that run counter to Christian interests, metal bands that identify as Christian potentially face challenges to their legitimacy from both the metal community and their religious communities. Should a musician wish to form a Christian metal group, the field of possible collaborators may be limited due to a shortage of interest and lackluster support from the wider metal scene, issues further compounded by any religious vetting process that may be used in the band’s formation.
In the rightmost component we see one such network of collaborations among Christian bands. Stryper, a successful glam metal group with a name from Isaiah 53:5, made their beliefs quite explicit.
This approach differs from Christian bands in the giant component like Underoath and Norma Jean, whose fans could attend one of their shows without being aware of the religious overtones. And despite Stryper’s unambiguous religious expressions, including subverting Satanic imagery used by metal fans and groups, they faced some opposition from other members of their own faith who found the band’s association with metal to be improper. Following Stryper’s breakup, their members joined other Christian metal groups including Whitecross, Bloodgood, Sindizzy, and King James.
Though many network measurements presuppose a connected component, qualitatively exploring why some components are disconnected from the giant component can prove illustrative. These components may be disconnected for very good reasons. In addition to basic matters of general tie probability, such exercises can diagnose data shortcomings and highlight possible constraints in the tie formation process. Though assessing the appropriateness of some cases or identifying missing data in large social networks can prove to be a daunting task, focusing on actors with many connections in large disconnected components can narrow down the candidates who would likely have ties to the giant component. Also, actor attributes such as geographical region, fashion, and ideology might illuminate restrictions in tie formation, demonstrating potential boundaries that prevent incorporation.