The Ramones. Punk is Data, Too

[This article was first published on English – R-blog, and kindly contributed to R-bloggers.]

 

The starting point of this post is a simple question: can we use R to analyze punk bands? And, as a result, what can we learn from applying data analytics methods to punk music?

 

Whether we like it or not, “punk rock is arguably the most important subgenre of music to come out of the ‘70s” and consequently still an integral part of our mainstream contemporary culture. After years of being declared too outrageous to be accepted, punk has left a legacy so astonishingly extensive that it deserves careful consideration and serious attention. For decades, many music critics, fine arts experts, social and political scientists and historians of pop culture have devoted time and energy to studying the punk scene, its cultural production and legacy, the attitude of the punk generation, its tangle of ideologies, and the ways it was perceived and received. Facts and figures, however, are still missing, perhaps because nothing seems more distant from data analytics than punk music. So, is data analytics of punk rock possible? Would it make any sense? My answer is a loud and bold yes: statistics on punk rock matter.

 

Although the punk scene cannot be condensed into a single band, the Ramones are still considered by many as the first “pure punk band” and, perhaps more importantly, one of the most influential. This does not imply that other punk rock bands (The Clash, Dead Kennedys, The Stooges, Misfits, Sex Pistols, Social Distortion, Patti Smith Group, etc.) are less noteworthy or not as good. Yet, since I need to start somewhere, I decided that my first attempt would focus on the Ramones, whom I paradoxically like a lot despite being more of a baroque and classical music person.

 

What did the Ramones do?

From 1976 to 1994, the Ramones released 14 studio albums. In their original USA release, the albums comprised 176 different songs in total. The songs were quite short (median length: 2M 32S, i.e. 2 minutes 32 seconds) and mostly written in a major key (only 2 songs are in a minor key: Em).

Year | Album | Number of songs | Length
1976 | Ramones | 14 | 28M 52S
1977 | Leave Home | 14 | 28M 57S
1977 | Rocket To Russia | 14 | 28M 5S
1978 | Road To Ruin | 12 | 28M 9S
1980 | End Of The Century | 12 | 28M 50S
1981 | Pleasant Dreams | 12 | 28M 53S
1983 | Subterranean Jungle | 12 | 28M 21S
1985 | Too Tough To Die | 12 | 28M 18S
1986 | Animal Boy | 12 | 28M 44S
1987 | Halfway To Sanity | 12 | 28M 53S
1989 | Brain Drain | 12 | 28M 2S
1992 | Mondo Bizarro | 13 | 28M 25S
1993 | Acid Eaters | 12 | 28M 3S
1994 | Adios Amigos | 13 | 28M 1S
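
For readers who want to reproduce a median track length, here is a minimal sketch in base R. It assumes that lengths are stored as strings in the same "2M 32S" notation as the table. The helper name to_seconds and the three sample lengths are made up for illustration and are not taken from the dataset.

```r
# Hypothetical helper: convert strings such as "2M 32S" into seconds.
to_seconds <- function(x) {
  minutes <- as.numeric(sub("^([0-9]+)M.*$", "\\1", x))
  seconds <- as.numeric(sub("^[0-9]+M ([0-9]+)S$", "\\1", x))
  minutes * 60 + seconds
}

# Made-up track lengths, for illustration only (not from the dataset).
track_lengths <- c("2M 12S", "2M 40S", "1M 45S")

median_seconds <- median(to_seconds(track_lengths))

# Format the median back into the "xM yS" notation.
median_label <- sprintf("%dM %dS", median_seconds %/% 60, median_seconds %% 60)
median_label
```

Working in seconds keeps the arithmetic simple; the label is only rebuilt at the end for display.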

Musical purists always reproached the Ramones for knowing only a couple of chords and overusing them. The data show that the band knew at least… 11 different chords (out of too-many-to-bother-counting possibilities), although 80% of their songs were built on no more than 6. And there is no evidence that the Ramones’ compositions became more sophisticated over time.

Likewise, the number of different chords in a Ramones song appears to be independent of the songwriter(s): t-tests of the number of different chords by writer do not allow the null hypothesis of equal means to be rejected, even though, according to the biographers, each band member had a very distinct personality.
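
As a rough illustration of this kind of comparison, the sketch below runs a two-sample t-test in base R on made-up chord counts. The writer labels, the values and the threshold of 5 chords are all assumptions chosen for the example; they are not figures from the dataset.

```r
# Made-up data: one row per song, with the number of distinct chords used.
# The writer labels and counts are illustrative, not taken from the dataset.
songs <- data.frame(
  writer   = rep(c("writer_a", "writer_b"), each = 8),
  n_chords = c(4, 5, 6, 5, 4, 6, 5, 5,
               5, 4, 6, 5, 6, 4, 5, 6)
)

# Share of songs built on at most 5 distinct chords.
share_simple <- mean(songs$n_chords <= 5)

# Welch two-sample t-test: does the mean chord count differ between writers?
chord_test <- t.test(n_chords ~ writer, data = songs)

share_simple
chord_test$p.value
```

A large p-value means the test gives no grounds to reject the hypothesis that both writers use the same number of chords on average. Note that the formula interface of t.test() expects exactly two groups, so with more writers each pair would have to be compared separately, or another method such as an analysis of variance used instead.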

 

In terms of official chart rankings in the USA, the success of the Ramones fluctuated over their career. The first years, from the creation of the band until the early 1980s, were definitely the most successful. Then, from 1985 onwards, it looks as though sales did not keep pace with the band’s growing reputation, both within and outside the punk rock scene.

 

What did the Ramones say?

In my dataset, the Ramones’ lyrics come from azlyrics.com. I preferred this source over the many others available because that website provides the lyrics without repeated verses, which, in my opinion, would over-emphasise and, ultimately, bias the relevance of n-grams or topics. The dataset (a data frame) contains a lyrics variable, i.e. a character string holding the lyrics of the track (without repeated verses), including the < br> tags that mark the end of each line.

Here is an example of the lyrics variable:

Hey ho, let s go < br>Hey ho, let s go < br>They re forming in a straight line < br>They re going through a tight wind < br>The kids are losing their minds < br>The Blitzkrieg Bop < br>They re piling in the back seat < br>They re generating steam heat < br>Pulsating to the back beat < br>The Blitzkrieg Bop. < br>Hey ho, let s go < br>Shoot em in the back now < br>What they want, I dont know < br>They re all reved up and ready to go

Tidying the text up (adopting the tidy data principles recommended by Hadley Wickham) is the necessary first step of the lyrics mining exercise. For that, I follow the tidy text approach developed by Julia Silge & David Robinson.
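
A minimal sketch of what this tidying step could look like with dplyr, tidyr and tidytext follows. The column names album, song and lyrics, and the three-line excerpt, are assumptions made for the example; the actual code of this post may differ.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Toy data frame with one row per song; column names are assumptions.
ramones <- tibble(
  album  = "Ramones",
  song   = "Blitzkrieg Bop",
  lyrics = "Hey ho, let s go < br>Hey ho, let s go < br>They re forming in a straight line"
)

tidy_lyrics <- ramones %>%
  # One row per lyric line: split on the line-break tags.
  separate_rows(lyrics, sep = "\\s*<\\s*br\\s*>\\s*") %>%
  # Number the lines within each song.
  group_by(song) %>%
  mutate(line = row_number()) %>%
  ungroup() %>%
  # One row per word, lower-cased, punctuation removed.
  unnest_tokens(word, lyrics)

tidy_lyrics
```

Splitting on the tags first avoids counting "br" as a word, and keeping a line number makes it possible to go back from a word to the line it came from.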

 

First and foremost, it is worth noting that whatever the Ramones say, they say it in very few words! Ramones songs are brief in time, but also short in lyrics (though not so limited in vocabulary, with 2,139 unique words in total).

Whereas uniGrams are usually considered suitable for analysis only after the removal of stop words, in the Ramones lyrics the raw uniGrams show an interesting pattern. The 2 most frequent words in the 14 studio albums are i and you. One could provocatively argue that Tea for Two, a well-known 1925 song by Vincent Youmans and Irving Caesar, is a good representation of the Ramones’ musical universe, which seems to be mainly centered on you and i, and i and you!

In the uniGrams table below, the columns of the cleaned uniGrams highlight that the top word in the Ramones lyrics is dont, expressing an atmosphere of clear negation. But there is also a fascinating tension pointing to the future that shows through words such as wanna, gonna and ll (will or shall). Rock and punk, both among the top 20 words, definitely remind you what type of music you are listening to, but also what subculture the band belongs to. In an all-male band, words such as baby, love and girl attest to the significance of man-woman relationships in the Ramones songs. Perhaps it took a statistical analysis of the lyrics to take the risk of forming the hypothesis of the Ramones as a romantic band…

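All uniGrams | Freq | Cleaned uniGrams | Freq
i | 1510 | dont | 317
you | 800 | baby | 241
the | 773 | yeah | 161
a | 615 | love | 154
to | 584 | wanna | 122
s | 498 | gonna | 117
and | 438 | time | 90
it | 402 | ll | 78
my | 372 | life | 61
me | 322 | rock | 58
dont | 317 | day | 57
oh | 259 | girl | 55
in | 258 | hey | 55
of | 251 | remember | 54
baby | 241 | punk | 52
t | 237 | ve | 52
m | 232 | world | 48
no | 215 | fun | 43
can | 202 | feel | 42
on | 200 | bad | 41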
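
The two halves of such a table can be obtained along the following lines. This is only a sketch on two made-up lines of text, assuming the stop_words list shipped with tidytext was used for the cleaning step (the stop word list actually used for this post is not stated).

```r
library(dplyr)
library(tidytext)

# Made-up text, for illustration only.
toy_lyrics <- tibble(
  song = c("song_1", "song_2"),
  text = c("i wanna rock the basement", "i wanna surf")
)

words <- toy_lyrics %>%
  unnest_tokens(word, text)

# "All uniGrams": raw word counts, stop words included.
raw_counts <- words %>%
  count(word, sort = TRUE)

# "Cleaned uniGrams": the same counts once stop words are dropped.
cleaned_counts <- words %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)

raw_counts
cleaned_counts
```

Comparing the two outputs shows the same effect as in the table: pronouns such as i dominate the raw counts and disappear from the cleaned ones, while slang such as wanna survives because it is not in the stop word list.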

 

Identifying the most frequent uniGrams per album is a further step towards a more granular analysis.

 

In addition to identifying the most frequent single words, we could also highlight when they are used in the discography using a simple Token Distribution Analysis. Let’s limit this exercise to 5 words only from the list of the top 20: love, gonna, rock (or rocker), life and dont.

A quick visualisation of ‘raw’ nGrams (stop words not removed) confirms the feeling of a narrative universe mainly focused on i, you and negation (don’t).
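
Raw nGrams of this kind can be extracted with the "ngrams" tokenizer of tidytext. The sketch below counts bigrams on two made-up lines; the text and column names are illustrative only.

```r
library(dplyr)
library(tidytext)

# Made-up text, for illustration only.
toy_lyrics <- tibble(
  song = c("song_1", "song_2"),
  text = c("i dont wanna go", "i dont care")
)

# Raw bigrams: consecutive pairs of words, stop words kept.
bigram_counts <- toy_lyrics %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  count(bigram, sort = TRUE)

bigram_counts
```

Because each row is tokenized separately, no bigram spans two songs (or two lines, if the data frame holds one line per row).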

 

What did the Ramones feel?

As a (brief) final chapter of this post, I would like to run a very quick and limited sentiment analysis of the lyrics of the Ramones’ studio albums. In fact, this merely scratches the surface of sentiment analysis. The bing sentiment lexicon was used here, but a similar analysis could be carried out using the afinn or nrc lexicons (all available in the tidytext R package), or using all of them for a comparative approach.

Although the sentiment lexicon gives the word punk a negative value, there is little risk in asserting that this is not the way the Ramones intended it.
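
To make the point about punk concrete, here is a small sketch using the bing lexicon from tidytext on a made-up table of words. The album labels, the words and the helper name net_sentiment are assumptions for the example; the net score (positive minus negative words) is one simple summary among others, not necessarily the one used for this post.

```r
library(dplyr)
library(tidytext)

# Toy tidy table (one word per row); album labels and words are made up.
album_words <- tibble(
  album = c("A", "A", "A", "B", "B"),
  word  = c("love", "fun", "bad", "punk", "happy")
)

# The bing lexicon labels each word as "positive" or "negative".
bing <- get_sentiments("bing")

# Hypothetical helper: net sentiment (positive minus negative) per album.
net_sentiment <- function(words, lexicon) {
  words %>%
    inner_join(lexicon, by = "word") %>%
    group_by(album) %>%
    summarise(net = sum(sentiment == "positive") - sum(sentiment == "negative"))
}

net_default <- net_sentiment(album_words, bing)

# Variant: drop "punk" from the lexicon so it no longer counts as negative.
net_no_punk <- net_sentiment(album_words, filter(bing, word != "punk"))

net_default
net_no_punk
```

In this toy case, removing a single word from the lexicon is enough to move album B from a neutral to a positive score, which is why a lexicon built for general English deserves a review before being applied to punk lyrics.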

 

In order to both fine-tune and expand the approach, a more accurate sentiment analysis could be undertaken by addressing at least 5 additional tasks:

  • in the lyrics, identify the sentiment words preceded or followed by not;
  • review and, perhaps, amend the sentiment lexicon(s) to better reflect the punk rock subculture;
  • focus on relative rather than absolute frequencies of words;
  • add an inverse document frequency analysis of the terms to measure the impact of the words that are rarely used;
  • use ML to spot/predict combinations of n-Grams, topics, writers that would “guarantee” a better ranking in the charts.
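
As a pointer for the third and fourth tasks, tidytext provides bind_tf_idf(), which adds the relative frequency of a word within a document (tf), its inverse document frequency (idf) and their product. The sketch below treats each album as a document and uses made-up counts.

```r
library(dplyr)
library(tidytext)

# Made-up word counts per album, for illustration only.
album_counts <- tibble(
  album = c("A", "A", "B", "B"),
  word  = c("love", "rock", "love", "surf"),
  n     = c(3, 1, 2, 2)
)

# tf     = share of the album's words taken by this word (relative frequency)
# idf    = log(number of albums / number of albums containing the word)
# tf_idf = tf * idf
weighted <- album_counts %>%
  bind_tf_idf(word, album, n) %>%
  arrange(desc(tf_idf))

weighted
```

A word used on every album, such as love in this toy example, gets an idf of zero and therefore drops to the bottom, while words specific to one album rise to the top.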

 


The dataset and complete R code of this post can be downloaded from this link.

