Lyric Analytics

December 28, 2013

(This article was first published on More or Less Numbers, and kindly contributed to R-bloggers)

I was messing around with the text mining (tm) package in R and looking for something to comb through.  Other blogs and websites show how people are using it:  mining presidential speeches and debates is one of the more notable uses.  That got me thinking about a domain where we mostly focus on how things are said rather than what is actually said, i.e. music.  For some people, what is said is important, and I would argue that for most artists who "make it," the words eventually become important to their being catapulted into fame :-)



One of the greatest rock and roll icons, who began making albums slightly before my ears were ready for them, is "The Boss".  Lots of albums, lots of words.  Although Bruce Springsteen was definitely communicating a lot of powerful ideas in his music, ultimately it's the passion he sings with, the way he miraculously manages to weave a sax into most of his songs, or just having come up with the nickname "The Boss" that makes him awesome.  Anyway, I thought about looking at the album Born to Run as a primer in what I'll call "lyric mining".

Below is a graph of the most frequent words on the album:  every word that is used at least 5 times across all of the songs.  For those of us familiar with the album, just looking at these words is enough to hear the music.

Number of times words are used in the album Born to Run
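
For anyone who wants to try this kind of count, here is a minimal sketch in base R.  It is not the code behind the graph above:  the three "songs" are made-up stand-in text (not the album's lyrics), and names like tokenize and min_count are my own placeholders.  With the tm package, TermDocumentMatrix and findFreqTerms should cover the same ground.

```r
# Toy stand-in text: made-up lines, NOT the album's actual lyrics.
songs <- c(
  song_a = "night night road run night road",
  song_b = "run run road dream night",
  song_c = "dream road road run night night"
)

# Lowercase, replace anything that is not a letter, apostrophe, or space,
# then split the text into individual words.
tokenize <- function(text) {
  cleaned <- gsub("[^a-z' ]", " ", tolower(text))
  words <- strsplit(cleaned, "\\s+")[[1]]
  words[nzchar(words)]
}

tokens <- lapply(songs, tokenize)

# Count every word across the whole album, most frequent first.
album_counts <- sort(table(unlist(tokens)), decreasing = TRUE)

# Keep only words used at least min_count times (the post uses 5).
min_count <- 5
frequent <- album_counts[album_counts >= min_count]

# Bar chart of the surviving words.
barplot(frequent, las = 2, ylab = "Times used")
```

Real lyrics would also need common stop words ("the", "and", and so on) removed before counting, otherwise they crowd out everything else.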

Below is a heat map showing words and the songs they appear in (the lighter the color, the more usage).  I kept only words that were mentioned at least 5 times on the album; otherwise the map was too large.  On the unlabeled axes are dendrograms (basically a graphical way to show how things are associated or how similar they are).  Each cell in the heat map has a bar graph showing the relative usage of the word.  The highest is 10 times in one song:  the word "one" in the song "She's the One".

Born to Run album Heatmap
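
A heat map with dendrograms on both axes can be sketched with base R's heatmap() function.  This is an illustration under assumptions, not the code that produced the figure:  the counts matrix is filled with made-up numbers, the song names are placeholders, and the per-cell bar graphs in my figure are not reproduced here.

```r
# Toy word-by-song counts (made-up numbers, not taken from the album).
counts <- matrix(
  c(4, 0, 1, 3,
    0, 5, 0, 1,
    3, 1, 2, 2,
    1, 0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    word = c("night", "one", "road", "dream"),
    song = c("song_a", "song_b", "song_c", "song_d")
  )
)

# Keep only words used at least 5 times across the album.
frequent <- counts[rowSums(counts) >= 5, , drop = FALSE]

# heatmap() clusters the rows and the columns and draws a dendrogram
# for each. scale = "none" keeps the raw counts, and heat.colors()
# runs from red (low) to pale yellow (high), so lighter = more usage.
hm <- heatmap(frequent, scale = "none", col = heat.colors(12))
```

The value returned by heatmap() includes rowInd and colInd, the order in which the clustering arranged the words and songs.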

In terms of what is said, you'll notice that the song "Night" is associated with "Jungleland" (judging by the height of the lines and their being in the same "clade" on the dendrogram) in terms of the words used, at least those that are used at least 5 times.  Here's how this looks when they are graphed against each other:

Night lyrics graphed against Jungleland lyrics for words mentioned at least 5 times

Alternatively, "Tenth Avenue Freeze-Out" is an outlier in terms of word usage:  it sits relatively unconnected from the other songs in the dendrogram on the x-axis.
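
The song-to-song associations read off the dendrogram can also be computed directly.  Here is a rough sketch, assuming Euclidean distance and complete linkage (the defaults for dist() and hclust(); I am not claiming these are the settings behind the figure).  The counts are made up, with song_d built to be the odd one out.

```r
# Toy counts again: rows are words, columns are songs (made-up numbers).
counts <- matrix(
  c(4, 1, 3, 0,
    0, 5, 0, 1,
    3, 1, 3, 0,
    1, 0, 1, 0,
    0, 0, 1, 6),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    word = c("night", "one", "road", "dream", "street"),
    song = c("song_a", "song_b", "song_c", "song_d")
  )
)

# dist() compares rows, so transpose to compare songs instead of words.
song_dist <- dist(t(counts))
song_tree <- hclust(song_dist)
plot(song_tree)

# The closest pair of songs: the smallest off-diagonal distance.
d <- as.matrix(song_dist)
diag(d) <- Inf
closest <- which(d == min(d), arr.ind = TRUE)[1, ]
closest_pair <- colnames(counts)[closest]

# Cutting the tree into two groups isolates the outlier song.
groups <- cutree(song_tree, k = 2)
```

In this toy data song_a and song_c pair up first, and song_d ends up alone in its own group, which is the same pattern as a song sitting apart from the rest of the dendrogram.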

On the y-axis you can see the associations across different words and their usage.  "Night" and "one", even though they are used a lot, are not distributed the same way (different "clades"), meaning that when "The Boss" is belting it out, he's using these words in different places:  different songs.

Which brings up an interesting point about great albums (in my opinion):  their distribution of themes.  "Born to Run" definitely has some great themes in it, and while I won't interpret the meaning of each song, we can see those themes through the distribution of words across the lyrics and songs.  The word distribution shows that the great themes are spread across the album, and it's not just one song that makes this album great.


