Chord progressions of 5 000 songs!

March 1, 2015

(This article was first published on R –, and kindly contributed to R-bloggers)

The Hooktheory database contains analyses of over 5000 songs*. These analyses are uploaded by users and allow the songs to be analyzed in bulk as well as individually. One of these ‘all song’ analyses lets users gather chord progressions across ALL songs (see the analysis file for how I did it, using the Hooktheory API and R). This allowed us to create a Sankey visualization of all chord progressions in the Hooktheory database.

Check it out!


(If you prefer the dynamic version where you can play with the data, have a look at the following link: Click here!).

Explaining the figure a little: what interests us here is the type of chord used, regardless of the song’s key. So 1->5->6 in the figure above includes both songs in C major with the progression C->G->Am and songs in A major with A->E->F#m. (Strictly, this assumes songs are compared by their Roman numerals relative to the major key, or to the relative major for minor-key songs. In reality, the API lumps songs into rough categories regardless of mode, so it’s impossible to know for sure what we’re dealing with.)
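To make the Roman-numeral idea concrete, here is a minimal Python sketch that spells a degree progression in a given major key. It assumes diatonic triads (I ii iii IV V vi vii°) and sharp-only note names; since the Hooktheory API doesn’t say whether “6” is minor or major, the chord qualities are an assumption, and the function names are hypothetical.

```python
# Pitch classes, spelled with sharps only (a simplification: flat keys
# such as F or Bb would conventionally use flats).
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offset of each major-scale degree (1..7) from the tonic.
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]

# Assumed diatonic triad quality on each degree: I ii iii IV V vi vii(dim).
QUALITY = ["", "m", "m", "", "", "m", "dim"]


def degree_to_chord(key: str, degree: int) -> str:
    """Spell Roman-numeral degree 1..7 as a chord name in a major key."""
    root = (NOTES.index(key) + MAJOR_STEPS[degree - 1]) % 12
    return NOTES[root] + QUALITY[degree - 1]


def spell(key: str, degrees: list[int]) -> list[str]:
    """Spell a whole degree progression, e.g. [1, 5, 6], in `key`."""
    return [degree_to_chord(key, d) for d in degrees]


print(spell("C", [1, 5, 6]))  # ['C', 'G', 'Am']
print(spell("A", [1, 5, 6]))  # ['A', 'E', 'F#m']
```

Both calls describe the same 1->5->6 path in the figure; only the key differs.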

Chord progressions start on the left and continue to the right. For example, the progression 4->1->5->6 is one of the most popular ones… and is in fact present in 327 songs! Check ’em out!
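One way to get this left-to-right layout is to tag each chord with its position in the progression, so that chord 4 at step 1 and chord 4 at step 4 become separate nodes. The Python sketch below (function name hypothetical) aggregates weighted progressions into Sankey links. It uses song counts as weights for simplicity, whereas the figure in this post is built from the API’s percentages.

```python
from collections import Counter


def sankey_links(progressions):
    """Aggregate (chords, weight) pairs into left-to-right Sankey links.

    Each node is tagged with its position ("4@1" = chord 4 at step 1), so
    the same chord at different steps lands in a different column.
    """
    links = Counter()
    for chords, weight in progressions:
        for step in range(len(chords) - 1):
            source = f"{chords[step]}@{step + 1}"
            target = f"{chords[step + 1]}@{step + 2}"
            links[(source, target)] += weight
    return links


# 327 is the count quoted above; the second entry's weight is made up.
data = [(("4", "1", "5", "6"), 327), (("4", "1", "5", "4"), 10)]
for (source, target), weight in sankey_links(data).items():
    print(source, "->", target, weight)
```

The shared prefix 4->1->5 merges into single, thicker links before splitting at the last step.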


In the API, chord probabilities are given as percentages, so the relative importance of each chord is known at each step (how the API normalizes them is not documented). The API offered 29 chords at the start of all progressions. For every subsequent transition the number of chord options grows (as expected), but for graphical purposes I only kept the original 29 chords at every transition (I expect these 29 to be the most common anyway, so it’s not that big a deal). Also, the thickness of each line I’m plotting is itself a probability (the chance of moving to the next chord given the current one), while the probability of being on that current chord in the first place differs from chord to chord, so the “total thickness” of each transition isn’t the same. Very lazily, I just normalized all probabilities within each transition so that each transition’s “mega bar” is roughly the same height. I’m sure there’s a better way to do it; the community is invited to improve it!
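The lazy per-transition normalization described above can be sketched in a few lines of Python. This is an illustration, not the post’s R code: the `(from_chord, to_chord, percent)` tuple shape and the function names are assumptions, and the percentages in the example are made up.

```python
def normalize_step(links):
    """Rescale one transition so its link weights sum to 1.

    `links` is a list of (from_chord, to_chord, percent) tuples, a
    hypothetical shape for the percentages collected at one step.
    """
    total = sum(percent for _, _, percent in links)
    return [(src, dst, percent / total) for src, dst, percent in links]


def normalize_transitions(steps):
    """Normalize every transition independently (one "mega bar" each)."""
    return [normalize_step(links) for links in steps]


# Made-up percentages for two transitions, for illustration only.
steps = [
    [("Start", "1", 30.0), ("Start", "4", 20.0), ("Start", "5", 10.0)],
    [("1", "5", 40.0), ("1", "4", 30.0), ("4", "1", 50.0)],
]
for links in normalize_transitions(steps):
    print([(src, dst, round(w, 3)) for src, dst, w in links])
```

Because each transition is rescaled on its own, every bar has the same total height, but link widths are only comparable within a transition, not across transitions.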

My analysis is here; collaboration and/or remixing with attribution is welcome! (If you improve the normalization method, please let me know and I’ll update this post.)


  • There are several limitations to this assessment, since the Hooktheory API wasn’t really intended for this type of analysis. For example, it doesn’t say whether “6” is “vi” (minor) or “VI” (major), which is kind of a big deal.
  • As mentioned, I selected only 29 chords to track… I might be missing a lot of progressions.
  • I have no idea if the normalization I applied is valid. I stopped trying when the output I got was semi-reasonable.
  • Blending everything together like this probably obscures some interesting patterns.
  • I only did chord progressions that were 4 steps long… I could have gone farther, but didn’t want to slam the API too much. As you can imagine, the number of queries grows exponentially with each ‘step’: Start -> step 1 was 1 query that yielded 29 chords, step 1 -> 2 was one query for each of those 29 chords (29 queries), step 2 -> 3 was 29^2 queries, step 3 -> 4 was 29^3 queries, and so on.
  • The songs have been uploaded by users from around the world, but represent mostly Western music. It would be awesome to do this with music from other parts of the world.
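The query growth mentioned in the list can be sketched as a breadth-first expansion: one API call per partial progression, each returning the next chords. In this Python sketch `fetch_children` is a hypothetical stand-in for a single Hooktheory API request (no real endpoint is assumed), and every call is assumed to return the same number of tracked chords.

```python
from collections import deque


def expand(fetch_children, depth):
    """Expand every progression up to `depth` chords, one query per prefix.

    `fetch_children(path)` is a hypothetical stand-in for one API call:
    given a tuple of chords, it returns (chord, percent) pairs for the
    next step.
    """
    links, queries = [], 0
    queue = deque([()])  # start from the empty progression
    while queue:
        path = queue.popleft()
        if len(path) == depth:
            continue  # long enough: no further query for this prefix
        queries += 1
        for chord, percent in fetch_children(path):
            links.append((path, chord, percent))
            queue.append(path + (chord,))
    return links, queries


def queries_per_step(depth, branching=29):
    """Queries needed at each step when every call returns `branching` chords."""
    return [branching ** k for k in range(depth)]


def fake_fetch(path):
    # Hypothetical stub: always offers the same three chords.
    return [("1", 50.0), ("4", 30.0), ("5", 20.0)]


links, queries = expand(fake_fetch, 3)
print(queries)                   # 13 (= 1 + 3 + 9)
print(queries_per_step(4))       # [1, 29, 841, 24389]
print(sum(queries_per_step(4)))  # 25260
```

With 29 chords per step, a four-chord depth already means 25,260 calls, and a fifth chord would add 29^4 = 707,281 more, which is why stopping at 4 steps is a reasonable courtesy to the API.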

Possible Legend (thanks to HertzDevil):

The numbers follow the notation of the Trends search string, shown here in EBNF metasyntax:

(* Roman numerals *)
numeral = "1" | "2" | "3" | "4" | "5" | "6" | "7";
(* Borrowed modes, from Dorian to Locrian *)
mode = "D" | "Y" | "L" | "M" | "b" | "C";
(* Figured bass for triadic and seventh chords *)
inversion = "6" | "64" | "7" | "65" | "43" | "42";
(* Functions available for applied chords *)
function = "4" | "5" | "7";
(* Basic chords or borrowed chords in the relative Major key *)
simple-chord = [mode], numeral, [inversion];
(* Applied chords *)
applied-chord = function, [inversion], "/", numeral;
(* Chord progressions for both the Trends page and the API *)
chord = simple-chord | applied-chord;
trends-progression = chord, {".", chord};
api-progression = chord, {",", chord};
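For anyone handling these strings programmatically, here is a minimal Python transcription of the grammar above into regular expressions. It is a sketch, not Hooktheory’s own parser, and the group names (`mode`, `numeral`, `inv`, `func`, `func_inv`, `target`) are made up for illustration.

```python
import re

# inversion = "6" | "64" | "7" | "65" | "43" | "42"
INVERSION = r"(?:64|65|43|42|6|7)"
# simple-chord = [mode], numeral, [inversion]
SIMPLE = rf"(?P<mode>[DYLMbC])?(?P<numeral>[1-7])(?P<inv>{INVERSION})?"
# applied-chord = function, [inversion], "/", numeral
APPLIED = rf"(?P<func>[457])(?P<func_inv>{INVERSION})?/(?P<target>[1-7])"
# chord = simple-chord | applied-chord
CHORD = re.compile(rf"{APPLIED}|{SIMPLE}")


def parse_progression(text, sep=","):
    """Parse an API (",") or Trends (".") progression into chord dicts."""
    chords = []
    for token in text.split(sep):
        match = CHORD.fullmatch(token)
        if match is None:
            raise ValueError(f"not a valid chord: {token!r}")
        # Keep only the groups that took part in the match.
        chords.append({k: v for k, v in match.groupdict().items() if v})
    return chords


print(parse_progression("4,1,5,6"))
print(parse_progression("1.5/5.b7", sep="."))
```

`fullmatch` forces each token to match a whole chord, so stray characters raise an error instead of being silently ignored.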

Parting thoughts:

  • Even though there is a great variety of chords and chord progressions, progressions involving 1, 4, 5, and 6 are favoured, probably because they ‘sound good’ to our brains. Nowhere is this better illustrated than by The Axis of Awesome’s “Four Chord Song”. I definitely expected chord 1 to be used frequently, but I was expecting more variability.
  • Music is pretty to look at!
  • If you’re a musician, try weird progressions! I know that what sounds good sounds good, but jeez… how will humanity ever learn to be creative if everyone keeps doing the same thing over and over?


(thanks to Laure Belotti for editorial prowess)


EDIT: I’ve been getting great feedback on this post. Please check out the great conversations here and here. Giving credit where it’s due, it turns out The Axis of Awesome weren’t the first to talk about chord-progression overuse; check out this dude. More credit where it’s due: it turns out I wasn’t the first one to come up with this idea (great minds indeed…). And finally, I’m sure you nerds all checked out Hooktheory, but take a look at these other resources too!


*EDIT2: Originally I was under the impression that the Hooktheory database contained over 25,000 songs… but a Hooktheory admin clarified that in fact there are just over 5000.
