Today's guest post comes from Revolution Analytics data scientist Luba Gloukhov — ed.
As a big fan of R’s ever-expanding geospatial plotting
capabilities, I jumped at the chance to create a map using The Million Song
Dataset (MSD). For each of the 10,000
songs in the million song subset containing non-missing latitudes and
longitudes and non-zero artist familiarity scores, I added a song ‘break-out’
score: the ratio of song hotttnesss to artist familiarity (similar to that introduced
by Echo Nest’s Hottt or Nottt blog post). I then subset
the data further to include only songs with a break-out score greater than 1.
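The scoring and filtering steps can be sketched in a few lines. My own analysis was done in R; this is only a minimal Python illustration of the same logic. The field names and sample records are hypothetical (they are not rows from the MSD), and skipping songs with a missing hotttnesss value is an assumption of the sketch.

```python
# Hypothetical sample records; names and values are illustrative only.
songs = [
    {"title": "Song A", "latitude": 37.80, "longitude": -122.27,
     "song_hotttnesss": 0.78, "artist_familiarity": 0.75},
    {"title": "Song B", "latitude": 40.71, "longitude": -74.01,
     "song_hotttnesss": 0.50, "artist_familiarity": 0.80},
    {"title": "Song C", "latitude": None, "longitude": None,
     "song_hotttnesss": 0.90, "artist_familiarity": 0.60},
    {"title": "Song D", "latitude": 51.51, "longitude": -0.13,
     "song_hotttnesss": 0.40, "artist_familiarity": 0.0},
]


def break_out_songs(records, threshold=1.0):
    """Return records whose hotttnesss-to-familiarity ratio exceeds threshold."""
    kept = []
    for song in records:
        # Drop songs that cannot be placed on the map.
        if song["latitude"] is None or song["longitude"] is None:
            continue
        # Drop missing or zero familiarity to avoid dividing by zero.
        if not song["artist_familiarity"]:
            continue
        # Assumption: a missing hotttnesss score also disqualifies a song.
        if song["song_hotttnesss"] is None:
            continue
        score = song["song_hotttnesss"] / song["artist_familiarity"]
        if score > threshold:
            kept.append({**song, "break_out": score})
    return kept


result = break_out_songs(songs)
```

With these made-up records, only "Song A" survives: "Song B" has a ratio below 1, "Song C" has no coordinates, and "Song D" has zero familiarity.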
Exploring the map, I found myself YouTubing songs. Many of them I had never heard of before, a sign of
either generally low artist familiarity scores or my own tanking hipppnesss. I decided against exploring the “why?”, fearing
it might just prove the latter. In an
effort to simplify my process of exploration (and “what’s hottt”-self-education),
I used R to embed YouTube videos in the map’s marker info windows.
Explore the interactive map here.
For each song in my subset, I added a new variable containing
code that embeds the first YouTube video search result of the artist name and
song title. For
plotting, I used Milan Kilibarda’s plotGoogleMaps package. You can access my code and data via GitHub.
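To give a rough idea of what such an embed variable holds, here is a minimal Python sketch (again, the original work was done in R). It builds a YouTube search URL from the artist name and song title, and an iframe snippet for a given video ID. The function names and the `VIDEO_ID` placeholder are hypothetical, and the sketch does not show how the first search result is resolved to a video ID.

```python
from html import escape
from urllib.parse import urlencode


def youtube_search_url(artist, title):
    """Build a YouTube search URL for an artist name and song title."""
    query = urlencode({"search_query": f"{artist} {title}"})
    return "https://www.youtube.com/results?" + query


def embed_html(video_id, width=300, height=200):
    """Return iframe HTML for a video ID, suitable for a marker info window."""
    src = escape(f"https://www.youtube.com/embed/{video_id}", quote=True)
    return (
        f'<iframe width="{width}" height="{height}" '
        f'src="{src}" frameborder="0" allowfullscreen></iframe>'
    )


search_url = youtube_search_url("Del tha Funkee Homosapien", "Mistadobalina")
# VIDEO_ID is a placeholder, not a real video identifier.
snippet = embed_html("VIDEO_ID")
```

Storing the resulting HTML string as one more column alongside each song's coordinates is what lets the mapping step show it when a marker is clicked.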
Oakland’s own Del tha Funkee Homosapien’s 1991 hit
Mistadobalina came up as a recent break-out song (with a ratio of 1.04). Since 1991, Del has had numerous successes: as
a member of Hieroglyphics, with the album Deltron 3030 and, perhaps most
prominently, as a member of the Gorillaz. One would think that, by now, Del’s
familiarity would surpass Mistadobalina’s hotttnesss, generating a ratio of
less than 1. How does this ratio compare
to those of other Del songs? Is it really the case that
Mistadobalina remains Del’s biggest break-out hit, or do the hotttnesss scores of
Del’s songs in general surpass his familiarity? Perhaps more enlightening than momentary snapshots
of these metrics would be an investigation of how the variables have changed over time.
So much data exploration, so little time! I’ll have to save
these topics for another R play date.
Luba Gloukhov is a Pre-Sales Engineer at Revolution Analytics. When not playing with R or helping others do the same, she can be found lifting heavy objects, thinking light thoughts or eating delicious food.