(This article was first published on **Revolutions**, and kindly contributed to R-bloggers)

*Today's guest post comes from Revolution Analytics data scientist Luba Gloukhov — ed.*

As a big fan of R’s ever-expanding geospatial plotting capabilities, I jumped at the chance to create a map using the Million Song Dataset (MSD). For each song in the 10,000-song Million Song Subset that had a non-missing latitude and longitude and a non-zero artist familiarity score, I added a song ‘break-out’ score: the ratio of song hotttnesss to artist familiarity (similar to the one introduced in the Echo Nest’s Hottt or Nottt blog post). I then subset the data further to include only songs with a break-out score greater than 1.
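
The scoring and filtering step can be sketched as follows. This is not the author's R code (that is on GitHub); it is a minimal Python sketch that assumes the songs are already loaded as dictionaries keyed by the MSD field names `artist_latitude`, `artist_longitude`, `artist_familiarity` and `song_hotttnesss`, with missing values stored as NaN or `None`. The toy records at the end are hypothetical, not real MSD values.

```python
import math

# Assumption: each song is a dict keyed by MSD field names, with missing
# values stored as NaN or None. This is an illustration, not the author's code.


def break_out_score(song):
    """Ratio of song hotttnesss to artist familiarity."""
    return song["song_hotttnesss"] / song["artist_familiarity"]


def _missing(x):
    """True for None or NaN."""
    return x is None or math.isnan(x)


def break_out_songs(songs, threshold=1.0):
    """Score geolocated songs and keep those whose break-out score exceeds threshold."""
    kept = []
    for song in songs:
        # Skip songs without a usable location.
        if _missing(song["artist_latitude"]) or _missing(song["artist_longitude"]):
            continue
        # Skip zero or missing familiarity so the ratio is defined.
        fam = song["artist_familiarity"]
        if _missing(fam) or fam == 0:
            continue
        score = break_out_score(song)
        # A NaN hotttnesss yields a NaN score, which fails the comparison.
        if score > threshold:
            kept.append({**song, "break_out": score})
    return kept


# Hypothetical toy records, not real MSD values.
songs = [
    {"title": "A", "artist_latitude": 37.8, "artist_longitude": -122.3,
     "artist_familiarity": 0.5, "song_hotttnesss": 0.52},
    {"title": "B", "artist_latitude": float("nan"), "artist_longitude": 0.0,
     "artist_familiarity": 0.4, "song_hotttnesss": 0.9},
    {"title": "C", "artist_latitude": 51.5, "artist_longitude": -0.1,
     "artist_familiarity": 0.0, "song_hotttnesss": 0.3},
    {"title": "D", "artist_latitude": 40.7, "artist_longitude": -74.0,
     "artist_familiarity": 0.8, "song_hotttnesss": 0.6},
]

hits = break_out_songs(songs)
print([s["title"] for s in hits])  # ['A']
```

The strict `>` mirrors "greater than 1": a song whose hotttnesss exactly equals its artist's familiarity is left out.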

Exploring the map, I found myself YouTubing songs. Many of them I had never heard of before, a sign of either generally low artist familiarity scores or my own tanking hipppnesss. I decided against exploring the “why?”, fearing it might just prove the latter. To simplify my exploration (and “what’s hottt” self-education), I used R to embed YouTube videos in the map’s marker info windows.

Explore the interactive map here.

For each song in my subset, I added a new variable containing code that embeds the first YouTube video returned by a search for the artist name and song title. For plotting, I used Milan Kilibarda’s plotGoogleMaps package. You can access my code and data via GitHub.
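
The embedding step might look roughly like this, again sketched in Python rather than the author's R. The post does not say how the search was performed, so the lookup is a hypothetical `first_result_id(artist, title)` callable that returns a video ID (or `None`); no real search API is shown. The sketch builds the search URL such a lookup would query, and wraps the resulting ID in an `<iframe>` using YouTube's `https://www.youtube.com/embed/<id>` URL form, the kind of HTML snippet an info window could display. `artist_name` and `title` are assumed MSD field names.

```python
from html import escape
from urllib.parse import urlencode


def youtube_search_url(artist, title):
    """YouTube search-results URL for the query '<artist> <title>'."""
    return "https://www.youtube.com/results?" + urlencode(
        {"search_query": f"{artist} {title}"})


def embed_html(video_id, width=320, height=180):
    """An <iframe> snippet embedding a single YouTube video."""
    src = escape("https://www.youtube.com/embed/" + video_id)
    return (f'<iframe width="{width}" height="{height}" src="{src}" '
            'frameborder="0" allowfullscreen></iframe>')


def add_embed_column(songs, first_result_id):
    """Return copies of `songs` with an extra 'embed' field.

    `first_result_id` is a hypothetical callable (artist, title) -> video ID
    or None; it stands in for whatever search step is actually used.
    """
    out = []
    for song in songs:
        vid = first_result_id(song["artist_name"], song["title"])
        out.append({**song, "embed": embed_html(vid) if vid else ""})
    return out


# Hypothetical stub lookup; a real one would query YouTube.
fake_ids = {("Some Artist", "Some Song"): "VIDEO_ID"}
rows = add_embed_column(
    [{"artist_name": "Some Artist", "title": "Some Song"}],
    lambda artist, title: fake_ids.get((artist, title)),
)
print(rows[0]["embed"])
```

Songs with no search hit get an empty string, so every row still carries the new variable.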

Oakland’s own Del tha Funkee Homosapien’s 1991 hit Mistadobalina came up as a recent break-out song (with a ratio of 1.04). Since 1991, Del has had numerous successes: as a member of Hieroglyphics, with the album Deltron 3030 and, perhaps most prominently, as a member of Gorillaz. One would think that, by now, Del’s familiarity would surpass Mistadobalina’s hotttnesss, giving a ratio of less than 1. How does this ratio compare to those of other Del songs? Is Mistadobalina really Del’s biggest break-out hit, or do the hotttnesss scores of Del’s songs in general surpass his familiarity? Perhaps more enlightening than momentary snapshots of these metrics would be an investigation of how the variables have changed over time.

So much data exploration, so little time! I’ll have to save these topics for another R play date.

*Luba Gloukhov is a Pre-Sales Engineer at Revolution Analytics. When not playing with R or helping others do the same, she can be found lifting heavy objects, thinking light thoughts or eating delicious food.*
