Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Giddy up, giddy it up
Wanna move into a fool’s gold room
With my pulse on the animal jewels
Of the rules that you choose to use to get loose
With the luminous moves
Bored of these limits, let me get, let me get it like
Wow!

When it comes to surreal lyrics and videos, I’m always thinking of Beck. Above, I cited the beginning of the song “Wow” from his latest album “Colors” which has received rather mixed reviews. In this post, I want to show you what I have done with Spotify’s API. We will also see how “Colors” compares to all the other albums by Beck on a range of the API’s measures. This is what we gonna do:

• Set up the R packages to access the Spotify and Last.FM APIs
• Get chart information on a specific user (in this case: me) from Last.FM
• Get information on these top artists from Spotify
• Visualize some of the information with ggplot2
• Again: Visualize this information
First, let’s set things up. If you want to download the RLastFM package (it’s not on CRAN), you can follow these instruction. If you need an account for the Spotify API, you can get one on this page. Get an account for the Last.FM API with this form.

library(spotifyr)
library(RLastFM)
library(dplyr)

# Spotify
s.id <- "< API ID >“
s.sc <- "< API Secret"

# LastFM
l.ky <- "< API Key >“
l.ap <- "http://ws.audioscrobbler.com/2.0/"

Sys.setenv(SPOTIFY_CLIENT_ID = s.id)
Sys.setenv(SPOTIFY_CLIENT_SECRET = s.sc)

access_token <- get_spotify_access_token()

I slightly adapted the user.getTopArtists() function in the RLastFM package because I wanted to use the “limit” parameter of the Last.FM API.

user.getTopArt2 <- function (username, period = NA, key = l.ky, parse = TRUE, limit = 20) {
params = list(method = “user.gettopartists”, user = username,
period = period, api_key = key, limit = limit)
params = params[!as.logical(lapply(params, is.na))]
ret = getForm(l.ap, .params = params)
doc = xmlParse(ret, asText = TRUE)
if (parse)
doc = RLastFM:::p.user.gettopartists(doc)
return(doc) }

Also, I adapted the function get_artist_audio_features() from the spotifyr package to override the dialog where I am required to type the name of the artist in interactive sessions.

get.art.audio.feats <- function (artist_name, access_token = get_spotify_access_token()) {
artists <- get_artists(artist_name)
selected_artist <- artists$artist_name[1] artist_uri <- artists$artist_uri[artists$artist_name == selected_artist] albums <- get_albums(artist_uri) if (nrow(albums) > 0) { albums <- select(albums, -c(base_album_name, base_album, num_albums, num_base_albums, album_rank)) } else { stop(paste0(“Cannot find any albums for \””, selected_artist, “\” on Spotify”)) } album_popularity <- get_album_popularity(albums) tracks <- get_album_tracks(albums) track_features <- get_track_audio_features(tracks) track_popularity <- get_track_popularity(tracks) tots <- albums %>% left_join(album_popularity, by = “album_uri”) %>% left_join(tracks, by = “album_name”) %>% left_join(track_features, by = “track_uri”) %>% left_join(track_popularity, by = “track_uri”) return(tots) } Now, we getting the relevant data. First, we are accessing the current top 20 artists (of all time) of a specific Last.FM user. top.arts <- as.data.frame(user.getTopArt2("< user name >“, period = “overall”, limit = 20)) head(top.arts) artist playcount mbid rank 1 Menomena 1601 ad386705-fb8c-40ec-94d7-e690e079e979 1 2 Beck 1594 a8baaa41-50f1-4f63-979e-717c14979dfb 2 3 Radiohead 1556 a74b1b7f-71a5-4011-9441-d0b5e4122711 3 4 PJ Harvey 1546 e795e03d-b5d5-4a5f-834d-162cfb308a2c 4 5 Prince 1501 cdc0fff7-54cf-4052-a283-319b648670fd 5 6 Nick Cave & The Bad Seeds 1243 172e1f1a-504d-4488-b053-6344ba63e6d0 6 Now, we are accessing the “artist audio features” for each one of the top 20 artists. I’m including a Sys.sleep(5) here, because I don’t want to access Spotify’s API too frequently. With 5 seconds between each artist, we should be on the safe side – I guess 2 seconds would be enough. Please note that get.art.audio.feats() returns a tibble that contains one row for each of the artist’s tracks. That is why I have to aggregate the data by artist. Hence, the “audio features” of an artist are the mean values of all the tracks by this artist. art.info.list <- list() for (i in 1:nrow(top.arts)) { cat(i, “\n”) new.art.info <- get.art.audio.feats(top.arts[i, "artist"]) new.art.info$artist <- top.arts[i, "artist"]
art.info.list[[length(art.info.list)+1]] <- new.art.info
Sys.sleep(5)
}
art.info <- do.call("rbind", art.info.list)

agg.art.info <- aggregate(cbind(danceability, energy, speechiness,
acousticness, instrumentalness, liveness,
valence, tempo, track_popularity) ~ artist,
data = art.info, FUN = mean)

Now, let’s merge Last.FM’s playcount information with the information we got from Spotify’s API. We are also sorting the dataframe by my playcount on Last.FM.

agg.art.info <- merge(agg.art.info, top.arts[,c("playcount", "rank", "artist")], by = "artist", all.x = T, all.y = F)
agg.art.info <- agg.art.info[order(agg.art.info$playcount, decreasing = T),] saveRDS(agg.art.info, file = “agg.art.info.Rds”) What we get is this: artist danceability energy speechiness acousticness instrumentalness liveness 7 Menomena 0.5397246 0.5475217 0.05414783 0.2678826 0.28483706 0.1451246 3 Beck 0.6043865 0.6106874 0.06503006 0.2653964 0.20943021 0.1872540 13 Radiohead 0.4095833 0.5755077 0.05504551 0.3131903 0.39465664 0.2026224 11 PJ Harvey 0.5155636 0.4578315 0.09909212 0.4328224 0.13406858 0.1739836 12 Prince 0.7078350 0.7295825 0.04883786 0.2464166 0.04265596 0.1405495 8 Nick Cave & The Bad Seeds 0.4147489 0.5152961 0.05414545 0.3465942 0.07602337 0.2952866 valence tempo track_popularity playcount rank 7 0.3084768 131.4496 5.84058 1601 1 3 0.4795632 115.6165 21.64417 1594 2 13 0.2858109 118.7648 47.62821 1556 3 11 0.4331848 118.7181 21.32727 1546 4 12 0.7672718 126.3280 25.81553 1501 5 8 0.3521593 119.5002 27.47619 1243 6 Just a few quick notes concerning the audio features given by the Spotify API (I’m citing Spotify’s webpage here): • danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable. • energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy. • speechiness: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks. • acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic. • instrumentalness: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0. • liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live. • valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). • tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration. • track_popularity: The popularity of a track is a value between 0 and 100, with 100 being the most popular. The popularity is calculated by algorithm and is based, in the most part, on the total number of plays the track has had and how recent those plays are. Generally speaking, songs that are being played a lot now will have a higher popularity than songs that were played a lot in the past. Duplicate tracks (e.g. the same track from a single and an album) are rated independently. I am only using a few of these features for visualizations here, namely danceability, valence, and track_popularity. Let’s start with a scatter plot. ggplot2 and the really great package “ggrepel” already do a good job here. library(ggplot2) library(ggrepel) library(tidyr) library(scales) library(RColorBrewer) data <- readRDS("agg.art.info.Rds") ggplot(data, aes(x = danceability, y = playcount, size = track_popularity, col = valence)) + geom_point() + geom_label_repel(aes(label = artist)) + labs(x = “Danceability”, y = “Playcount”, size = “Mean popularity”, col = “Valence”, title = “Popularity, Valence, and Danceability”, subtitle = “TOP 20 Last.FM artists – with track information from Spotify”) + scale_color_continuous(low = “darkred”, high = “green”) + scale_size(range = c(2, 6)) + theme_minimal() (please click to enlarge) It’s quite a lot information in this plot. Prince – along with the not very popular Nikka Costa – is definitely the most danceable and most positive (in terms of emotional valence) in my top artists. There is no clear relationship between danceability and my playcount. Also, “Menomena”, my top artist, is really not popular on Spotify – I guess you can call that a hipster factor. Now, I want to have a semantic profile plot for my top 5 artists. It’s a little bit of coding work before we get this. for (var in c(“tempo”, “track_popularity”, “playcount”)) { data[, paste0(“norm.”, var)] <- rescale(data[, var], to = c(0, 1)) } data.long <- gather(data, key = variable, value = value, c("danceability", "energy", “speechiness”, “acousticness”, “instrumentalness”, “liveness”, “valence”)) substr(data.long$variable, 1, 1) <- toupper(substr(data.long$variable, 1, 1)) data.long$artist <- factor(data.long$artist, levels = data$artist)

ggplot(data.long[data.long$rank %in% 1:5,], aes(y = value, x = variable, group = artist, col = artist)) + geom_line(size = 2, lineend = “round”, alpha = .8) + scale_color_manual(values = brewer.pal(5, “Spectral”)) + scale_y_continuous(breaks = NULL) + labs(x = “”, y = “”, col = “Artist”) + coord_flip() + theme_dark() + theme(axis.text.y = element_text(size = 15), panel.grid.major.y = element_line(size = 1)) (please click to enlarge) It’s still not optimal, but we get a good clue of the different profiles of the artists. We can read this plot horizontally to get a ranking of artists for one feature. But we can also read it vertically to get a clue of the overall distribution of features. Now, let’s move on to a more detailed analysis of Beck and Radiohead. First, I am analyzing all the albums by Beck with their year of release on the x-axis, their popularity on the y-axis, danceability color-coded and valence coded by size. beck <- as.data.frame(get.art.audio.feats("Beck")) beck.alb <- aggregate(cbind(danceability, energy, speechiness, acousticness, instrumentalness, liveness, valence, tempo, track_popularity) ~ album_name, data = beck, FUN = mean) beck.alb <- merge(beck.alb, unique(beck[, c("album_name", "album_release_year", "album_popularity")]), by = "album_name", all.x = T, all.y = F) beck.alb$album_release_year <- substr(beck.alb$album_release_year, 1, 4) beck.alb$n.tracks <- sapply(beck.alb$album_name, USE.NAMES = F, FUN = function (x) table(beck$album_name)[x])
beck.alb[beck.alb$album_name == “One Foot In The Grave (Expanded Edition)”, “album_name”] <- "One Foot In The Grave" beck.alb <- beck.alb[order(beck.alb$album_release_year),]

ggplot(beck.alb, aes(x = album_release_year, y = album_popularity,
label = album_name, size = valence,
color = danceability)) +
geom_point() +
geom_label_repel() +
scale_color_continuous(low = “darkred”, high = “green”) +
scale_size(range = c(3, 10)) +
labs(x = “Release year”, size = “Valence”, y = “Album popularity”, color = “Danceability”,
title = “Albums by Beck”, subtitle = “Values taken from Spotify API”) +
theme_minimal()

Another thing I wanted to show you is some kind of “album profile” or – more precisely – an answer to the question whether there are tracks on albums that are very salient in terms of track popularity. We can easily do this with a bit of transformation and facet wrapping. I am coding track popularity with both the height of the bar and the color of the bar to get an impression of the popularities of the tracks compared to the overall dataset (because the axes are allowed to very freely for the facets).

beck %>% group_by(album_name) %>%
mutate(track.no = 1:length(track_name)) %>%
ungroup() %>% as.data.frame() -> beck

ggplot(beck, aes(x = factor(track.no), y = track_popularity, fill = track_popularity)) +
geom_bar(stat = “identity”) +
labs(x = “Track”, y = “Track popularity”, fill = “Track popularity”,
title = “Popularity of tracks on albums by Beck”,
subtitle = “Values taken from Spotify API”) +
facet_wrap(~ album_name, scales = c(“free”))

See the rather high first bar for “Mellow Gold”? That’s “Loser”, still the most popular of Beck’s tracks with the rest of the tracks on “Mellow Gold” rather in “Guero” territory in terms of popularity. Also, “Colors” – Beck’s latest album – is rather balanced in terms of popularity.

And the same for Radiohead – only that I am excluding two albums (see in the code below) and am using the tempo feature for the coloring.

radiohead <- radiohead[!(radiohead$album_name %in% c("OK Computer OKNOTOK 1997 2017", "TKOL RMX 1234567")),] radiohead.alb <- aggregate(cbind(danceability, energy, speechiness, acousticness, instrumentalness, liveness, valence, tempo, track_popularity) ~ album_name, data = radiohead, FUN = mean) radiohead.alb <- merge(radiohead.alb, unique(radiohead[, c("album_name", "album_release_year", "album_popularity")]), by = "album_name", all.x = T, all.y = F) radiohead.alb$album_release_year <- substr(radiohead.alb$album_release_year, 1, 4) radiohead.alb$n.tracks <- sapply(radiohead.alb$album_name, USE.NAMES = F, FUN = function (x) table(radiohead$album_name)[x])

label = album_name, size = valence,
color = tempo)) +
geom_point() +
geom_label_repel() +
scale_color_continuous(low = “darkred”, high = “green”) +
scale_size(range = c(3, 10)) +
labs(x = “Release year”, size = “Valence”, y = “Album popularity”, color = “Tempo”,
title = “Albums by Radiohead”, subtitle = “Values taken from Spotify API”) +
theme_minimal()

It’s a shame how Disk 2 of “In Rainbows” compares to Disk 1 because it contains really great songs. Also, it’s interesting that “OK Computer” still not reaches the popularity of “Pablo Honey” – as the second plot suggests, this is because of the popularity of “Creep”. For the second plot, I am not double-coding track popularity with color but am using the valence feature for the color.

mutate(track.no = 1:length(track_name)) %>%

geom_bar(stat = “identity”) +
labs(x = “Track”, y = “Track popularity”, fill = “Valence”,
subtitle = “Values taken from Spotify API”) +
facet_wrap(~ album_name, scales = c(“free”))

Here we see that this valence feature certainly has some problems. “Fitter Happier” is one of the most positive songs by Radiohead with a value of 0.74??? I beg to differ…