Love is all around: Popular words in pop hits

May 25, 2017

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Data scientist Giora Simchoni recently published a fantastic analysis of the history of pop songs on the Billboard Hot 100 using the R language. Giora used the rvest package in R to scrape data from the Ultimate Music Database site for the 350,000 chart entries (and 35,000 unique songs) since 1940, and used those data to create and visualize several measures of song popularity over time.
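For readers who haven't used rvest, here is a minimal sketch of the kind of table scraping described above. It is not Giora's actual scraping code: the real site's markup isn't shown in this post, so the inline HTML, the table id and the column names below are all hypothetical stand-ins for one downloaded chart page.

```r
library(rvest)

# Stand-in for one chart page. In a real scrape this would be read_html(<page URL>);
# the table id "chart" and the column names are assumptions made for illustration.
page <- minimal_html('
  <table id="chart">
    <tr><th>ThisWeekPosition</th><th>Artist</th><th>Title</th></tr>
    <tr><td>1</td><td>Artist A</td><td>Song One</td></tr>
    <tr><td>2</td><td>Artist B</td><td>Song Two</td></tr>
  </table>
')

# Select the table node with a CSS selector and parse it into a data frame
chart <- page %>%
  html_element("#chart") %>%
  html_table()

print(chart)
```

Repeating this for every weekly chart page and binding the results together would give one long table of chart entries, which is the shape the analysis below works with.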

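A novel measure that Giora calculates is "area under the song curve": for every week a song spends in the Hot 100, take 100 minus its chart position, then add those values up over all of its weeks on the chart. A song therefore scores highly both by ranking near the top and by staying on the chart for a long time. By that measure, the most popular (and also longest-charting) song of all time is Radioactive by Imagine Dragons: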

[Image: Imagine Dragons]

It turns out that calculating this "song integral" is pretty simple in R when you use the tidyverse:

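library(tidyverse)   # dplyr verbs, purrr::map_dbl() and the %>% pipe
library(lubridate)   # date_decimal()

# "Area under the song curve": 100 minus the chart position, summed over all weeks
calculateSongIntegral <- function(positions) {
  sum(100 - positions)
}

billboard %>%
  filter(EntryDate >= date_decimal(1960)) %>%
  group_by(Artist, Title) %>%
  summarise(positions = list(ThisWeekPosition)) %>%
  mutate(integral = map_dbl(positions, calculateSongIntegral)) %>%
  group_by(Artist, Title) %>%
  tally(integral) %>%
  arrange(desc(n))   # tally() names the total n; highest-scoring songs first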
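To see why this measure rewards longevity as well as peak position, here is a small worked example in base R. The two songs and their weekly positions are made up for illustration; only the formula comes from the code above.

```r
# Same formula as above: 100 minus the chart position, summed over all weeks
calculateSongIntegral <- function(positions) {
  sum(100 - positions)
}

# Hypothetical weekly chart positions (invented numbers, not Billboard data)
song_a <- c(40, 10, 1, 1, 5, 30)  # six weeks on the chart, two of them at #1
song_b <- rep(50, 20)             # never higher than #50, but charts for 20 weeks

integral_a <- calculateSongIntegral(song_a)  # 60 + 90 + 99 + 99 + 95 + 70 = 513
integral_b <- calculateSongIntegral(song_b)  # 20 weeks * 50 = 1000

integral_b > integral_a  # TRUE: the long-running song outscores the short-lived #1
```

So a song that lingers in the middle of the chart can beat a brief chart-topper, which fits with the top song by this measure also being the longest-charting one.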

Another fascinating chart included in Giora's post is this analysis of the most frequent words to appear in song titles, by decade. He used the tidytext package to extract individual words from song titles and then rank them by frequency of use:
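The word-counting step can be sketched roughly as follows. This is a guess at the general approach rather than Giora's actual code: the titles and years are invented toy data, the decade calculation and the stop-word filtering are assumptions, and only the use of tidytext comes from the post.

```r
library(dplyr)
library(tidytext)

# Toy data: invented titles and years (not taken from the Billboard data set)
titles <- data.frame(
  Title = c("My Love", "Summer Love", "Love Again",
            "Baby Tonight", "Oh Baby", "Dance Baby Dance"),
  Year  = c(1964, 1967, 1969, 1992, 1996, 1998),
  stringsAsFactors = FALSE
)

top_words <- titles %>%
  mutate(decade = (Year %/% 10) * 10) %>%   # 1964 -> 1960, 1998 -> 1990
  unnest_tokens(word, Title) %>%            # one lower-cased word per row
  anti_join(stop_words, by = "word") %>%    # drop common words such as "my"
  count(decade, word, sort = TRUE) %>%      # frequency of each word per decade
  group_by(decade) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%  # keep the most frequent word per decade
  ungroup()

print(top_words)
```

With real data, keeping more than one word per decade (for example `n = 10`) gives the kind of ranked list that can then be plotted decade by decade.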


So it seems as though Love Is All Around (#41, October 1994) after all! For more analysis of the Billboard Hot 100 data, including Top-10 rankings for various measures of song popularity and the associated R code, check out Giora's post linked below.

Sex, Drugs and Data: Billboard Bananas
