Hottest 100 for 2011

[This article was first published on Stubborn Mule » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Another year, another Australia Day. Another Australia Day, another Triple J Hottest 100. And that, of course, means an excellent excuse to set R to work on the chart data.

For those outside Australia, the Hottest 100 is a chart of the most popular songs of the previous year, as voted by the listeners of the radio station Triple J. The tradition began in 1991, but initially people voted for their favourite song of all time. From 1993 onwards, the poll took its current form* and was restricted to tracks released in the year in question.

Since the Hottest 100 Wikipedia pages include country of origin**, I thought I would see whether there is any pattern in whose music Australians like best. Since it is Australia Day, it is only appropriate that we are partial to Australian artists, and they typically make up close to half of the 100 entries. Interestingly, in the early 1990s, Australian artists did not do so well. The United Kingdom has put in a good showing over the last two years, pulling ahead of the United States. Beyond the big three (Australia, the UK and the US), the pickings get slim very quickly, so I have only included Canada and New Zealand in the chart below.

Number of Hottest 100 tracks by Country

If you have excellent eyesight, you may notice that 2010 is missing from the chart. For some reason, this is the only year which does not include the full chart listing on the Wikipedia page. There is a link to a list on the ABC website, but unfortunately it does not include the country of origin. Maybe a keen Wikipedian reading this post will help by updating the page.

I make no great claims for the sophistication or the insight of this analysis: it was really an excuse to learn about using the XML package for R to pull data from tables in web pages.

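# library() stops with an error if a package is missing,
# whereas require() only returns FALSE with a warning
library(XML)
library(ggplot2)
library(reshape2)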

results <- data.frame()
col.names <- c("year", "rank", "title", "artist", "country")

# Skip 2010: full list is missing from Wikipedia page
years <- c(1993:2009, 2011)

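# The page URL is the same for every year apart from the year suffix
base.url <- "http://en.wikipedia.org/wiki/Triple_J_Hottest_100,"

for (year in years) {
    year.url <- paste(base.url, year, sep="_")
    # readHTMLTable returns a list with one data frame per table on the page
    tables <- readHTMLTable(year.url, stringsAsFactors=FALSE)
    # The chart is the table with four columns: rank, title, artist, country
    table.len <- sapply(tables, length)
    hot <- cbind(year=year, tables[[which(table.len==4)]])
    names(hot) <- col.names
    results <- rbind(results, hot)
}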

# Remap a few countries
results$country[results$country=="Australia [1]"] <- "Australia"
results$country[results$country=="England"] <- "United Kingdom"
results$country[results$country=="Scotland"] <- "United Kingdom"
results$country[results$country=="Wales"] <- "United Kingdom"
results$country[results$country=="England, Wales"] <- "United Kingdom"

# Countries to plot
top5 <- c("Australia", "United States", "United Kingdom",
  "Canada", "New Zealand")

# Create a colourful ggplot chart
plt <- ggplot(subset(results, country %in% top5),
    aes(factor(year), fill=factor(country)))
plt <- plt + geom_bar() + facet_grid(country ~ .)
# Drop the legend (theme() replaces the older opts()) and draw the chart
plt <- plt + labs(x="", y="") + theme(legend.position = "none")
print(plt)
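
The bar chart is essentially a count of tracks per year within each country. As a rough illustration of that tabulation, here is a minimal sketch in base R on a small made-up data frame. The rows in `toy` are invented for the example and are not real chart data; the names `toy`, `kept` and `counts` are likewise just placeholders.

```r
# Toy stand-in for the scraped `results` data frame; these rows are invented.
toy <- data.frame(
    year    = c(1993, 1993, 1993, 1994, 1994, 1994),
    country = c("Australia", "United States", "Australia",
                "United Kingdom", "Australia", "Iceland"),
    stringsAsFactors = FALSE
)

top5 <- c("Australia", "United States", "United Kingdom",
  "Canada", "New Zealand")

# Keep only the countries being plotted, then count tracks per country and year.
# With its default settings, geom_bar() draws these counts as bar heights.
kept <- subset(toy, country %in% top5)
counts <- table(kept$country, kept$year)
print(counts)
```

Printing a table like this alongside the chart is a cheap way to check that the facets show what you expect.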

UPDATE: there is a little bit more analysis in this follow-up post.

* Since the shift to single year charts, there have been two all-time Hottest 100s: 1998 and 2009.

** There are some country combinations, such as “Australia/England”, but the numbers are so small I have simply excluded them from the analysis.
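
To see how a combined label ends up excluded, here is a small sketch on invented labels. It swaps the one-line-per-country remapping used in this post for a named lookup vector, which is simply an alternative way to write the same step; `remap`, `hit` and `excluded` are hypothetical names introduced for the example.

```r
# Invented country labels, as they might appear after scraping.
country <- c("Australia [1]", "England", "Scotland", "Australia/England",
             "United States", "England, Wales")

# Named lookup vector: names are the raw labels, values are the labels to plot.
remap <- c("Australia [1]"  = "Australia",
           "England"        = "United Kingdom",
           "Scotland"       = "United Kingdom",
           "Wales"          = "United Kingdom",
           "England, Wales" = "United Kingdom")

# Replace only the labels that have an entry in the lookup vector.
hit <- country %in% names(remap)
country[hit] <- unname(remap[country[hit]])

top5 <- c("Australia", "United States", "United Kingdom",
  "Canada", "New Zealand")

# Anything not in top5 after remapping is dropped by the `%in% top5` filter.
excluded <- setdiff(unique(country), top5)
print(excluded)
```

Listing the leftover labels this way shows at a glance which combinations the filter is discarding.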

To leave a comment for the author, please follow the link and comment on their blog: Stubborn Mule » R.
