Hottest 100 for 2011
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Another year, another Australia Day. Another Australia Day, another Triple J Hottest 100. And that, of course, means an excellent excuse to set R to work on the chart data.
For those outside Australia, the Hottest 100 is a chart of the most popular songs of the previous year, as voted by the listeners of the radio station Triple J. The tradition began in 1991, but initially people voted for their favourite song of all time. From 1993 onwards, the poll took its current form* and was restricted to tracks released in the year in question.
Since the Hottest 100 Wikipedia pages include the country of origin for each track**, I thought I would see whether there is any pattern in whose music Australians like best. As it is Australia Day, it is only appropriate that we are partial to Australian artists: they typically make up close to half of the 100 entries. Interestingly, in the early 90s, Australian artists did not do so well. The United Kingdom has put in a good showing over the last two years, pulling ahead of the United States. Beyond the big three (Australia, the UK and the US), the pickings get slim very quickly, so I have only included Canada and New Zealand in the chart below.
Figure: Number of Hottest 100 tracks by country
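The "close to half" figure is easy to check once the chart data is in a data frame. Here is a minimal sketch, assuming a data frame with `year` and `country` columns like the `results` data frame assembled by the script below. The helper `country.share` and the toy rows are hypothetical, made up for illustration rather than taken from the chart:

```r
# Hypothetical helper: proportion of each year's entries credited to one
# country. Assumes a data frame with 'year' and 'country' columns, like
# the `results` data frame built by the scraping script.
country.share <- function(df, target) {
  # mean() of a logical vector is the fraction of TRUE values
  tapply(df$country == target, df$year, mean)
}

# Made-up toy data for illustration only (not real chart entries)
toy <- data.frame(
  year    = c(2009, 2009, 2011, 2011, 2011, 2011),
  country = c("Australia", "Canada",
              "Australia", "Australia", "Australia", "United Kingdom"),
  stringsAsFactors = FALSE
)

share <- country.share(toy, "Australia")
print(share)
```

Called as `country.share(results, "Australia")` on the scraped data, it should return one proportion per year, which is the quantity the "close to half" remark describes.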
If you have excellent eyesight, you may notice that 2010 is missing from the chart. For some reason, it is the only year whose Wikipedia page does not include the full chart listing. The page links to a list on the ABC website, but unfortunately that list does not include the country of origin. Maybe a keen Wikipedian reading this post will help by updating the page.
I make no great claims for the sophistication or the insight of this analysis: it was really an excuse to learn about using the XML package for R to pull data from tables in web pages.
```r
library(XML)      # readHTMLTable() for scraping HTML tables
library(ggplot2)  # plotting

results <- data.frame()
col.names <- c("year", "rank", "title", "artist", "country")

# Skip 2010: full list is missing from Wikipedia page
years <- c(1993:2009, 2011)

for (year in years) {
  # Page titles look like "Triple J Hottest 100, 1993"
  base.url <- "http://en.wikipedia.org/wiki/Triple_J_Hottest_100,"
  year.url <- paste(base.url, year, sep = "_")
  tables <- readHTMLTable(year.url, stringsAsFactors = FALSE)
  # The chart is the (first) table with four columns:
  # rank, title, artist, country
  table.len <- sapply(tables, length)
  hot <- cbind(year = year, tables[[which(table.len == 4)[1]]])
  names(hot) <- col.names
  results <- rbind(results, hot)
}

# Remap a few countries
results$country[results$country == "Australia [1]"] <- "Australia"
uk <- c("England", "Scotland", "Wales", "England, Wales")
results$country[results$country %in% uk] <- "United Kingdom"

# Countries to plot
top5 <- c("Australia", "United States", "United Kingdom", "Canada", "New Zealand")

# Create a colourful ggplot chart: one panel per country
plt <- ggplot(subset(results, country %in% top5),
              aes(factor(year), fill = factor(country)))
plt <- plt + geom_bar() + facet_grid(country ~ .)
plt <- plt + labs(x = "", y = "") + theme(legend.position = "none")
print(plt)
```
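The table-picking step in the loop relies on the chart being the table with exactly four columns (`length()` of a data frame is its number of columns). A small offline sketch of that idea follows, assuming a hypothetical HTML string in place of a real Wikipedia page. `header = TRUE` is set explicitly here because the toy tables put their column labels in the first row, and taking `which(...)[1]` guards against a page that happens to contain more than one four-column table:

```r
library(XML)

# Hypothetical page: a 2-column infobox followed by a 4-column chart,
# standing in for a Wikipedia Hottest 100 page (not real data)
html <- '<html><body>
<table><tr><th>Key</th><th>Value</th></tr>
<tr><td>Station</td><td>Triple J</td></tr></table>
<table><tr><th>#</th><th>Song</th><th>Artist</th><th>Country</th></tr>
<tr><td>1</td><td>Song A</td><td>Artist A</td><td>Australia</td></tr>
<tr><td>2</td><td>Song B</td><td>Artist B</td><td>England</td></tr></table>
</body></html>'

# Parse the string as HTML, then extract every <table> as a data frame
doc <- htmlParse(html, asText = TRUE)
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

# Number of columns in each table; keep the first one with four
n.cols <- sapply(tables, length)
chart <- tables[[which(n.cols == 4)[1]]]
print(chart)
```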
UPDATE: there is a little bit more analysis in this follow-up post.
* Since the shift to single-year charts, there have been two all-time Hottest 100s: 1998 and 2009.
** There are some country combinations, such as “Australia/England”, but the numbers are so small I have simply excluded them from the analysis.