Notes on a Scandal – When Jimmy beat Katy
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
No, the title doesn’t refer to Katy Perry falling victim to yet another of Jimmy Savile’s sexual predilections, although these are two of the participants. I’ll get to the details later.
Just over a year ago, I reflected on the relative Wikipedia searches for leading female singing celebrities, including Ms Perry. In the light of the recent Jimmy Savile scandal, I decided to revisit the area.
For the first post, I relied on code from a now-defunct website and had not examined the raw data. It now appears that the information is no longer provided in the same way. The good news is that there is a web page giving the daily searches for each month in JSON format, which actually simplifies matters.
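Because the data come one month per page, a person's history is simply a set of monthly URLs. The snippet below is a minimal sketch of building them for one article. It assumes the URL pattern used in the code at the bottom of the page (`http://stats.grok.se/json/en/YYYYMM/Title`) and that spaces in the title are replaced by underscores, as in Wikipedia's own article titles; `monthlyUrls` is a hypothetical helper, not part of the original code.

```r
# Hypothetical helper: one JSON URL per month for a single article.
# Assumes the pattern http://stats.grok.se/json/en/YYYYMM/Title used later in
# this post, with spaces in the title replaced by underscores.
monthlyUrls <- function(title, years = 2008:2012) {
  months <- sprintf("%02d", 1:12)                       # "01" ... "12"
  ym <- as.vector(outer(months, years,
                        function(m, y) paste0(y, m)))   # "200801", "200802", ...
  paste0("http://stats.grok.se/json/en/", ym, "/", gsub(" ", "_", title))
}

urls <- monthlyUrls("Jimmy Savile")
length(urls)   # 60: one URL per month from January 2008 to December 2012
head(urls, 2)
```

Generating the URLs up front, rather than inside nested loops, makes it easy to check which months will be requested before any network calls are made.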
For this exercise, I have produced a function which collects and tabulates data for a set of people, produces graphs of each person's daily counts from the beginning of 2008 onwards, and creates a group graph for a specified date range. The code is shown at the bottom of the page.
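The group graph only uses the rows that fall inside the chosen window. Here is a minimal base-R sketch of that filtering step, assuming the long format used in the code (one row per person per day, with `count`, `date` and `name` columns). The data frame is a tiny made-up example, not real search counts, and `inWindow` is a hypothetical helper name.

```r
# Made-up long-format data: one row per person per day (not real counts).
allData <- data.frame(
  count = c(120, 0, 95, 300, 410, 150),
  date  = as.Date(c("2012-08-31", "2012-09-01", "2012-10-15",
                    "2012-08-31", "2012-10-15", "2012-11-02")),
  name  = c("A", "A", "A", "B", "B", "B")
)

# Hypothetical helper: keep days with a positive count inside
# [startDate, endDate]. Zero counts are dropped because the group
# graph plots counts on a log scale.
inWindow <- function(d, startDate = "2012-09-01", endDate = "2012-11-01") {
  subset(d, count > 0 & date >= as.Date(startDate) & date <= as.Date(endDate))
}

inWindow(allData)   # two rows: A and B on 2012-10-15
```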
Here is some of the output for four of the people mentioned in the coverage of the scandal.
Savile, naturally, leads the way, with ex-glam-rock star Gary Glitter following. This probably reflects Glitter's generally greater fame, and the severity of the allegations against him, compared with DJ Dave Lee Travis and the late actor Wilfrid Brambell.
Now for the summary table. The gap between the median and the mean reflects a pattern of steady daily searches punctuated by leaps whenever publicity occurs.
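To see why a few publicity spikes open up that gap, consider a toy series with made-up numbers: a steady baseline of 1,000 daily searches plus two large spikes. The median is unaffected, while the mean is pulled well above it.

```r
# Made-up daily counts: 28 quiet days plus two publicity spikes.
quiet  <- rep(1000, 28)
spikes <- c(50000, 80000)
counts <- c(quiet, spikes)

median(counts)   # 1000: the typical day ignores the spikes
mean(counts)     # 5266.667: two days drag the average upwards
```

This is why the median is the better guide to a person's "normal" level of interest, and the mean and maximum to how much publicity they have attracted.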
Interestingly, the scandal has not produced the maximum search count for any of the four.
- Dave Lee Travis peaked when the Burmese pro-democracy leader Aung San Suu Kyi said his World Service programme had given her a lifeline.
- Savile’s searches during the scandal have been significant, but his death sparked his highest single-day count.
- A TV show detailing a feud between Brambell and Harry H Corbett, his co-star in “Steptoe and Son”, led to Brambell’s highest daily search count on Wikipedia.
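Each of these peaks is simply the day with the largest count for that person, which the summary code records with `which.max`. Below is a base-R sketch of the same idea on made-up data; the names, dates and counts are placeholders, not the real figures.

```r
# Made-up long-format data (placeholders, not real search counts).
allData <- data.frame(
  count = c(10, 250, 40, 5, 7, 90),
  date  = as.Date(c("2011-10-01", "2011-10-02", "2012-10-05",
                    "2012-10-01", "2012-10-02", "2012-10-03")),
  name  = c("A", "A", "A", "B", "B", "B")
)

# For each person, return the size and date of their single highest day.
peaks <- do.call(rbind, lapply(split(allData, allData$name), function(d) {
  i <- which.max(d$count)   # first row holding the maximum
  data.frame(name = d$name[i], max = d$count[i], max_date = d$date[i])
}))
peaks
```

Here person A peaks a year before the later activity in the data, mirroring the observation that recent events need not produce anyone's all-time high.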
Glitter’s graph shows several peaks before this month, representing, in chronological order: his release from a Vietnamese jail and his attempt to avoid returning to the UK; the mockumentary “The Execution of Gary Glitter”, shown on Channel 4; and false rumours that he was planning a new tour.
So how did Jimmy beat Katy? With a peak daily search count almost double her highest of 101,922.
```r
# Packages required
library(RJSONIO)  # acquiring and parsing data
library(ggplot2)  # graphs
library(plyr)     # creation of summary data

# create variables for url
month <- sprintf("%02d", 1:12)   # "01" ... "12"
year  <- 2008:2012

# function with default dates for comparison graph
wikiFun <- function(person, startDate = "2012-09-01", endDate = "2012-11-01") {
  # dataframe for all people's search counts by day
  allData <- data.frame(count = numeric(), date = as.Date(character()), name = character())

  for (k in seq_along(person)) {
    # collect one month of daily counts at a time for this person
    monthly <- list()
    for (y in year) {
      for (m in month) {
        url <- paste0("http://stats.grok.se/json/en/", y, m, "/",
                      gsub(" ", "_", person[k]))   # Wikipedia titles use underscores
        raw.data <- readLines(url, warn = FALSE)
        rd <- fromJSON(paste(raw.data, collapse = ""))
        views <- unlist(rd$daily_views)             # named vector: names are dates
        monthly[[length(monthly) + 1]] <- data.frame(count = as.numeric(views),
                                                     date = as.Date(names(views)))
      }
    }
    # combine the months, label with the person's name and sort by date
    df <- do.call(rbind, monthly)
    df$name <- person[k]
    df <- arrange(df, date)
    allData <- rbind(allData, df)

    # set title, display and save individual's graph (zero-count days omitted)
    theTitle <- paste0("Daily Wikipedia searches for ", person[k])
    q <- ggplot(subset(df, count > 0), aes(x = date, y = count)) +
      geom_point() + xlab("") + ylab("") + ggtitle(theTitle)
    print(q)  # individual plot prints to screen
    ggsave(paste0("ws_", gsub(" ", "", person[k]), ".png"), q)
  }

  # display and save group graph using log scale for counts
  inRange <- subset(allData, count > 0 &
                      date >= as.Date(startDate) & date <= as.Date(endDate))
  p <- ggplot(inRange, aes(x = date, y = count, colour = name)) +
    geom_line() + xlab("") + ylab("") +
    ggtitle("Comparison of Daily Wikipedia searches") +
    coord_trans(y = "log2")
  print(p)
  ggsave("group_graph.png", p)

  # calculate summaries, display and save
  summaryData <- ddply(subset(allData, count > 0), .(name), summarize,
                       mean = mean(count),
                       median = median(count),
                       max = max(count),
                       max_date = date[which.max(count)])
  print(summaryData)
  write.csv(summaryData, "group_data.csv", row.names = FALSE)
  invisible(summaryData)
}

people <- c("Gary Glitter", "Jimmy Savile", "Dave Lee Travis", "Wilfrid Brambell")
wikiFun(people)
```