I plot the frequency of Wikipedia searches for “Behavioral Economics” and “Beer” – who knew the correlation would be 0.7!
Data on any Wikipedia search (back to 2007) are available at http://glimmer.rstudio.com/pssguy/wikiSearchRates/. The website lets you download daily hit counts as a CSV, which is what I've done here.
# Behavioral Economics and Beer:

# Author: Mark T Patterson Date: March 18, 2013

# Clear Workbench:
rm(list = ls())

# libraries:
library(lubridate)
library(ggplot2)
## Find out what's changed in ggplot2 with
## news(Version == "0.9.1", package = "ggplot2")
# data:
curr.wd = getwd()
setwd("C:/Users/Mark/Desktop/Blog/Data")
ts = read.csv("BehavEconBeer.csv", header = TRUE)
setwd(curr.wd)

# cleaning the dataset: str(ts)
ts$date = as.character(ts$date)
ts$date = mdy(ts$date)
## Using date format %m/%d/%Y.
ts = ts[, -1]
Note: the mdy function comes from the lubridate package, which handles date and time data cleanly. I've dropped the first column of data, which just holds row names inherited from Excel.
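For anyone without lubridate installed, here is a minimal base-R sketch of the same parsing step. It assumes the dates are stored as month/day/year strings, as the `%m/%d/%Y` message above suggests; the two example strings are made up for illustration.

```r
# Made-up example strings in month/day/year order (not from the real file).
raw <- c("3/18/2013", "12/1/2012")

# as.Date() with an explicit format does the same job as mdy() for this layout.
parsed <- as.Date(raw, format = "%m/%d/%Y")

parsed         # the dates, printed in ISO year-month-day order
class(parsed)  # "Date"
```

The trade-off is that `as.Date()` needs the format spelled out, while `mdy()` works it out for you.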
p = ggplot(ts, aes(x = date, y = count)) + geom_line(aes(color = factor(name)),
    size = 2)
p
It turns out the pattern we observe isn't at all unique – many variables follow (predictable) patterns of variation through the week. This doesn't necessarily mean, though, that the correlation between beer and behavioral economics is entirely spurious!
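One rough way to probe this is to strip each series' day-of-week average and see how much correlation is left. The sketch below is only an illustration: it runs on invented data with the same columns as `ts` (`date`, `count`, `name`), and the labels `BehavEcon` and `Beer`, the weekly cycle, and the noise levels are all assumptions rather than the real Wikipedia counts.

```r
# Made-up stand-in for ts (columns date, count, name); not the Wikipedia data.
set.seed(1)
dates  <- seq(as.Date("2013-01-01"), by = "day", length.out = 28)
weekly <- rep(c(10, 12, 12, 11, 9, 5, 4), times = 4)  # invented weekly cycle
demo <- data.frame(
  date  = rep(dates, times = 2),
  count = c(weekly * 3 + rnorm(28), weekly * 50 + rnorm(28, sd = 20)),
  name  = rep(c("BehavEcon", "Beer"), each = 28)
)

# Remove each series' day-of-week average from its counts.
demo$weekday <- weekdays(demo$date)
demo$resid   <- demo$count - ave(demo$count, demo$name, demo$weekday)

# Line the two series up by date, then correlate.
wide <- merge(demo[demo$name == "BehavEcon", ],
              demo[demo$name == "Beer", ],
              by = "date", suffixes = c(".be", ".beer"))

raw_cor   <- cor(wide$count.be, wide$count.beer)  # includes the shared weekly cycle
resid_cor <- cor(wide$resid.be, wide$resid.beer)  # weekly cycle removed
c(raw = raw_cor, weekday_adjusted = resid_cor)
```

In this toy data the two series share nothing except the weekly cycle, so any drop from the raw to the adjusted correlation comes from removing that cycle. Running the same steps on the real `ts` would show how much of the 0.7 survives once the weekly pattern is taken out.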