Do you ever see strange lights in the sky? Do you wonder what really goes on in Area 51? Would you like to use your R hacking skills to get to the bottom of the whole UFO conspiracy? Of course, you would!
UFO data from infochimps is the focus of a data munging exercise in Chapter 1 of Machine Learning for Hackers by Drew Conway and John Myles White, two social scientists with a penchant for statistical computing.
Dividing the data up by state (for sightings in the US), I noticed something funny. My home state of Washington has a lot of UFO sightings. Normalizing by population, this becomes even more pronounced.
I learned a neat trick from the chapter. The transform function helps to compute derived fields in a data.frame. I use transform to compute UFO sightings per capita, after merging in population data by state from the 2000 census.
sightings.by.state <- <a href="http://stat.ethz.ch/R-manual/R-devel/library/base/html/transform.html">transform</a>( sightings.by.state, state=state, state.name=name, sightings=sightings, sightings.per.cap=sightings/pop)
Creating the plot above, with a pile of ggplot code, we see that Washington state really is off the deep end when it comes to UFO sightings. Our northwest neighbors in Oregon come in second. I asked a couple fellow Washington residents what they thought. The first reasonably conjectured a relationship to the number of air bases. The second Washingtonian gave the explanation I favor: “High kook density”.