Alabama is a foreign country

March 7, 2011

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Faculty and students of the Iowa State University Department of Statistics have published online an analysis of data on the 2009 distribution of US stimulus funds under the Recovery and Reinvestment Act. (The analysis was published in March last year as part of the Design for America competition, but I only recently came across it.) The analyses and associated charts were produced using R, the ggplot2 package, and various other tools. The reports are accessible from the menu bar at the top of the analysis page, such as this timeline map of the flow of funds in 2009:

[Image: timeline map of the flow of stimulus funds in 2009]

One of the most interesting reports (titled, appropriately, Huh?) is on the errors the team detected in the published data and had to clean up before conducting the analysis. Typical mistakes included:

  • The state of Alabama being listed as a foreign country
  • Recipients of funds in Australia being given a lat-long coordinate in, yes, Austria
  • Data for China allocated to Switzerland and vice versa (probably due to confusion of the CN and CH country codes)

The GAO conducted a comprehensive review of such data errors. It just goes to show how important data cleaning is before any statistical analysis.
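To make the idea concrete, here is a minimal sketch of the kind of sanity checks that would flag errors like the ones listed above. It is not the Iowa State team's code (their work was done in R); it is a plain Python illustration. The field names (`country_name`, `country_code`, `lat`, `lon`), the sample records, the truncated state list, and the rough bounding boxes are all assumptions made up for this example.

```python
# Hypothetical validation checks for geographic fields in a funding record.
# Field names, sample records, and bounding boxes are illustrative assumptions.

# Truncated list of US state names, enough for the example.
US_STATES = {"Alabama", "Alaska", "Iowa"}

# Very rough bounding boxes: (min_lat, max_lat, min_lon, max_lon).
COUNTRY_BOXES = {
    "AU": (-44.0, -10.0, 112.0, 154.0),  # Australia
    "AT": (46.3, 49.1, 9.5, 17.2),       # Austria
    "CN": (18.0, 54.0, 73.0, 135.0),     # China
    "CH": (45.8, 47.9, 5.9, 10.5),       # Switzerland
}


def check_record(rec):
    """Return a list of human-readable problems found in one record."""
    problems = []

    # A US state should never appear in the country field.
    if rec.get("country_name") in US_STATES:
        problems.append("US state listed as a country")

    # Coordinates should fall inside the box of the stated country code.
    box = COUNTRY_BOXES.get(rec.get("country_code"))
    lat, lon = rec.get("lat"), rec.get("lon")
    if box is not None and lat is not None and lon is not None:
        min_lat, max_lat, min_lon, max_lon = box
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            problems.append("coordinates fall outside the stated country")

    return problems


records = [
    {"country_name": "Alabama", "country_code": None, "lat": 32.8, "lon": -86.8},
    # Recipient labelled Australia, but the point lies in Austria.
    {"country_name": "Australia", "country_code": "AU", "lat": 47.5, "lon": 14.5},
    # Recipient in China, but coded CH (Switzerland) instead of CN.
    {"country_name": "China", "country_code": "CH", "lat": 39.9, "lon": 116.4},
    # A consistent record: Sydney, Australia.
    {"country_name": "Australia", "country_code": "AU", "lat": -33.9, "lon": 151.2},
]

for rec in records:
    print(rec["country_name"], check_record(rec))
```

Note that a CN/CH swap cannot be caught from the code alone, since both are valid country codes; in this sketch it only surfaces because the coordinates disagree with the stated country.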

Iowa State Department of Statistics: Design For America - Recovery And Reinvestment Act (via @achorripsis)
