Alabama is a foreign country

March 7, 2011

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Faculty and students of the Iowa State University Department of Statistics have published online an analysis of data on the 2009 distribution of US stimulus funds, aka the Recovery and Reinvestment Act. (The analysis was published in March last year as part of the Design for America competition, but I only recently came across it.) The analyses were performed, and the associated charts produced, using R, the ggplot2 package, and various other tools. The reports are accessible from the menu bar at the top of the project page, and include a timeline map of the flow of funds in 2009.


One of the most interesting reports (titled, appropriately, Huh?) is on the errors the team detected in the published data and had to clean up before conducting the analysis. Typical mistakes included:

  • The state of Alabama being listed as a foreign country
  • Recipients of funds in Australia being given a lat-long coordinate in, yes, Austria
  • Data for China allocated to Switzerland and vice versa (probably due to confusion of the CN and CH country codes)

The GAO conducted a comprehensive review of such data errors. Just goes to show how important data cleaning is prior to any statistical analysis.
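As a rough illustration of the kind of sanity checks that catch mistakes like those listed above, here is a minimal Python sketch. It is not the Iowa State team's code (their work used R). The field names, the sample records, and the approximate bounding boxes are all assumptions made up for this example, and a real check would use a complete list of states and proper country boundaries.

```python
# Approximate bounding boxes as (min_lat, max_lat, min_lon, max_lon).
# Illustrative values only, not authoritative boundaries.
ROUGH_BOUNDS = {
    "AU": (-44.0, -10.0, 112.0, 154.0),  # Australia
    "AT": (46.0, 49.5, 9.0, 17.5),       # Austria
    "CN": (18.0, 54.0, 73.0, 135.0),     # China
    "CH": (45.5, 48.0, 5.5, 11.0),       # Switzerland
}

# Truncated for brevity; a real check would list every US state.
US_STATES = {"Alabama", "Alaska", "Iowa"}


def check_record(rec):
    """Return a list of human-readable problems found in one record."""
    problems = []

    # A US state name should never appear in the country field.
    if rec.get("country_name") in US_STATES:
        problems.append("US state listed as country")

    # If we know a rough box for the country code, the point should fall in it.
    bounds = ROUGH_BOUNDS.get(rec.get("country_code"))
    lat, lon = rec.get("lat"), rec.get("lon")
    if bounds is not None and lat is not None and lon is not None:
        min_lat, max_lat, min_lon, max_lon = bounds
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            problems.append("coordinates outside country bounds")

    return problems


# Made-up records that mimic the three kinds of mistakes, plus one clean row.
records = [
    {"id": 1, "country_name": "Alabama", "country_code": None, "lat": 32.8, "lon": -86.8},
    {"id": 2, "country_name": "Australia", "country_code": "AU", "lat": 47.5, "lon": 14.5},
    {"id": 3, "country_name": "Switzerland", "country_code": "CH", "lat": 39.9, "lon": 116.4},
    {"id": 4, "country_name": "Australia", "country_code": "AU", "lat": -33.9, "lon": 151.2},
]

flagged = {rec["id"]: check_record(rec) for rec in records}
```

Records 1 to 3 are flagged and record 4 passes. A bounding-box test is deliberately crude: it will not catch a point that lands in the wrong country but inside the right box, so it is best treated as a first pass before manual review.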

Iowa State Department of Statistics: Design For America – Recovery And Reinvestment Act (via @achorripsis)
