choroplethr v3.1.0: Better Summary Demographic Data

May 5, 2015

(This article was first published on Just an R Blog » R, and kindly contributed to R-bloggers)

Today I am happy to announce that choroplethr v3.1.0 is now on CRAN. You can get it by typing the following from an R console:

install.packages("choroplethr")

This version adds better support for summary demographic data on every state and county in the US. That support consists of two data.frames and two functions. The data.frames are:

  • ?df_state_demographics: eight values for each state.
  • ?df_county_demographics: eight values for each county.

These statistics come from the US Census Bureau’s 2013 5-year American Community Survey (ACS). If you would like the same summary statistics from a different ACS, you can use these two functions:

  • ?get_state_demographics
  • ?get_county_demographics
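Here is a minimal sketch of how the bundled state data might be mapped. It assumes choroplethr v3.1.0 or later is installed and that the data.frame has a `percent_white` column. Check `names(df_state_demographics)` before relying on that name. It also relies on `state_choropleth()` expecting a data.frame with `region` and `value` columns.

```r
# Sketch: map one column of the bundled state demographics data.
# Assumption: the column is named `percent_white`; verify with
# names(df_state_demographics) before relying on it.
library(choroplethr)

data(df_state_demographics)
str(df_state_demographics)  # inspect the available columns

# state_choropleth() maps the `value` column onto each `region`.
df <- df_state_demographics
df$value <- df$percent_white

p <- state_choropleth(df,
                      title = "Percent White not Hispanic, 2013 5-year ACS",
                      legend = "Percent",
                      num_colors = 7)
print(p)
```

Swapping `percent_white` for another column maps a different statistic. The same pattern should work at the county level with `df_county_demographics` and `county_choropleth()`.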

For more information on the ACS and choroplethr’s support for it, please see this page.

Relation to Previous Work

In many ways this update is a continuation of work that began with my April 7 guest blog post on the Revolution Analytics blog. In that piece (Exploring San Francisco with choroplethrZip) I explored the demographics of San Francisco ZIP Codes. Because of the interest in that piece, I subsequently released the data as part of the choroplethrZip package. This update simply brings that functionality to the main choroplethr package.

Note that caveats apply to this data. ACS data are estimates based on samples, not full counts. I also simplify the Census Bureau’s complex framework for race and ethnicity by using only four categories: White not Hispanic, Asian not Hispanic, Black or African American not Hispanic, and Hispanic all Races. I chose simplicity over completeness because my goal is to demonstrate technology.

Explore the Data Online

You can explore this data with a web application that I created here. The source code for the app is available here. This app demonstrates some of my favorite ways of exploring demographic data:

  • Using a boxplot to explore the distribution of the data
  • Exploring the data at both the state and county level
  • Using choropleth maps to explore geographic patterns of the data
  • Allowing the user to change the number of colors used:
    • 1 color uses a continuous scale, which makes outliers easy to see
    • Using 2 through 9 colors puts an equal number of regions in each color. For example, using 2 colors shows which values are above and which are below the median
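Putting an equal number of regions in each color is quantile binning. The base-R sketch below illustrates the idea. The helper name `quantile_bins` is hypothetical, and this is not choroplethr's internal code. When many values are tied, some quantile breaks can coincide and produce fewer bins than requested.

```r
# Hypothetical helper: split a numeric vector into `num_colors` bins that
# each hold roughly the same number of values (quantile binning).
quantile_bins <- function(x, num_colors) {
  stopifnot(is.numeric(x), num_colors >= 1)
  if (num_colors == 1) {
    # One color: return the raw values unchanged for a continuous scale.
    return(x)
  }
  # Break points at evenly spaced quantiles, e.g. the median for 2 bins.
  breaks <- quantile(x, probs = seq(0, 1, length.out = num_colors + 1),
                     na.rm = TRUE, names = FALSE)
  # unique() guards against duplicate breaks when values are tied.
  cut(x, breaks = unique(breaks), include.lowest = TRUE)
}

values <- c(12, 45, 7, 88, 23, 56, 31, 70)
bins <- quantile_bins(values, 2)
print(table(bins))  # two bins split at the median (38), four values each
```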

In my opinion, datasets like this really lend themselves to web applications because there are so many ways to visualize the data, and no single way is authoritative.

Selected Images

One of my biggest surprises when exploring this dataset was its strong regional patterns. For example, the regions with the highest percentage of White not Hispanic residents tend to be in the north central states and the northeast. The regions with the highest percentage of Black or African American not Hispanic residents are in the southeast. And the regions with the highest concentration of Hispanic all Races residents are in the southwest:

[Images: state-white, state-black, state-hispanic]

Switching to counties shows us the variation within each state. And switching to a continuous scale highlights the outliers.

[Images: county-black-continuous, county-hispanic-continuous]
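A county-level map on a continuous scale could be drawn roughly as follows. This is a sketch that assumes choroplethr is installed and that `df_county_demographics` has a `percent_hispanic` column. Confirm that with `names(df_county_demographics)`. Here `num_colors = 1` requests the continuous scale described above.

```r
# Sketch: county-level map on a continuous (1-color) scale.
# Assumption: the column is named `percent_hispanic`; verify with
# names(df_county_demographics) before relying on it.
library(choroplethr)

data(df_county_demographics)
df <- df_county_demographics
df$value <- df$percent_hispanic

# num_colors = 1 uses a continuous scale, which makes outliers stand out.
p <- county_choropleth(df,
                       title = "Percent Hispanic all Races, by county",
                       legend = "Percent",
                       num_colors = 1)
print(p)
```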
