[This article was first published on randomjohn.github.io, and kindly contributed to R-bloggers.]
US Census Data
The US Census collects a number of demographic measures and publishes aggregate data through its website. There are several ways to use Census data in R, from the Census API to the USCensus2010 package. If you are interested in geopolitical data in the US, I recommend exploring both of these options. The Census API requires each user to obtain a key, and the package requires downloading a very large dataset. Both take some effort to set up, but once that is done you don’t have to do it again.
The acs package in R provides easy access to the Census API. I highly recommend checking it out, and it is the method we will use here. Note that I’ve already defined the variable api_key; if you are trying to run this code, you will first need to run something like api_key <- <enter your Census API key> before running the rest of it.
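One way to avoid hard-coding the key in a script is to keep it in an environment variable. The sketch below assumes a variable named CENSUS_API_KEY (a name chosen for this example, not something the acs package requires) and uses a hypothetical helper, get_census_key(), to read it.

```r
# Hypothetical helper: read the Census API key from an environment variable.
# CENSUS_API_KEY is an assumed variable name chosen for this example.
get_census_key <- function(var = "CENSUS_API_KEY") {
  key <- Sys.getenv(var)  # returns "" when the variable is unset
  if (!nzchar(key)) {
    stop("Set the ", var, " environment variable to your Census API key")
  }
  key
}

# Usage (after setting the variable, e.g. in ~/.Renviron):
# api_key <- get_census_key()
```

With the variable set, `api_key <- get_census_key()` takes the place of the manual assignment. The acs package also has api.key.install(), which stores a key for later acs calls if you prefer that route.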
For our purposes, we will use the toy example of plotting median household income for every county in South Carolina. First, we obtain the Census data. The first command gives us the table and variable names for the data we want; I then use that table number in the acs.fetch command to get the variable I want.
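The original commands are not reproduced here, but the lookup-then-fetch pattern can be sketched as a function. The function name fetch_sc_income(), the endyear = 2015 default and the 5-year span are assumptions; calling it needs the acs package, network access and a valid key.

```r
# Sketch only: needs the acs package, network access and a Census API key.
# fetch_sc_income() is a hypothetical name; endyear = 2015 is an assumption.
fetch_sc_income <- function(api_key, endyear = 2015) {
  if (!requireNamespace("acs", quietly = TRUE)) {
    stop("the acs package is not installed")
  }
  # Step 1: list the variable names in table B19126
  lookup <- acs::acs.lookup(endyear = endyear, span = 5,
                            table.number = "B19126")
  print(lookup)
  # Step 2: one geography covering every county in South Carolina
  sc_counties <- acs::geo.make(state = "SC", county = "*")
  # Step 3: fetch the whole table for those counties
  acs::acs.fetch(endyear = endyear, span = 5, geography = sc_counties,
                 table.number = "B19126", key = api_key)
}

# Usage: sc_income <- fetch_sc_income(api_key)
```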
County                           B19126_001 B19126_002 B19126_003 B19126_004 B19126_005 B19126_006 B19126_007 B19126_008 B19126_009 B19126_010 B19126_011
Abbeville County, South Carolina      44918      55141      65664      50698      24835      43187      50347      24886      22945      18101      29958
Aiken County, South Carolina          57396      70829      72930      70446      29302      36571      35469      37906      27355      22760      34427
Allendale County, South Carolina         NA         NA         NA         NA         NA         NA         NA         NA         NA         NA         NA
Anderson County, South Carolina       53169      65881      75444      60166      26608      36694      37254      36297      24384      17835      29280
Bamberg County, South Carolina           NA         NA         NA         NA         NA         NA         NA         NA         NA         NA         NA
Barnwell County, South Carolina       44224      59467      70542      54030      19864      25143      18633      45714      18317      13827      21315
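acs.fetch returns an acs object rather than a data frame. Assuming its estimate() accessor yields a matrix with one row per county (named by geography) and one column per variable, a hypothetical helper, income_to_df(), can pull out the B19126_001 column, which is the value that appears as med_income in the merged data later in the post. The toy matrix below reuses values from the table above; sc_income in the comment is a hypothetical name for the fetched object.

```r
# Hypothetical helper: turn an estimate matrix (counties x variables) into a
# two-column data frame of county name and median income.
income_to_df <- function(est, variable = "B19126_001") {
  data.frame(county = rownames(est),
             med_income = unname(est[, variable]),
             stringsAsFactors = FALSE)
}

# Toy matrix: two counties and the first three columns from the table above
est <- matrix(c(44918, 55141, 65664,
                57396, 70829, 72930),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Abbeville County, South Carolina",
                                "Aiken County, South Carolina"),
                              c("B19126_001", "B19126_002", "B19126_003")))
income_df <- income_to_df(est)

# With real data (assumed): income_df <- income_to_df(acs::estimate(sc_income))
```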
Plotting the map data
If you have the maps and ggplot2 packages, you already have the data you need to plot. We use the map_data function from ggplot2 to pull in county shape data for South Carolina. (A previous attempt at this blog post used the ggmap package, but at the time of writing there is an incompatibility between it and the latest version of ggplot2.)
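A minimal sketch of pulling the county shapes and drawing the outlines, assuming ggplot2 and maps are installed. coord_quickmap() keeps the aspect ratio roughly right for latitude/longitude data; that is a choice made for this example, not necessarily what the original post used.

```r
library(ggplot2)  # map_data() also requires the maps package

# County polygons for South Carolina; the county name is in `subregion`
sc_map <- map_data("county", region = "south carolina")

# Outline map: one polygon per `group`, vertices drawn in row order
p_outline <- ggplot(sc_map, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", colour = "grey40") +
  coord_quickmap() +   # approximate aspect ratio for lat/long data
  theme_void()

# print(p_outline) draws the map
```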
Merging the demographic and map data
Now we have the demographic data and the map, but merging the two takes a little effort: the map data stores each county name in lower case in a column called “subregion”, while the Census data labels each county as “xxxx County, South Carolina”. I use the dplyr and stringr packages (the latter for str_replace) to make short work of this merge.
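The merge can be sketched as follows. The two small input data frames are toy stand-ins built from values shown in this post; in practice they would be the Census data and the map_data() output. Joining on subregion so that it comes out as a county column is one way to get the column layout shown below, not necessarily the author's exact code.

```r
library(dplyr)
library(stringr)

# Toy inputs (values from this post); real inputs come from acs and map_data()
income_df <- data.frame(
  county = c("Abbeville County, South Carolina", "Aiken County, South Carolina"),
  med_income = c(44918, 57396),
  stringsAsFactors = FALSE
)
sc_map <- data.frame(
  long = c(-82.24809, -82.31685), lat = c(34.41758, 34.35455),
  group = c(1, 1), order = c(1L, 2L),
  region = "south carolina", subregion = "abbeville",
  stringsAsFactors = FALSE
)

merged <- income_df %>%
  # "Abbeville County, South Carolina" -> "abbeville"
  mutate(county = tolower(str_replace(county, " County, South Carolina$", ""))) %>%
  # match the cleaned name against the map's subregion column;
  # counties with no map rows (Aiken in this toy map) are dropped
  inner_join(sc_map, by = c("county" = "subregion"))
```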
county    med_income      long      lat group order region
abbeville      44918 -82.24809 34.41758     1     1 south carolina
abbeville      44918 -82.31685 34.35455     1     2 south carolina
abbeville      44918 -82.31111 34.33163     1     3 south carolina
abbeville      44918 -82.31111 34.29152     1     4 south carolina
abbeville      44918 -82.28247 34.26860     1     5 south carolina
abbeville      44918 -82.25955 34.25142     1     6 south carolina
abbeville      44918 -82.24809 34.21131     1     7 south carolina
abbeville      44918 -82.23663 34.18266     1     8 south carolina
abbeville      44918 -82.24236 34.15401     1     9 south carolina
abbeville      44918 -82.27674 34.10818     1    10 south carolina
It’s now a simple matter to plot this merged dataset. In fact, we only have to tweak a few things from the first time we plotted the map data.
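Relative to the outline map, the main tweak is mapping med_income to the fill aesthetic. To stay self-contained, the sketch below attaches incomes for just two counties (values from the tables above) to the full county map with left_join(), so every other county is NA and drawn grey. The colour choices and the income_by_county name are arbitrary assumptions for this example.

```r
library(ggplot2)
library(dplyr)

# Full South Carolina county map (requires the maps package)
sc_map <- map_data("county", region = "south carolina")

# Hypothetical income data for two counties only
income_by_county <- data.frame(subregion = c("abbeville", "aiken"),
                               med_income = c(44918, 57396),
                               stringsAsFactors = FALSE)

merged <- sc_map %>%
  left_join(income_by_county, by = "subregion") %>%  # unmatched counties get NA
  arrange(order)                                     # restore drawing order

p_income <- ggplot(merged, aes(x = long, y = lat, group = group, fill = med_income)) +
  geom_polygon(colour = "white") +
  coord_quickmap() +
  scale_fill_gradient(name = "Median income", low = "#deebf7",
                      high = "#08519c", na.value = "grey80") +
  theme_void()

# print(p_income) draws the choropleth
```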
Discussion
It’s pretty easy to plot U.S. Census data on a map. The real power of Census data comes not just from plotting it, but from combining it with other geographically based data (such as crime data). The acs package in R makes it easy to obtain Census data, which can then be merged with other data using packages such as dplyr and stringr and plotted with ggplot2. Hopefully the authors of the ggmap and ggplot2 packages can work out their incompatibilities so that maps like these can be drawn over Google Maps or OpenStreetMap base layers.
It should be noted that while I obtained county-level information, aggregate data can also be obtained at the Census tract and block group levels if you are looking to do some sort of localized analysis.
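As a hedged sketch, the same fetch can target tracts by passing tract = "*" to geo.make() for a single county; the block.group argument works similarly for block groups. The function name, the default county (Richland) and the endyear are illustrative assumptions.

```r
# Sketch only: needs the acs package, network access and a Census API key.
# fetch_sc_tract_income(), the default county and endyear are assumptions.
fetch_sc_tract_income <- function(api_key, county = "Richland", endyear = 2015) {
  if (!requireNamespace("acs", quietly = TRUE)) {
    stop("the acs package is not installed")
  }
  # Every census tract in a single South Carolina county
  tracts <- acs::geo.make(state = "SC", county = county, tract = "*")
  acs::acs.fetch(endyear = endyear, span = 5, geography = tracts,
                 table.number = "B19126", key = api_key)
}

# Usage: richland_tracts <- fetch_sc_tract_income(api_key)
```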