24 Days of R: Day 5

[This article was first published on PirateGrunt » R, and kindly contributed to R-bloggers.]

Some time back, I started a project on GitHub wherein I would explore the efficacy of financial literacy efforts in the area where I live. This is done with the support of a local non-profit organization.

As a first step, I tried to draw a picture of the area at a relatively fine level of detail. This relies on the UScensus suite of packages that I wrote about a couple of days ago. Today, we'll be looking at data for five counties in North Carolina at the level of a US census tract. First, we'll load up the data and combine the five counties into a single spatial object.

library("UScensus2010")
library("UScensus2010tract")

durham = county(name = "durham", state = "nc", level = "tract")
orange = county(name = "orange", state = "nc", level = "tract")
wake = county(name = "wake", state = "nc", level = "tract")
johnston = county(name = "johnston", state = "nc", level = "tract")
chatham = county(name = "chatham", state = "nc", level = "tract")

uwgt = spRbind(orange, durham)
uwgt = spRbind(uwgt, wake)
uwgt = spRbind(uwgt, johnston)
uwgt = spRbind(uwgt, chatham)

rm(durham, orange, wake, johnston, chatham)

Whether or not someone owns their home is a strong indicator of economic stability and the potential to retain and accumulate wealth. What percentage of folks own their own home?

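# Description of codes can be found in the documentation for the
# UScensus2010 package. Note: in the 2010 SF1 tables, H0030002 appears to be
# the count of occupied housing units and H0040004 the renter-occupied
# units, so "TotalPopulation" below is really a housing-unit count.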
uwgt$TotalPopulation = uwgt$H0030002
uwgt$pctHomeowner = 1 - uwgt$H0040004/uwgt$H0030002
plot(uwgt$pctHomeowner[order(uwgt$pctHomeowner)], pch = 19)

plot of chunk Homeownership

plot(uwgt$TotalPopulation, uwgt$pctHomeowner, pch = 19, xlab = "Total population", 
    ylab = "% Homeownership")

plot of chunk Homeownership
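
The homeownership share computed above is one minus the renter share. Here is a minimal sketch of that arithmetic on three made-up tracts, assuming that `H0030002` counts occupied housing units and `H0040004` counts renter-occupied units. The numbers and variable names are hypothetical and only illustrate the formula.

```r
# Hypothetical counts for three tracts (not real census data)
occupied <- c(1500, 800, 0)   # stands in for H0030002
renter   <- c(600, 800, 0)    # stands in for H0040004

# Share of occupied units that are not rented
pct_homeowner <- 1 - renter / occupied
print(pct_homeowner)

# A tract with no occupied units yields 0/0, i.e. NaN
print(is.nan(pct_homeowner))
```

A tract with no occupied units produces `NaN` rather than a number. Functions such as `lm()` treat `NaN` as missing and drop the row by default.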

We see that homeownership runs the gamut from zero to 100%. We might assume that areas of higher population have lower percentages of homeownership, since such areas may be more densely populated and urbanized, and people there are more likely to rent. However, there doesn't appear to be any relationship between the total population and homeownership. This may be an artifact of how census tracts are constructed: they are drawn to contain roughly comparable numbers of residents, so the size of a tract's population says little about how dense it is.

We'll recreate the choropleth helper function from two days ago so that we can map this data. We'll then draw a map that shows high and low concentrations of homeowners.

library(RColorBrewer)
library(classInt)

MyChoropleth = function(sp, dem, palette, ...) {
    df = sp@data
    # Quantile breaks: one class per palette colour
    brks = classIntervals(df[, dem], n = length(palette), style = "quantile")
    brks = brks$brks

    # Map each value to its class index, then to a colour
    sp$MyColor = palette[findInterval(df[, dem], brks, all.inside = TRUE)]
    plot(sp, col = sp$MyColor, axes = FALSE, ...)
}

myPalette = brewer.pal(9, "Blues")

MyChoropleth(uwgt, "pctHomeowner", myPalette, border = "transparent")

plot of chunk ChoroplethHelper
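
To see what the helper does with the values before it plots them, here is a small base-R sketch of the same binning idea. It assumes that quantile-style class intervals amount to evenly spaced quantile breaks, and it uses `quantile()` in place of `classIntervals()` so that it runs without extra packages. The values and the four-colour palette are made up for illustration.

```r
# Hypothetical homeownership shares for eight tracts
x <- c(0.10, 0.25, 0.40, 0.55, 0.70, 0.85, 0.95, 1.00)

# Hypothetical palette: one colour per class, light to dark
pal <- c("lightblue", "skyblue", "steelblue", "navy")

# n colours need n + 1 break points
brks <- quantile(x, probs = seq(0, 1, length.out = length(pal) + 1))

# Class index for each value; all.inside = TRUE keeps the maximum
# in the last class instead of pushing it into a class that does not exist
idx <- findInterval(x, brks, all.inside = TRUE)
print(idx)

# Colour assigned to each tract
tract_color <- pal[idx]
print(tract_color)
```

With quantile breaks, each colour covers roughly the same number of tracts, which is why the map shows the full range of the palette even when the values bunch together.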

dfCountyColor = data.frame(county = c("135", "063", "183", "101", "037"), countyName = c("Orange", 
    "Durham", "Wake", "Johnston", "Chatham"), color = c("orange", "blue", "red", 
    "green", "yellow"))
uwgt = merge(uwgt, dfCountyColor)

There's a clear geographic pattern at work. In the central part of the map, the area between Durham and Raleigh has lower levels of homeownership. These are more urbanized areas, which means they may have more young or transient residents. However, they are also areas of low wealth, as we can see when we load in poverty data from the American Community Survey.

setwd("~/GitHub/FinancialLiteracy/Data/ACS_11_5YR_B17005")
dfCensus = read.csv("ACS_11_5YR_B17005.csv", skip = 1)
marginOfError = grep("margin", colnames(dfCensus), ignore.case = TRUE)

dfCensus = dfCensus[, -marginOfError]
rm(marginOfError)

colnames(dfCensus) = gsub(".", "", colnames(dfCensus), fixed = TRUE)
colnames(dfCensus) = gsub("Estimate", "", colnames(dfCensus), fixed = TRUE)

uwgtACS = merge(uwgt, dfCensus, by.x = "fips", by.y = "Id2", all.x = TRUE)
uwgtACS$pctNonPoverty = 1 - uwgtACS$Incomeinthepast12monthsbelowpovertylevel/uwgtACS$Total
par(mfrow = c(1, 2))
MyChoropleth(uwgtACS, "pctHomeowner", myPalette, border = "transparent")
title("% Homeowners")
MyChoropleth(uwgtACS, "pctNonPoverty", myPalette, border = "transparent")
title("% Above Poverty")

plot of chunk ACSdata
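
The column clean-up and the merge above can be tried on a toy example. This sketch uses plain data frames rather than a spatial object, and the headers are hypothetical: they are assumed to resemble what `read.csv()` produces for an ACS download, not copied from the real file.

```r
# Hypothetical ACS extract (headers and numbers are illustrative only)
acs <- data.frame(
    Id2 = c("37063000101", "37063000200"),
    Estimate..Total. = c(1000, 2000),
    Margin.of.Error..Total. = c(50, 60),
    Estimate..Income.in.the.past.12.months.below.poverty.level. = c(250, 100),
    stringsAsFactors = FALSE
)

# Drop the margin-of-error columns
moe <- grep("margin", colnames(acs), ignore.case = TRUE)
acs <- acs[, -moe]

# fixed = TRUE makes "." a literal dot; as a regular expression
# it would match every character and wipe out the names
colnames(acs) <- gsub(".", "", colnames(acs), fixed = TRUE)
colnames(acs) <- gsub("Estimate", "", colnames(acs), fixed = TRUE)
print(colnames(acs))

# Hypothetical tract table standing in for the census data
tracts <- data.frame(
    fips = c("37063000101", "37063000102", "37063000200"),
    pctHomeowner = c(0.4, 0.6, 0.8),
    stringsAsFactors = FALSE
)

# all.x = TRUE keeps every tract, filling NA where the ACS has no row
merged <- merge(tracts, acs, by.x = "fips", by.y = "Id2", all.x = TRUE)
merged$pctNonPoverty <- 1 -
    merged$Incomeinthepast12monthsbelowpovertylevel / merged$Total
print(merged[, c("fips", "pctHomeowner", "pctNonPoverty")])
```

The tract without a matching ACS row ends up with `NA` for `pctNonPoverty`. Keeping it is what `all.x = TRUE` is for: the tract still appears on the map, just without a poverty figure.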

Although there are some exceptions (e.g. folks in Research Triangle Park, RTP), there's visual evidence of a relationship between the two maps. We can quantify it with a simple linear model.

plot(uwgtACS$pctNonPoverty, uwgtACS$pctHomeowner, pch = 19, xlab = "% Above Poverty", 
    ylab = "% Homeowners")
fit = lm(pctHomeowner ~ pctNonPoverty, data = uwgtACS)
summary(fit)

## 
## Call:
## lm(formula = pctHomeowner ~ pctNonPoverty, data = uwgtACS)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7773 -0.1106  0.0246  0.1382  0.4664 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -0.4019     0.0631   -6.37  6.9e-10 ***
## pctNonPoverty   1.1792     0.0715   16.49  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.178 on 310 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.467,  Adjusted R-squared:  0.466 
## F-statistic:  272 on 1 and 310 DF,  p-value: <2e-16

# abline() draws the fitted line directly, with no need to filter out NAs
abline(fit)

plot of chunk LinearModel
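
To get a feel for what the fitted coefficients imply, the sketch below plugs a few poverty levels into the reported equation. The intercept, slope and R-squared are copied from the summary output above. The chosen input values are arbitrary.

```r
# Coefficients as reported by summary(fit)
intercept <- -0.4019
slope     <- 1.1792

# Arbitrary shares of residents above the poverty level
pct_non_poverty <- c(0.7, 0.8, 0.9, 1.0)

# Predicted homeownership share at each level
predicted <- intercept + slope * pct_non_poverty
print(round(predicted, 3))   # 0.424 0.541 0.659 0.777

# With a single predictor, the correlation is the square root of
# R-squared, taking the sign of the slope
r <- sqrt(0.467)
print(round(r, 2))

# Poverty level at which the line crosses zero homeownership
zero_crossing <- -intercept / slope
print(round(zero_crossing, 3))
```

Each ten-point rise in the share above poverty goes with roughly a twelve-point rise in predicted homeownership. The line also predicts negative homeownership once fewer than about a third of residents are above the poverty level, so a straight line is at best a local approximation for data bounded between zero and one.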

Obviously, there are many other factors at play: marital status, available housing stock, zoning laws, size of family, and type of employment, to name but a few. One thing I'd like to explore is the influence of county government on various statistics. Here's the same plot, with sample points color coded by county:

uwgtACS$color = as.character(uwgtACS$color)
par(mfrow = c(1, 1))
plot(uwgtACS$pctNonPoverty, uwgtACS$pctHomeowner, pch = 19, xlab = "% Above Poverty", 
    ylab = "% Homeowners", col = uwgtACS$color)

plot of chunk ByCounty
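
The `as.character()` call in that chunk is easy to overlook. In R versions before 4.0, `data.frame()` converts strings to factors by default, and base graphics read a factor passed to `col` as integer codes that index into `palette()`, not as colour names. Here is a minimal sketch of the difference, using a hypothetical three-value factor:

```r
# A factor of colour names, as data.frame() would have created it
county_color <- factor(c("orange", "blue", "red"))

# Integer codes follow the alphabetical order of the levels
# (blue = 1, orange = 2, red = 3), not the colours themselves
codes <- as.integer(county_color)
print(codes)

# Converting to character recovers the intended colour names
color_names <- as.character(county_color)
print(color_names)
```

Passing the factor straight to `col` would still draw coloured points, but the colours would come from the current palette in level order rather than from the names stored in the data.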

I'll explore that in a later post.

Tomorrow: not sure what I'll write about! Possibly the PISA testing results that were released this week.

citation("UScensus2010tract")

## 
## To cite UScensus2000 in publications use:
## 
##   Zack W. Almquist (2010). US Census Spatial and Demographic Data
##   in R: The UScensus2000 Suite of Packages. Journal of Statistical
##   Software, 37(6), 1-31. URL http://www.jstatsoft.org/v37/i06/.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {US Census Spatial and Demographic Data in {R}: The {UScensus2000} Suite of Packages},
##     author = {Zack W. Almquist},
##     journal = {Journal of Statistical Software},
##     year = {2010},
##     volume = {37},
##     number = {6},
##     pages = {1--31},
##     url = {http://www.jstatsoft.org/v37/i06/},
##   }

sessionInfo()

## R version 3.0.2 (2013-09-25)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.4.1            RWordPress_0.2-3       UScensus2010tract_1.00
## [4] UScensus2010_0.11      foreign_0.8-55         maptools_0.8-27       
## [7] classInt_0.1-21        RColorBrewer_1.0-5     sp_1.0-13             
## 
## loaded via a namespace (and not attached):
##  [1] class_7.3-9     digest_0.6.3    e1071_1.6-1     evaluate_0.4.7 
##  [5] formatR_0.9     grid_3.0.2      lattice_0.20-23 markdown_0.6.3 
##  [9] RCurl_1.95-4.1  stringr_0.6.2   tools_3.0.2     XML_3.98-1.1   
## [13] XMLRPC_0.3-0
