Weighted population density

June 26, 2017
By

(This article was first published on R – Decision Science News, and kindly contributed to R-bloggers)

DENSITY THE AVERAGE PERSON EXPERIENCES


CLICK TO ENLARGE

In Alaska, there’s about one person for each square mile of land.

You might picture a typical Alaskan not being able to see the next house.

But it’s not that way of course.

Most of Alaska is uninhabited. People have crowded into a few areas.

The average Alaskan experiences a population density of about 72 people per square mile.

That’s a lot more than one.

In the R code below, we roughly estimate the weighted population density for each US state, that is, the population density that the average person experiences. We do this for each state by taking the average of its counties’ population densities, weighted by the population of each county. It would be even better to do this for smaller areas, such as census tracts, but we were too lazy to chase down the data.

In the figure at the top of this post, we see the weighted and unweighted population densities for each state.

Note how New Jersey has a higher population density than New York, but when you look at what the average person experiences, it flips.

Amazingly, the average person in New York State shares a square mile with more than 10,000 other residents!

What states have the biggest ratios of weighted to unweighted densities? The chart below shows states with a 10x or greater ratio in blue.

CLICK TO ENLARGE

Here are the states with the biggest ratios:

Alaska – The average person experiences a population density that is 62 times greater than the state’s density.
New York – 39 times
Nebraska – 24 times
Utah – 21 times
Colorado – 16 times
Minnesota – 14 times
Oregon – 13 times
New Mexico – 12 times
Kansas – 12 times
Texas – 12 times
Illinois – 12 times

R CODE FOR THOSE WHO WISH TO FOLLOW ALONG AT HOME

library(tidyverse)
library(ggrepel)
library(scales)
setwd("C:/Dropbox/Projects/20170605_Population_Density")
#source https://factfinder.census.gov/bkmk/table/1.0/en/DEC/10_SF1/GCTPH1.US05PR
df <- read_csv("DEC_10_SF1_GCTPH1.US05PR.csv", skip = 1) names(df) = c( "id", "state", "country", "geo_id", "geo_id_suffix", "geographic_area", "county_name", "population", "housing_units", "total_area", "water_area", "land_area", "density_population_sqmi_land", "density_housing_units_sqmi_land" ) #drop puerto rico and DC. sorry guys! df = df %>%
filter(geo_id != "0400000US72") %>%
filter(geo_id != "0500000US11001") %>%
filter(geo_id != "0400000US11")
#make a state data frame with just four facts for each state (for later joining)
sdf = df %>%
filter(!is.na(geo_id_suffix)) %>%
filter(stringr::str_length(geo_id_suffix) < 5) %>% #states have short geoids
mutate(
state = stringr::str_sub(geo_id_suffix, 1, 2),
geographic_area = stringr::str_sub(geographic_area, 16, stringr::str_length(geographic_area))
) %>%
select(state,
geographic_area,
population,
density_population_sqmi_land)
names(sdf) = c("state", "geographic_area", "state_pop", "state_density")
#clean up county data, dropping irrelevant cols
df = df %>%
filter(!is.na(geo_id_suffix)) %>%
filter(stringr::str_length(geo_id_suffix) == 5) %>% #counties have geoids of length 5
mutate(state = stringr::str_sub(geo_id_suffix, 1, 2)) %>%
select( #drop unneeded columns
-id,-country,-geo_id,-housing_units,-total_area,
-water_area,-density_housing_units_sqmi_land
)
#join the state data with the county data
result = left_join(df, sdf, by = "state") %>%
group_by(state) %>%
summarise(weighted_density = round(sum(
population / state_pop * density_population_sqmi_land
), 0)) %>%
ungroup() %>%
left_join(sdf, .) %>%
arrange(-weighted_density) %>%
#mark states with weighted density 10x higher than unweighted density
mutate(highlight = weighted_density / state_density > 10)
write_csv(result, "result.csv")
#graphit, Schulte style
p = ggplot(result,
aes(x = state_density, y = weighted_density, color = highlight)) +
theme_bw() +
scale_x_log10(breaks = c(1, 3, 10, 30, 100, 300, 1000, 3000, 10000),
label = comma) +
scale_y_log10(breaks = c(1, 3, 10, 30, 100, 300, 1000, 3000, 10000),
label = comma) +
geom_point() +
geom_text_repel(aes(label = geographic_area)) +
geom_abline(slope = 1) +
theme(legend.position = "none") +
labs(x = "Unweighted Population Density", y = "Weighted Population Density")
p
ggsave(
plot = p,
file = "unweighted_v_weighted_density.png",
height = 8,
width = 8
)
#make a long version of result with two rows per state
result_l = result %>%
mutate(sortval = weighted_density) %>%
gather(measure, density, state_density:weighted_density) %>%
arrange(sortval, measure) %>%
mutate(measure = factor(measure, levels = c("weighted_density", "state_density")))
#graph it
p = ggplot(result_l, aes(
x = density,
# make the rows be states sorted by weighted density
y = reorder(geographic_area, sortval),
color = measure
)) +
theme_bw() +
geom_point(size = 3) +
#connect the two measures for each state with a line
geom_line(aes(group = geographic_area), color = "black") +
scale_x_log10(breaks = c(10, 30, 100, 300, 1000, 3000, 10000),
label = comma) +
theme(legend.position = "bottom") +
labs(x = "Population density", y = "States ranked by weighted population density") +
scale_color_discrete(
name = "",
breaks = c("weighted_density", "state_density"),
labels = c("Weighted Population Density", "Unweighted Population Density")
)
p
ggsave(
plot = p,
file = "state_v_unweighted_and_weighted_density.png",
height = 8,
width = 6
)

H/T Jake Hofman for getting me to do this and talking R.
H/T to Hadley Wickham for creating the tidyverse.

The post Weighted population density appeared first on Decision Science News.

To leave a comment for the author, please follow the link and comment on their blog: R – Decision Science News.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)