Weighted population density

[This article was first published on R – Decision Science News, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.



In Alaska, there’s about one person for each square mile of land.

You might picture a typical Alaskan not being able to see the next house.

But it’s not that way of course.

Most of Alaska is uninhabited. People have crowded into a few areas.

The average Alaskan experiences a population density of about 72 people per square mile.

That’s a lot more than one.

In the R code below, we roughly estimate the weighted population density for each US state, that is, the population density that the average person experiences. We do this for each state by taking the average of its counties’ population densities, weighted by the population of each county. It would be even better to do this for smaller areas, such as census tracts, but we were too lazy to chase down the data.

In the figure at the top of this post, we see the weighted and unweighted population densities for each state.

Note how New Jersey has a higher population density than New York, but when you look at what the average person experiences, it flips.

Amazingly, the average person in New York State shares a square mile with more than 10,000 other residents!

What states have the biggest ratios of weighted to unweighted densities? The chart below shows states with a 10x or greater ratio in blue.


Here are the states with the biggest ratios:

Alaska – The average person experiences a population density that is 62 times greater than the state’s density.
New York – 39 times
Nebraska – 24 times
Utah – 21 times
Colorado – 16 times
Minnesota – 14 times
Oregon – 13 times
New Mexico – 12 times
Kansas – 12 times
Texas – 12 times
Illinois – 12 times






#source https://factfinder.census.gov/bkmk/table/1.0/en/DEC/10_SF1/GCTPH1.US05PR

df %

filter(geo_id != "0400000US72") %>%

filter(geo_id != "0500000US11001") %>%

filter(geo_id != "0400000US11")

#make a state data frame with just four facts for each state (for later joining)

sdf = df %>%

filter(!is.na(geo_id_suffix)) %>%

filter(stringr::str_length(geo_id_suffix) < 5) %>% #states have short geoids


state = stringr::str_sub(geo_id_suffix, 1, 2),

geographic_area = stringr::str_sub(geographic_area, 16, stringr::str_length(geographic_area))

) %>%





names(sdf) = c("state", "geographic_area", "state_pop", "state_density")

#clean up county data, dropping irrelevant cols

df = df %>%

filter(!is.na(geo_id_suffix)) %>%

filter(stringr::str_length(geo_id_suffix) == 5) %>% #counties have geoids of length 5

mutate(state = stringr::str_sub(geo_id_suffix, 1, 2)) %>%

select( #drop unneeded columns




#join the state data with the county data

result = left_join(df, sdf, by = "state") %>%

group_by(state) %>%

summarise(weighted_density = round(sum(

population / state_pop * density_population_sqmi_land

), 0)) %>%

ungroup() %>%

left_join(sdf, .) %>%

arrange(-weighted_density) %>%

#mark states with weighted density 10x higher than unweighted density

mutate(highlight = weighted_density / state_density > 10)

write_csv(result, "result.csv")

#graphit, Schulte style

p = ggplot(result,

aes(x = state_density, y = weighted_density, color = highlight)) +

theme_bw() +

scale_x_log10(breaks = c(1, 3, 10, 30, 100, 300, 1000, 3000, 10000),

label = comma) +

scale_y_log10(breaks = c(1, 3, 10, 30, 100, 300, 1000, 3000, 10000),

label = comma) +

geom_point() +

geom_text_repel(aes(label = geographic_area)) +

geom_abline(slope = 1) +

theme(legend.position = "none") +

labs(x = "Unweighted Population Density", y = "Weighted Population Density")



plot = p,

file = "unweighted_v_weighted_density.png",

height = 8,

width = 8


#make a long version of result with two rows per state

result_l = result %>%

mutate(sortval = weighted_density) %>%

gather(measure, density, state_density:weighted_density) %>%

arrange(sortval, measure) %>%

mutate(measure = factor(measure, levels = c("weighted_density", "state_density")))

#graph it

p = ggplot(result_l, aes(

x = density,

# make the rows be states sorted by weighted density

y = reorder(geographic_area, sortval),

color = measure

)) +

theme_bw() +

geom_point(size = 3) +

#connect the two measures for each state with a line

geom_line(aes(group = geographic_area), color = "black") +

scale_x_log10(breaks = c(10, 30, 100, 300, 1000, 3000, 10000),

label = comma) +

theme(legend.position = "bottom") +

labs(x = "Population density", y = "States ranked by weighted population density") +


name = "",

breaks = c("weighted_density", "state_density"),

labels = c("Weighted Population Density", "Unweighted Population Density")




plot = p,

file = "state_v_unweighted_and_weighted_density.png",

height = 8,

width = 6


H/T Jake Hofman for getting me to do this and talking R.
H/T to Hadley Wickham for creating the tidyverse.

The post Weighted population density appeared first on Decision Science News.

To leave a comment for the author, please follow the link and comment on their blog: R – Decision Science News.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)