Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

DENSITY THE AVERAGE PERSON EXPERIENCES

CLICK TO ENLARGE

In Alaska, there’s about one person for each square mile of land.

You might picture a typical Alaskan not being able to see the next house.

But it’s not that way of course.

Most of Alaska is uninhabited. People have crowded into a few areas.

The average Alaskan experiences a population density of about 72 people per square mile.

That’s a lot more than one.

In the R code below, we roughly estimate the weighted population density for each US state, that is, the population density that the average person experiences. We do this for each state by taking the average of its counties’ population densities, weighted by the population of each county. It would be even better to do this for smaller areas, such as census tracts, but we were too lazy to chase down the data.

In the figure at the top of this post, we see the weighted and unweighted population densities for each state.

Note how New Jersey has a higher population density than New York, but when you look at what the average person experiences, it flips.

Amazingly, the average person in New York State shares a square mile with more than 10,000 other residents!

What states have the biggest ratios of weighted to unweighted densities? The chart below shows states with a 10x or greater ratio in blue.

CLICK TO ENLARGE

Here are the states with the biggest ratios:

Alaska – The average person experiences a population density that is 62 times greater than the state’s density.
New York – 39 times
Utah – 21 times
Minnesota – 14 times
Oregon – 13 times
New Mexico – 12 times
Kansas – 12 times
Texas – 12 times
Illinois – 12 times

R CODE FOR THOSE WHO WISH TO FOLLOW ALONG AT HOME
 library(tidyverse) library(ggrepel) library(scales) setwd("C:/Dropbox/Projects/20170605_Population_Density") #source https://factfinder.census.gov/bkmk/table/1.0/en/DEC/10_SF1/GCTPH1.US05PR df <- read_csv("DEC_10_SF1_GCTPH1.US05PR.csv", skip = 1) names(df) = c( "id", "state", "country", "geo_id", "geo_id_suffix", "geographic_area", "county_name", "population", "housing_units", "total_area", "water_area", "land_area", "density_population_sqmi_land", "density_housing_units_sqmi_land" ) #drop puerto rico and DC. sorry guys! df = df %>% filter(geo_id != "0400000US72") %>% filter(geo_id != "0500000US11001") %>% filter(geo_id != "0400000US11") #make a state data frame with just four facts for each state (for later joining) sdf = df %>% filter(!is.na(geo_id_suffix)) %>% filter(stringr::str_length(geo_id_suffix) < 5) %>% #states have short geoids mutate( state = stringr::str_sub(geo_id_suffix, 1, 2), geographic_area = stringr::str_sub(geographic_area, 16, stringr::str_length(geographic_area)) ) %>% select(state, geographic_area, population, density_population_sqmi_land) names(sdf) = c("state", "geographic_area", "state_pop", "state_density") #clean up county data, dropping irrelevant cols df = df %>% filter(!is.na(geo_id_suffix)) %>% filter(stringr::str_length(geo_id_suffix) == 5) %>% #counties have geoids of length 5 mutate(state = stringr::str_sub(geo_id_suffix, 1, 2)) %>% select( #drop unneeded columns -id,-country,-geo_id,-housing_units,-total_area, -water_area,-density_housing_units_sqmi_land ) #join the state data with the county data result = left_join(df, sdf, by = "state") %>% group_by(state) %>% summarise(weighted_density = round(sum( population / state_pop * density_population_sqmi_land ), 0)) %>% ungroup() %>% left_join(sdf, .) %>% arrange(-weighted_density) %>% #mark states with weighted density 10x higher than unweighted density mutate(highlight = weighted_density / state_density > 10) write_csv(result, "result.csv") #graphit, Schulte style p = ggplot(result, aes(x = state_density, y = weighted_density, color = highlight)) + theme_bw() + scale_x_log10(breaks = c(1, 3, 10, 30, 100, 300, 1000, 3000, 10000), label = comma) + scale_y_log10(breaks = c(1, 3, 10, 30, 100, 300, 1000, 3000, 10000), label = comma) + geom_point() + geom_text_repel(aes(label = geographic_area)) + geom_abline(slope = 1) + theme(legend.position = "none") + labs(x = "Unweighted Population Density", y = "Weighted Population Density") p ggsave( plot = p, file = "unweighted_v_weighted_density.png", height = 8, width = 8 ) #make a long version of result with two rows per state result_l = result %>% mutate(sortval = weighted_density) %>% gather(measure, density, state_density:weighted_density) %>% arrange(sortval, measure) %>% mutate(measure = factor(measure, levels = c("weighted_density", "state_density"))) #graph it p = ggplot(result_l, aes( x = density, # make the rows be states sorted by weighted density y = reorder(geographic_area, sortval), color = measure )) + theme_bw() + geom_point(size = 3) + #connect the two measures for each state with a line geom_line(aes(group = geographic_area), color = "black") + scale_x_log10(breaks = c(10, 30, 100, 300, 1000, 3000, 10000), label = comma) + theme(legend.position = "bottom") + labs(x = "Population density", y = "States ranked by weighted population density") + scale_color_discrete( name = "", breaks = c("weighted_density", "state_density"), labels = c("Weighted Population Density", "Unweighted Population Density") ) p ggsave( plot = p, file = "state_v_unweighted_and_weighted_density.png", height = 8, width = 6 ) 

H/T Jake Hofman for getting me to do this and talking R.
H/T to Hadley Wickham for creating the tidyverse.

The post Weighted population density appeared first on Decision Science News.