How to measure spatial diversity and segregation?
The raceland package implements a computational framework for a pattern-based, zoneless analysis and visualization of (ethno)racial topography.
The main concept in this package is a racial landscape (RL). It consists of many large and small patches (racial enclaves) formed by adjacent raster grid cells that share the same race category. The distribution of racial enclaves creates a specific spatial pattern, which can be quantified by two metrics derived from Information Theory (IT): entropy and mutual information. Entropy measures racial diversity, and mutual information measures racial segregation.
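For reference, both metrics can be written in their standard information-theoretic form. This is only a sketch under stated assumptions: the notation ($p_{ij}$, $p_i$) is introduced here for illustration and does not come from the package, and a base-2 logarithm is assumed. Let $p_{ij}$ be the (possibly weighted) share of adjacent cell pairs in which the focal cell belongs to race $i$ and its neighbor to race $j$, with marginals $p_i = \sum_j p_{ij}$ for focal cells ($x$) and $p_j = \sum_i p_{ij}$ for neighbors ($y$).

```latex
\begin{aligned}
H(x) &= -\sum_{i} p_i \log_2 p_i && \text{(entropy: racial diversity)} \\
H(x, y) &= -\sum_{i}\sum_{j} p_{ij} \log_2 p_{ij} && \text{(joint entropy)} \\
H(y \mid x) &= H(x, y) - H(x) && \text{(conditional entropy)} \\
I(y, x) &= H(y) - H(y \mid x) && \text{(mutual information: racial segregation)}
\end{aligned}
```

When pairs are counted in both directions, the matrix of $p_{ij}$ is symmetric, so $H(y) = H(x)$ and mutual information is simply entropy minus conditional entropy.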
Methods in the raceland package are based on raster data and, unlike previous methods, do not depend on a division into specific zones (census tracts, census blocks, etc.). Racial diversity (entropy) and racial segregation (mutual information) can be calculated for the whole area of interest (e.g., a metropolitan area) or for any portion of it without introducing arbitrary divisions.
To learn more about this topic, read our Applied Geography article or its preprint:
Dmowska, A., Stepinski T., Nowosad J. Racial landscapes – a pattern-based, zoneless method for analysis and visualization of racial topography. Applied Geography. 122:1-9, DOI:10.1016/j.apgeog.2020.102239
To reproduce the results on your own computer, install and attach the following packages:
library(raceland)
library(raster)
library(sf)
library(tmap)
library(dplyr)
You also need to download and extract the data.zip file containing the example data.
temp_data_file = tempfile(fileext = ".zip")
download.file("https://github.com/Nowosad/raceland-bp1/raw/master/data.zip",
              destfile = temp_data_file, mode = "wb")
unzip(temp_data_file)
The presented approach requires a set of rasters, where each raster represents one of five race-groups: Asians, Blacks, Hispanic, others, and Whites. In this example, we use data limited to the city of Cincinnati, Ohio.
list_raster = dir("data", pattern = ".tif$", full.names = TRUE)
race_raster = stack(list_raster)
We also use vector data containing the city borders to ease the understanding of the results.
cincinnati = read_sf("data/cincinnati.gpkg")
We can visualize the data using the tmap package:
tm_race = tm_shape(race_raster) +
  tm_raster(style = "fisher", n = 10, palette = "viridis", title = "Number of people") +
  tm_facets(nrow = 3) +
  tm_shape(cincinnati) +
  tm_borders(lwd = 3, col = "black")
tm_race
The above maps show the distribution of people from different race-groups in Cincinnati. Each 30 by 30 meter cell represents the number of people living in that area. The data was obtained from http://sil.uc.edu/cms/index.php?id=socscape-data and preprocessed using the instructions at https://cran.r-project.org/web/packages/raceland/vignettes/raceland-intro3.html.
Our goal is to measure racial diversity and racial segregation for different places in the city. We can use the quanfity_raceland() function for this purpose.
results_metrics = quanfity_raceland(race_raster, n = 30, window_size = 10,
                                    fun = "mean", size = 20, threshold = 0.75)
head(results_metrics)
## Simple feature collection with 6 features and 4 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 978285 ymin: 1858035 xmax: 984885 ymax: 1859235
## CRS: +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
##    row col       ent     mutinf                       geometry
## 30   1  30 1.1050557 0.01559040 POLYGON ((981885 1859235, 9...
## 31   1  31 1.3120756 0.03010253 POLYGON ((982485 1859235, 9...
## 33   1  33 1.1301688 0.01910744 POLYGON ((983685 1859235, 9...
## 34   1  34 1.6320160 0.06155428 POLYGON ((984285 1859235, 9...
## 74   2  24 0.9527805 0.01716798 POLYGON ((978285 1858635, 9...
## 80   2  30 1.4438328 0.04498205 POLYGON ((981885 1858635, 9...
It requires several arguments:
- x: a RasterStack with race-specific population densities assigned to each cell
- n: the number of realizations
- window_size: the side length, expressed in the number of cells, of a square-shaped block of cells for which local densities will be calculated
- fun: the function used to calculate values from adjacent cells that contribute to the exposure matrix:
  - "mean": calculates average values of local population densities from adjacent cells,
  - "geometric_mean": calculates geometric mean values of local population densities from adjacent cells, or
  - "focal": assigns the value from the focal cell
- size: the side length, expressed in the number of cells, of a square-shaped block of cells. It defines the extent of a local pattern
- threshold: the share of NA cells allowed for metrics calculation
The result is a spatial vector object containing areas of 20 by 20 cells from the input data (600 by 600 meters in this example). Its attribute table has five columns:
- row and col, allowing for identification of each square polygon,
- ent: entropy, measuring racial diversity,
- mutinf: mutual information, which is associated with measuring racial segregation, and
- geometry, containing spatial geometries.
diversity_map = tm_shape(results_metrics) +
  tm_polygons(col = "ent", title = "Diversity", style = "cont", palette = "magma") +
  tm_shape(cincinnati) +
  tm_borders(lwd = 1, col = "black")
segregation_map = tm_shape(results_metrics) +
  tm_polygons(col = "mutinf", title = "Segregation", style = "cont", palette = "cividis") +
  tm_shape(cincinnati) +
  tm_borders(lwd = 1, col = "black")
tmap_arrange(diversity_map, segregation_map)
The above results present areas with different levels of racial diversity and segregation. Interestingly, there is a low correlation between these two properties. Some areas inside the city do not have any value attached: this indicates that either more than 75% of their area is covered with missing values or nobody lives there.
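The role of the threshold argument can be illustrated with a minimal base R sketch. The helper name na_share_ok() is hypothetical and not part of raceland, and the exact comparison rule (a window is kept when its share of NA cells does not exceed the threshold) is an assumption based on the description above.

```r
# Hypothetical helper, not part of raceland: decide whether a window
# has few enough NA cells for the metrics to be calculated
na_share_ok = function(window, threshold) {
  na_share = mean(is.na(window))  # fraction of NA cells in the window
  na_share <= threshold           # assumed rule: keep if the share does not exceed the threshold
}

# A 20 by 20 window in which 300 of the 400 cells are NA (75%)
window_a = matrix(c(rep(NA, 300), rep(1, 100)), nrow = 20)
# A 20 by 20 window in which 320 of the 400 cells are NA (80%)
window_b = matrix(c(rep(NA, 320), rep(1, 80)), nrow = 20)

na_share_ok(window_a, threshold = 0.75)
na_share_ok(window_b, threshold = 0.75)
```

Under this assumed rule, the first window would still be used, while the second would be left without a value.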
The quanfity_raceland() function is a wrapper around several steps implemented in raceland, namely create_realizations(), create_densities(), calculate_metrics(), and create_grid(). All of them can be used sequentially, as you can see below.
Additionally, the raceland package has the zones_to_raster() function, which prepares input data based on spatial vector data with race counts.
Constructing racial landscapes
The racial landscape is a high-resolution grid in which each cell contains only inhabitants of a single race. It is constructed using the create_realizations() function, which expects a stack of race-specific rasters. The racial composition of each cell is translated into probabilities of drawing a person of a specific race from that cell. For example, if a cell has 100 people, of whom 90 are classified as Black (90% chance) and 10 as White (10% chance), then we can assign a specific race randomly based on these probabilities.
This approach generates a specified number (n = 30, in this case) of realizations with slightly different patterns.
realizations_raster = create_realizations(race_raster, n = 30)
The output of this function is a RasterStack, where each raster contains values from 1 to k, where k is the number of provided race-specific grids. In this case, we provided five race-specific grids (Asians, Blacks, Hispanic, others, and Whites); therefore, the value of 1 in the output object represents Asians, 2 represents Blacks, etc.
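The random assignment described above can be sketched in base R. This is only an illustration of the idea, not the implementation used by create_realizations(): the counts are made up, and the helper draw_race() is hypothetical.

```r
set.seed(1)  # for reproducibility of this illustration only

# Toy counts (made up for this sketch): rows are cells, columns are race groups
counts = matrix(
  c(0, 90, 0, 0, 10,   # cell 1: 90 Black and 10 White inhabitants
    5,  0, 5, 0,  0,   # cell 2: 5 Asian and 5 Hispanic inhabitants
    0,  0, 0, 0,  0),  # cell 3: uninhabited
  ncol = 5, byrow = TRUE,
  dimnames = list(NULL, c("Asians", "Blacks", "Hispanic", "others", "Whites"))
)

# Hypothetical helper: draw one race category (1 to k) for a cell, with
# probabilities proportional to the race-specific counts; empty cells stay NA
draw_race = function(cell_counts) {
  if (sum(cell_counts) == 0) return(NA_integer_)
  sample(seq_along(cell_counts), size = 1, prob = cell_counts)
}

# Three realizations of the three cells (rows are cells, columns are realizations)
realizations = replicate(3, apply(counts, 1, draw_race))
realizations
```

In every realization, the first cell can only receive the value 2 (Blacks) or 5 (Whites), with 2 being far more likely, which is why different realizations look similar but not identical.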
my_pal = c("#F16667", "#6EBE44", "#7E69AF", "#C77213", "#F8DF1D")
tm_realizations = tm_shape(realizations_raster[[1:4]]) +
  tm_raster(style = "cat", palette = my_pal,
            labels = c("Asians", "Blacks", "Hispanic", "others", "Whites"),
            title = "") +
  tm_facets(ncol = 2) +
  tm_shape(cincinnati) +
  tm_borders(lwd = 3, col = "black") +
  tm_layout(panel.labels = paste("Realization", 1:30))
tm_realizations
The above plot shows four of 30 created realizations and makes it clear that all of them are fairly similar.
Now, for each of the created realizations, we can calculate local densities of subpopulations (race-specific local densities) using the create_densities() function.
dens_raster = create_densities(realizations_raster, race_raster, window_size = 10)
The output is a RasterStack with local densities calculated separately for each realization.
tm_density = tm_shape(dens_raster[[1:4]]) +
  tm_raster(style = "fisher", n = 10, palette = "viridis", title = "Number of people") +
  tm_facets(ncol = 2) +
  tm_shape(cincinnati) +
  tm_borders(lwd = 3, col = "black") +
  tm_layout(panel.labels = paste("Realization", 1:30))
tm_density
Total diversity and segregation
We can use both the realizations and the density rasters to calculate racial diversity and segregation with the calculate_metrics() function. It calculates four information theory-derived metrics: entropy (ent), joint entropy (joinent), conditional entropy (condent), and mutual information (mutinf). As we mentioned before, ent measures racial diversity, while mutinf is associated with racial segregation. These metrics can be calculated for a given spatial scale. For example, setting size to NULL, as in the example below, calculates the metrics for the whole area of each realization.
metr_df = calculate_metrics(x = realizations_raster, w = dens_raster,
                            fun = "mean", size = NULL, threshold = 1)
head(metr_df)
##   realization row col      ent  joinent  condent    mutinf
## 1           1   1   1 1.400229 2.625657 1.225428 0.1748010
## 2           2   1   1 1.398806 2.624101 1.225295 0.1735102
## 3           3   1   1 1.398361 2.623339 1.224978 0.1733824
## 4           4   1   1 1.400530 2.625777 1.225247 0.1752829
## 5           5   1   1 1.395641 2.617376 1.221734 0.1739072
## 6           6   1   1 1.397392 2.616627 1.219235 0.1781572
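To show how the four metrics relate to each other, here is a minimal base R sketch that derives them from a co-occurrence matrix of adjacent cells. It is a simplification under several assumptions: the toy grid is made up, adjacency is limited to the four nearest neighbors, the logarithm base is 2, and the pairs are unweighted, whereas calculate_metrics() weights them by local densities. The helpers add_pairs() and entropy() are hypothetical.

```r
# Toy categorical grid (made up): two categories arranged in two blocks
grid = matrix(c(1, 1, 2, 2,
                1, 1, 2, 2,
                1, 1, 2, 2), nrow = 3, byrow = TRUE)

k = 2
comat = matrix(0, k, k)  # co-occurrence counts of (focal, neighbor) pairs

# Hypothetical helper: count each adjacent pair in both directions,
# so that the resulting matrix is symmetric
add_pairs = function(comat, a, b) {
  for (i in seq_along(a)) {
    comat[a[i], b[i]] = comat[a[i], b[i]] + 1
    comat[b[i], a[i]] = comat[b[i], a[i]] + 1
  }
  comat
}
comat = add_pairs(comat, grid[, -ncol(grid)], grid[, -1])  # left-right neighbors
comat = add_pairs(comat, grid[-nrow(grid), ], grid[-1, ])  # top-bottom neighbors

# Shannon entropy in bits; zero probabilities contribute nothing
entropy = function(p) {
  p = p[p > 0]
  -sum(p * log2(p))
}

p_joint = comat / sum(comat)
ent = entropy(rowSums(p_joint))  # marginal entropy (diversity)
joinent = entropy(p_joint)       # joint entropy
condent = joinent - ent          # conditional entropy
mutinf = ent - condent           # mutual information (segregation)

round(c(ent = ent, joinent = joinent, condent = condent, mutinf = mutinf), 3)
```

The same relationships can be checked in the output above: for the first realization, 2.625657 - 1.400229 = 1.225428 (condent) and 1.400229 - 1.225428 = 0.174801 (mutinf).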
Now, we can calculate average metrics across all realizations, which should give more accurate results.
metr_df %>%
  summarise(
    mean_ent = mean(ent, na.rm = TRUE),
    mean_mutinf = mean(mutinf)
  )
##   mean_ent mean_mutinf
## 1 1.397863   0.1742165
These values could be compared with values obtained for other US cities to evaluate which cities have high average racial diversity (larger values of mean_ent) and which have high average racial segregation (larger values of mean_mutinf).
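To get a feel for how much the metrics vary between realizations, the six rows printed by head(metr_df) can be summarized in base R. Note that this sketch uses only those six printed rows, so its means differ slightly from the averages over all 30 realizations reported above.

```r
# First six realizations, copied from the head(metr_df) output above
ent = c(1.400229, 1.398806, 1.398361, 1.400530, 1.395641, 1.397392)
mutinf = c(0.1748010, 0.1735102, 0.1733824, 0.1752829, 0.1739072, 0.1781572)

# Mean and standard deviation across these six realizations
data.frame(
  metric = c("ent", "mutinf"),
  mean = c(mean(ent), mean(mutinf)),
  sd = c(sd(ent), sd(mutinf))
)
```

The standard deviations are small relative to the means, which is consistent with the observation that the realizations are fairly similar.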
Local diversity and segregation
The information theory-derived metrics can also be calculated for smaller, local scales using the size argument. It describes the size of a local area for metrics calculations. For example, size = 20 indicates that each local area will consist of 20 by 20 cells of the original raster.
metr_df_20 = calculate_metrics(x = realizations_raster, w = dens_raster, fun = "mean", size = 20, threshold = 0.75)
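The relation between cells and local areas can be sketched with simple arithmetic. The helper window_index() is hypothetical, and the sketch assumes non-overlapping windows counted from the top-left corner of the raster.

```r
size = 20      # side length of a local area, in cells
cell_res = 30  # cell resolution in meters, as stated for this dataset

# Side length of one local area in meters
size * cell_res

# Hypothetical helper: row/col index of the local area that contains
# the cell at raster position (i, j)
window_index = function(i, j, size) {
  c(row = ceiling(i / size), col = ceiling(j / size))
}

window_index(1, 1, size)     # first cell of the raster
window_index(20, 21, size)   # last row of the first window row, second window column
window_index(25, 590, size)  # a cell further into the raster
```

Under these assumptions, the cell in raster row 25 and column 590 belongs to the local area with row = 2 and col = 30.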
Now, we can summarize the results for each local area independently (identified by row and col).
smr = metr_df_20 %>%
  group_by(row, col) %>%
  summarize(
    ent_mean = mean(ent, na.rm = TRUE),
    mutinf_mean = mean(mutinf, na.rm = TRUE)
  ) %>%
  na.omit()
head(smr)
## # A tibble: 6 x 4
## # Groups: row
##     row   col ent_mean mutinf_mean
##   <dbl> <dbl>    <dbl>       <dbl>
## 1     1    30    1.09       0.0152
## 2     1    31    1.30       0.0356
## 3     1    33    1.12       0.0159
## 4     1    34    1.62       0.0576
## 5     2    24    0.959      0.0195
## 6     2    30    1.44       0.0445
Each row in the obtained results relates to a spatial location. We can create an empty grid with appropriate dimensions using the create_grid() function. Its size argument expects the same value as used in the calculate_metrics() function.
grid_sf = create_grid(realizations_raster, size = 20)
The result is a spatial vector object with three columns: row and col, allowing for identification of each square polygon, and geometry, containing spatial geometries.
tm_shape(grid_sf) + tm_polygons()
The first two columns, row and col, can be used to connect the grid with the summary results.
grid_attr = dplyr::left_join(grid_sf, smr, by = c("row", "col"))
grid_attr = na.omit(grid_attr)
Finally, we are able to create two maps. The first one represents racial diversity (the larger the value, the larger the diversity; the ent_mean variable) and the second one shows racial segregation (the larger the value, the larger the segregation; the mutinf_mean variable).
diversity_map = tm_shape(grid_attr) +
  tm_polygons(col = "ent_mean", title = "Diversity", style = "cont", palette = "magma") +
  tm_shape(cincinnati) +
  tm_borders(lwd = 3, col = "black")
segregation_map = tm_shape(grid_attr) +
  tm_polygons(col = "mutinf_mean", title = "Segregation", style = "cont", palette = "cividis") +
  tm_shape(cincinnati) +
  tm_borders(lwd = 3, col = "black")
tmap_arrange(diversity_map, segregation_map)
Bonus: visualizing racial landscapes
While the realizations created a few steps earlier represent the spatial distribution of races fairly well, they do not take the spatial variability of population densities into consideration. An additional function, plot_realization(), displays a selected realization while taking into account not only the spatial distribution of races but also the population density.
# select a single realization (the first one is an arbitrary choice here)
plot_realization(x = realizations_raster[[1]], y = race_raster, hex = my_pal)
In the resulting plot, darker areas have larger populations, and brighter areas are less inhabited.
The raceland package implements a computational framework for a pattern-based, zoneless analysis and visualization of (ethno)racial topography. The most comprehensive description of the method can be found in the Racial landscapes – a pattern-based, zoneless method for analysis and visualization of racial topography article published in Applied Geography. Its preprint is available at https://osf.io/preprints/socarxiv/mejz5. Additionally, raceland has three extensive vignettes:
- raceland: R package for a pattern-based, zoneless method for analysis and visualization of racial topography – introducing the package and its functions
- raceland: Describing local racial patterns of racial landscapes at different spatial scales – showing how the calculations can be performed at different spatial scales
- raceland: Describing local pattern of the racial landscape using SocScape grids – presenting how to use the raceland methods with SocScape race-specific grids to perform analysis for different spatial scales, using Cook County as an example.
This approach is based on the concept of ‘landscape’ used in the domain of landscape ecology. To learn more about information theory metrics used in this approach you can read the Information theory as a consistent framework for quantification and classification of landscape patterns article published in Landscape Ecology.
The raceland package requires race-specific grids. They can be obtained in two main ways. The first one is to download prepared grids from the SocScape project. It provides high-resolution raster grids for the years 1990, 2000, and 2010 for 365 metropolitan areas and each county in the conterminous US. The second way is to rasterize a spatial vector file (e.g., an ESRI Shapefile) with an attribute table containing race counts for some areas using the zones_to_raster() function.
Finally, while the presented methods have been applied to race-specific raster grids, they can also be used for many other problems where it is important to determine spatial diversity and segregation.