# Agent Based Modelling with data.table OR how to model urban migration with R

**Data Shenanigans » R**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

## Introduction

Recently I found a good introduction to the Shelling-Segregation Model and to Agent Based Modelling (ABM) for Python (Binpress Article by Adil). The model follows an ABM approach to simulate how urban segregation can be explained.

I will concentrate on the R-code, if you want to know more about the Shelling-Segregation Model (which brought its creator a Nobel Price) and Agent Based Modelling you should head over to the binpress page! As my Python knowledge is not sufficiently large enough, I try to rebuild the ABM in R with some guidelines from the Python script, but as I use `data.table`

and its specific functions, the code naturally deviates quite a lot.

## Shelling-Segregation Model

The idea behind the Shelling Model is that we create an `M x N`

grid that contains homes for our agents, which we simulate to belong to `n`

different races, with a certain possibility that homes are empty.

In each round I calculate a ratio of same-race neighbors over the total amount neighbors for each home (position on the grid). If the ratio falls below a certain threshold (in this case 50%), the agent becomes unhappy and will move to another (random) home. This process is iterated to find an equilibrium.

## Basic Principles of the Code

The ABM is based on three self-written functions:

`initiateShelling()`

- takes inputs of dimensions of the simulated area and the number of different races
- creates a data.table called
*shelling*that contains the*id*for each position in the grid,*x*and*y*the coordinates, the*race*as well as*distance*and*unsatisfied*, which we will use later

`plotShelling()`

- takes a text input that is used as the graph’s title
- it uses the
`shelling`

data.table and plots each agent

`iterate()`

- takes the number of iterations (= number of simulations) as well as the similarity threshold
- the function has another subfunction
`is.unsatisfied()`

that checks for each row if the agent is unsatisfied - iterate first checks for each agent if she is unsatisfied, if so the agent will be moved

## R-Code of the Functions

To fasten up the speed of the code, I use the `data.table`

package including some of its specialties. If you are unfamiliar with the syntax of `data.table`

, I recommend you to have a look at this excellent intro by yhat or the CheatSheet by DataCamp.

For visualization, I use `ggplot2`

and `RColorBrewer`

.

The packages are loaded with:

require(data.table) # for the visualization require(ggplot2) require(RColorBrewer)

The code for the `initiateShelling()`

-function looks like this:

initiateShelling <- function(dimensions = c(10, 10), n_races = 2){ # create "races" based on colours races <- colours()[1:n_races] # what percentage of the homes will be empty perc_empty <- 0.2 # how many homes will be simulated n_homes = prod(dimensions) # calculates the number of agents count_agents <- floor(n_homes * (1 - perc_empty)) # the characteristics that a home can have races <- c("empty", races) # the probabilities of each characteristics probabilities <- c(perc_empty, rep((1 - perc_empty)/(length(races) - 1), times = length(races) - 1)) # creates the global shelling data.table shelling <<- data.table(id = 1:prod(dimensions), x = rep(1:dimensions[1], dimensions[2]), y = rep(1:dimensions[2], each = dimensions[1]), race = sample(x = races, size = n_homes, prob = probabilities, replace = TRUE), # used to find the satisfaction of each home distance = rep(NA, prod(dimensions)), unsatisfied = rep(NA, prod(dimensions))) }

Secondly, the `plotShelling()`

-function looks like this:

plotShelling <- function(title = "Shelling-Segregation-Model"){ # get races to get the right number of colors races <- unique(shelling$race) # find the dimensions of the grid to get the best dot size dims <- c(max(shelling$x), max(shelling$y)) # create colours # check if there are less than 3 races, # this would create issues with brewer.pal from # RColorBrewer otherwise if (length(races) <= 3) { colors <- brewer.pal(3, "Dark2")[c language="(1,3)"][/c] } else { colors <- brewer.pal(length(races), "Dark2") } # plot the graph p <- ggplot(data = shelling[race != "empty"], aes(x = x, y = y, color = race)) + # workaround to get relatively large dots that # resize with the size of the grid geom_point(size = 100/sqrt(prod(dims))) + scale_colour_manual(values = colors) + # create a beatiful and mostly empty chart theme_bw() + theme(axis.line = element_blank(), axis.text.x = element_blank(), axis.text.y = element_blank(), axis.ticks = element_blank(), axis.title.x = element_blank(), axis.title.y = element_blank(), legend.position = "none", panel.background = element_blank(), panel.border = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), plot.background = element_blank(), plot.title = element_text(lineheight=3, face="bold", color="black", size=14)) + # fixes the axis to avoid distortion in the picture coord_fixed(ratio = 1) + # lastly adds the title ggtitle(title) return(p) }

And lastly, the `iterate()`

-function, which iterates over the checks for satisfaction and moves of the agents if necessary, looks like this:

iterate <- function(n = 10, similiarity_threshold){ # subfunction that checks for a given x and y value if the agent is # unsatisfied (returns TRUE or FALSE) is.unsatisfied <- function(x_value, y_value, similiarity_threshold = 0.5){ # gets the race for the agent cur_race <- shelling[x == x_value & y == y_value, race] # checks if the home is empty to if (cur_race == "empty"){ return(FALSE) # empty houses are not satisfied, therefore will not move! }else{ # creates the square of the distance # I avoid to take the squareroot to speed up the code shelling[, distance := (x_value - x)^2 + (y_value - y)^2] # counts the number of agents that live less than two fields away # (includes all adjacent agents) and that are similar count_similar <- nrow(shelling[distance <= 2 & race == cur_race & distance != 0]) # same here except that it looks into different agents count_different <- nrow(shelling[distance <= 2 & race != cur_race & race != "empty"]) # calculates the ratio ratio <- count_similar/(count_similar + count_different) # returns TRUE if the ratio is below the threshold return(ratio < similiarity_threshold) } } # creates a ProgressBar, although this is not necessary, it does look nice.. pb <- txtProgressBar(min = 0, max = 1, style = 3) # for time-keeping-purposes t <- Sys.time() # iterates for (iterate in 1:n){ # fills the boolean vector "unsatisfied" # indicates if the household is unsatisfied shelling[, unsatisfied := is.unsatisfied(x_value = x, y_value = y, similiarity_threshold = similiarity_threshold), by = 1:nrow(shelling)] # move unsatisfied agents to an empty house # find the IDs that are empty where agents can migrate to emptyIDs <- shelling[race == "empty", id] # finds the id of empty houses # finds the origin of the agents moving, # aka. the old ID of each household moving oldIDs <- shelling[(unsatisfied), id] # origin # generates new IDs for each household moving # note that households can move to the house of other moving agents # also, agents can (by a small chance) choose to "move" to their # existing home newIDs <- sample(x = c(emptyIDs, oldIDs), size = length(oldIDs), replace = F) # target # a new data.table that shows # what race migrates from which origin_id to which target-id transition <- data.table(origin = oldIDs, oldRace = shelling[id %in% oldIDs, race], target = newIDs) # moves the agents to the new homes shelling[id %in% transition$origin]$race = "empty" shelling[id %in% transition$target]$race = transition$oldRace # orders the shelling, although this takes some time, # it is necessary for the other operations shelling <- shelling[order(id)] # updates the ProgressBar setTxtProgressBar(pb, iterate/n) } close(pb) timedif <- Sys.time() - t # print out statistics for the calculation time print(paste0("Time for calculation in seconds: ", round(timedif, 3), " or: ", round(n / as.numeric(timedif), 3), " iterations per second")) return(shelling) }

## Results 1: 2 Races

By using the function I create a 50×50 grid with 2.500 agents and simulate 20 rounds (this process takes roughly a minute). A visualization is produced at 0, 10, and 20 iterations; after 20 rounds the 2-race simulation reaches an equilibrium as we can see by the few changes between the two states (10 and 20 iterations).

set.seed(42^2) # initiate shelling initiateShelling(dimensions = c(50, 50), n_races = 2) # plot shelling plotShelling(title = "Shelling-Segregation-Model after 0 iterations") # iterate 10 times shelling <- iterate(n = 10, similiarity_threshold = 0.5) # plot the result after 10 iterations plotShelling(title = "Shelling-Segregation-Model after 10 iterations") # iterate another 10 times shelling <- iterate(n = 10, similiarity_threshold = 0.5) # plot again after 20 iterations total plotShelling(title = "Shelling-Segregation-Model after 20 iterations")

Although it seems that the model found an equilibrium after 10 iterations, there are still some minor changes between the two states, albeit only few.

## Results 2: 4 Races

To see the ABM with 4 different races I used the following code to generate the following images.

set.seed(42^3) # initiate shelling initiateShelling(dimensions = c(50, 50), n_races = 4) # plot shelling plotShelling(title = "Shelling-Segregation-Model after 0 iterations") # iterate 10 times shelling <- iterate(n = 10, similiarity_threshold = 0.5) # plot the result after 10 iterations plotShelling(title = "Shelling-Segregation-Model after 10 iterations") # iterate another 10 times shelling <- iterate(n = 10, similiarity_threshold = 0.5) # plot again after 20 iterations total plotShelling(title = "Shelling-Segregation-Model after 20 iterations") # iterate another 30 times shelling <- iterate(n = 30, similiarity_threshold = 0.5) # plot again after 50 iterations total plotShelling(title = "Shelling-Segregation-Model after 50 iterations")

There are almost no visible differences between the initial state and state after 10 iterations.

A more notable change between the states after 10 and 20 iterations.

Here we see that the model took more iterations to find an almost steady-state after 50 iterations (there are still some agents that will move in the next round, can you spot them?).

## Outro

These few lines show nicely what ABMs are, how they can be used and how they simplify complex human behavior.

Although `data.table`

is enormously fast, the process still takes a lot of time to calculate. If you have any idea how to speed up the code, I would love to hear about it in the comments! If you have any other questions or remarks, I am of course happy to hear about them in the comments as well.

Lastly, if you want to see other implementations of the Shelling-Segregation Model in R, you can visit R Snippets or R-bloggers.

**leave a comment**for the author, please follow the link and comment on their blog:

**Data Shenanigans » R**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.