Agent Based Modelling with data.table OR how to model urban migration with R

[This article was first published on Data Shenanigans » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Introduction

Recently I found a good introduction to the Shelling-Segregation Model and to Agent Based Modelling (ABM) for Python (Binpress Article by Adil). The model follows an ABM approach to simulate how urban segregation can be explained.

I will concentrate on the R-code, if you want to know more about the Shelling-Segregation Model (which brought its creator a Nobel Price) and Agent Based Modelling you should head over to the binpress page! As my Python knowledge is not sufficiently large enough, I try to rebuild the ABM in R with some guidelines from the Python script, but as I use data.table and its specific functions, the code naturally deviates quite a lot.

Shelling-Segregation Model

The idea behind the Shelling Model is that we create an M x N grid that contains homes for our agents, which we simulate to belong to n different races, with a certain possibility that homes are empty.

In each round I calculate a ratio of same-race neighbors over the total amount neighbors for each home (position on the grid). If the ratio falls below a certain threshold (in this case 50%), the agent becomes unhappy and will move to another (random) home. This process is iterated to find an equilibrium.

Basic Principles of the Code

The ABM is based on three self-written functions:

  1. initiateShelling()
    • takes inputs of dimensions of the simulated area and the number of different races
    • creates a data.table called shelling that contains the id for each position in the grid, x and y the coordinates, the race as well as distance and unsatisfied, which we will use later
  2. plotShelling()
    • takes a text input that is used as the graph’s title
    • it uses the shelling data.table and plots each agent
  3. iterate()
    • takes the number of iterations (= number of simulations) as well as the similarity threshold
    • the function has another subfunction is.unsatisfied() that checks for each row if the agent is unsatisfied
    • iterate first checks for each agent if she is unsatisfied, if so the agent will be moved

R-Code of the Functions

To fasten up the speed of the code, I use the data.table package including some of its specialties. If you are unfamiliar with the syntax of data.table, I recommend you to have a look at this excellent intro by yhat or the CheatSheet by DataCamp.

For visualization, I use ggplot2 and RColorBrewer.

The packages are loaded with:

require(data.table)
# for the visualization
require(ggplot2)
require(RColorBrewer)

The code for the initiateShelling()-function looks like this:

initiateShelling <- function(dimensions = c(10, 10), n_races = 2){
  # create "races" based on colours
  races <- colours()[1:n_races]
  # what percentage of the homes will be empty
  perc_empty <- 0.2
  
  # how many homes will be simulated
  n_homes = prod(dimensions)
  # calculates the number of agents
  count_agents <- floor(n_homes * (1 - perc_empty))
  
  # the characteristics that a home can have
  races <- c("empty", races)
  # the probabilities of each characteristics
  probabilities <- c(perc_empty, rep((1 - perc_empty)/(length(races) - 1), 
                                     times = length(races) - 1))
  
  # creates the global shelling data.table
  shelling <<- data.table(id = 1:prod(dimensions),
                          x = rep(1:dimensions[1], 
                                  dimensions[2]),
                          y = rep(1:dimensions[2], 
                                  each = dimensions[1]),
                          race = sample(x = races, 
                                        size = n_homes, 
                                        prob = probabilities,
                                        replace = TRUE),
                          # used to find the satisfaction of each home
                          distance = rep(NA, prod(dimensions)),
                          unsatisfied = rep(NA, prod(dimensions)))
}

Secondly, the plotShelling()-function looks like this:

plotShelling <- function(title = "Shelling-Segregation-Model"){
  # get races to get the right number of colors
  races <- unique(shelling$race)
  
  # find the dimensions of the grid to get the best dot size
  dims <- c(max(shelling$x), max(shelling$y))
  
  # create colours
  # check if there are less than 3 races, 
  # this would create issues with brewer.pal from 
  # RColorBrewer otherwise
  if (length(races) <= 3) {
    colors <- brewer.pal(3, "Dark2")[c language="(1,3)"][/c]
  } else {
    colors <- brewer.pal(length(races), "Dark2")
  }
  
  # plot the graph  
  p <- ggplot(data = shelling[race != "empty"], 
              aes(x = x, y = y, color = race)) + 
    # workaround to get relatively large dots that 
    # resize with the size of the grid
    geom_point(size = 100/sqrt(prod(dims))) +  
    scale_colour_manual(values = colors) + 
    # create a beatiful and mostly empty chart
    theme_bw() + 
    theme(axis.line = element_blank(),
          axis.text.x = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks = element_blank(),
          axis.title.x = element_blank(),
          axis.title.y = element_blank(),
          legend.position = "none",
          panel.background = element_blank(),
          panel.border = element_blank(),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          plot.background = element_blank(),
          plot.title = element_text(lineheight=3, 
                                    face="bold", 
                                    color="black", 
                                    size=14)) +
    # fixes the axis to avoid distortion in the picture
    coord_fixed(ratio = 1) + 
    # lastly adds the title
    ggtitle(title)
  
  return(p)
}

And lastly, the iterate()-function, which iterates over the checks for satisfaction and moves of the agents if necessary, looks like this:

iterate <- function(n = 10, similiarity_threshold){
  # subfunction that checks for a given x and y value if the agent is 
  # unsatisfied (returns TRUE or FALSE)
  is.unsatisfied <- function(x_value, y_value, similiarity_threshold = 0.5){
    # gets the race for the agent
    cur_race <- shelling[x == x_value & y == y_value, race]
    
    # checks if the home is empty to
    if (cur_race == "empty"){
      return(FALSE) # empty houses are not satisfied, therefore will not move!
    }else{
      # creates the square of the distance
      # I avoid to take the squareroot to speed up the code
      shelling[, distance := (x_value - x)^2 + (y_value - y)^2] 
      
      # counts the number of agents that live less than two fields away 
      # (includes all adjacent agents) and that are similar
      count_similar <- nrow(shelling[distance <= 2 & 
                                       race == cur_race & 
                                       distance != 0])
      # same here except that it looks into different agents
      count_different <- nrow(shelling[distance <= 2 & 
                                         race != cur_race & 
                                         race != "empty"])
      
      # calculates the ratio
      ratio <- count_similar/(count_similar + count_different)
      
      # returns TRUE if the ratio is below the threshold
      return(ratio < similiarity_threshold) 
    }
  }
  
  # creates a ProgressBar, although this is not necessary, it does look nice..
  pb <- txtProgressBar(min = 0, max = 1, style = 3)
  # for time-keeping-purposes
  t <- Sys.time()
  
  # iterates
  for (iterate in 1:n){
    # fills the boolean vector "unsatisfied" 
    # indicates if the household is unsatisfied 
    shelling[, unsatisfied := is.unsatisfied(x_value = x, 
                                             y_value = y, 
                                            similiarity_threshold = 
                                                 similiarity_threshold),
             by = 1:nrow(shelling)]
    
    # move unsatisfied agents to an empty house
    # find the IDs that are empty where agents can migrate to
    emptyIDs <- shelling[race == "empty", id] # finds the id of empty houses
    
    # finds the origin of the agents moving, 
    # aka. the old ID of each household moving
    oldIDs <- shelling[(unsatisfied), id] # origin
    
    # generates new IDs for each household moving
    # note that households can move to the house of other moving agents
    # also, agents can (by a small chance) choose to "move" to their 
    # existing home
    newIDs <- sample(x = c(emptyIDs, oldIDs), 
                     size = length(oldIDs), 
                     replace = F) # target
    
    # a new data.table that shows 
    # what race migrates from which origin_id to which target-id
    transition <- data.table(origin = oldIDs, 
                             oldRace = shelling[id %in% oldIDs, race],
                             target = newIDs)
    
    # moves the agents to the new homes
    shelling[id %in% transition$origin]$race = "empty" 
    shelling[id %in% transition$target]$race = transition$oldRace
    
    # orders the shelling, although this takes some time, 
    # it is necessary for the other operations
    shelling <- shelling[order(id)]
    
    # updates the ProgressBar
    setTxtProgressBar(pb, iterate/n)
  }
  close(pb)
  timedif <- Sys.time() - t
  
  # print out statistics for the calculation time
  print(paste0("Time for calculation in seconds: ", round(timedif, 3), " or: ",
               round(n / as.numeric(timedif), 3), " iterations per second"))
  
  return(shelling)
}

Results 1: 2 Races

By using the function I create a 50×50 grid with 2.500 agents and simulate 20 rounds (this process takes roughly a minute). A visualization is produced at 0, 10, and 20 iterations; after 20 rounds the 2-race simulation reaches an equilibrium as we can see by the few changes between the two states (10 and 20 iterations).


set.seed(42^2)
# initiate shelling
initiateShelling(dimensions = c(50, 50), n_races = 2)
# plot shelling
plotShelling(title = "Shelling-Segregation-Model after 0 iterations")

# iterate 10 times
shelling <- iterate(n = 10, similiarity_threshold = 0.5)
# plot the result after 10 iterations
plotShelling(title = "Shelling-Segregation-Model after 10 iterations")

# iterate another 10 times
shelling <- iterate(n = 10, similiarity_threshold = 0.5)
# plot again after 20 iterations total
plotShelling(title = "Shelling-Segregation-Model after 20 iterations")
Initial State of ABM with 2 Races
Initial State ABM with 2 Races
ABM with 2 Races after 10 Iterations
ABM with 2 Races after 10 Iterations
ABM with 2 Races after 20 Iterations
ABM with 2 Races after 20 Iterations

Although it seems that the model found an equilibrium after 10 iterations, there are still some minor changes between the two states, albeit only few.

Results 2: 4 Races

To see the ABM with 4 different races I used the following code to generate the following images.

set.seed(42^3)
# initiate shelling
initiateShelling(dimensions = c(50, 50), n_races = 4)
# plot shelling
plotShelling(title = "Shelling-Segregation-Model after 0 iterations")

# iterate 10 times
shelling <- iterate(n = 10, similiarity_threshold = 0.5)
# plot the result after 10 iterations
plotShelling(title = "Shelling-Segregation-Model after 10 iterations")

# iterate another 10 times
shelling <- iterate(n = 10, similiarity_threshold = 0.5)
# plot again after 20 iterations total
plotShelling(title = "Shelling-Segregation-Model after 20 iterations")

# iterate another 30 times
shelling <- iterate(n = 30, similiarity_threshold = 0.5)
# plot again after 50 iterations total
plotShelling(title = "Shelling-Segregation-Model after 50 iterations")
ABM with 4 Races Initial State
ABM with 4 Races Initial State
ABM with 4 Races after 10 Iterations
ABM with 4 Races after 10 Iterations

There are almost no visible differences between the initial state and state after 10 iterations.

ABM with 4 Races after 20 Iterations
ABM with 4 Races after 20 Iterations

A more notable change between the states after 10 and 20 iterations.

ABM with 4 Races after 50 Iterations
ABM with 4 Races after 50 Iterations

Here we see that the model took more iterations to find an almost steady-state after 50 iterations (there are still some agents that will move in the next round, can you spot them?).

Outro

These few lines show nicely what ABMs are, how they can be used and how they simplify complex human behavior.

Although data.table is enormously fast, the process still takes a lot of time to calculate. If you have any idea how to speed up the code, I would love to hear about it in the comments! If you have any other questions or remarks, I am of course happy to hear about them in the comments as well.

Lastly, if you want to see other implementations of the Shelling-Segregation Model in R, you can visit R Snippets or R-bloggers.


To leave a comment for the author, please follow the link and comment on their blog: Data Shenanigans » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)