Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In this post I’ll outline why and how I use arrays as the main data structure in a population simulation which represents the age, sex and spatial location of tsetse flies in the landscape. An earlier post outlines the background to this tsetse population simulation.

I wanted a data structure that would make it easy and transparent for me to access elements with minimal code. I considered data frames, matrices and lists but opted for arrays. Hadley Wickham’s excellent Advanced R Data Structures section was very helpful.

### Arrays

An `array` in R is a multi-dimensional object, and a `matrix` is a special case of an `array` with just 2 dimensions. The code below shows how you can use the `dim` argument to the `array` function to set the number and size of dimensions. In this example I just fill the array with sequential values from 1 to 24.

```array(c(1:24), dim=c(4,3,2))
## , , 1
##
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12
##
## , , 2
##
##      [,1] [,2] [,3]
## [1,]   13   17   21
## [2,]   14   18   22
## [3,]   15   19   23
## [4,]   16   20   24```

### Naming array dimensions

To reduce the risk of bugs caused by accessing incorrect array elements you can name dimensions and use the names to access elements. You can use the `dimnames` argument to name both the elements within each dimension (e.g. F,M) and the dimension itself (e.g. sex). One way of doing this is to set `dimnames` to a named list. The code below shows how I create an array with spatial, sex and age dimensions.

```nY <- 4
nX <- 3
iMaxAge <- 2
sex <- c("F","M")
dimnames1 <- list( y=paste0('y',1:nY), x=paste0('x',1:nX), sex=sex, age=paste0('age',1:iMaxAge))
nElements <-  nY*nX*iMaxAge*length(sex)
aGrid <- array(1:nElements, dim=c(nY,nX,2,iMaxAge), dimnames=dimnames1)
aGrid
## , , sex = F, age = age1
##
##     x
## y    x1 x2 x3
##   y1  1  5  9
##   y2  2  6 10
##   y3  3  7 11
##   y4  4  8 12
##
## , , sex = M, age = age1
##
##     x
## y    x1 x2 x3
##   y1 13 17 21
##   y2 14 18 22
##   y3 15 19 23
##   y4 16 20 24
##
## , , sex = F, age = age2
##
##     x
## y    x1 x2 x3
##   y1 25 29 33
##   y2 26 30 34
##   y3 27 31 35
##   y4 28 32 36
##
## , , sex = M, age = age2
##
##     x
## y    x1 x2 x3
##   y1 37 41 45
##   y2 38 42 46
##   y3 39 43 47
##   y4 40 44 48```

The example above has just a few cells and ages. In our tsetse simulation we’re often looking at 120 age categories (days) on a 50x50 grid. Here’s a function to aid creating such arrays.

### Spatial dimension trickiness

I had to be careful specifying the spatial y & x dimensions as they do not always go in the order that you might expect (and your expectations may vary depending on whether your background is more geographical or statistical !). By specifying y rather than x as the first dimension the R console displays the array in the correct orientation with x on the horizontal and y on the vertical. I started out by having x,y but kept having to transpose so decided to bite the bullet a few months in and change everything to y,x. Note that the y dimension elements start from the top which can cause other geographical issues later.

### Accessing array dimensions

This array structure allows relatively transparent access to elements and summaries as shown below.

An age structure for one grid cell

```aGrid['y1','x1','M',]
## age1 age2
##   13   37```

Total Males in one grid cell

```sum(aGrid['y1','x1','M',])
## [1] 50```

Total population in one grid cell

```sum(aGrid['y1','x1',,]) #
## [1] 76```

A spatial grid for one age

```aGrid[,,'M','age2']
##     x
## y    x1 x2 x3
##   y1 37 41 45
##   y2 38 42 46
##   y3 39 43 47
##   y4 40 44 48```

A spatial grid of total population

```apply(aGrid,MARGIN=c('y','x'),sum)
##     x
## y    x1  x2  x3
##   y1 76  92 108
##   y2 80  96 112
##   y3 84 100 116
##   y4 88 104 120```

Summed age structure for the whole population

```apply(aGrid,MARGIN=c('age'),sum)
## age1 age2
##  300  876```

Summed sex ratio for thewhole population

```apply(aGrid,MARGIN=c('sex'),sum)
##   F   M
## 444 732```

This array also allows me to save all the population data for a simulation of a number of days by simply adding an extra dimension called day using the `abind` function from the package of the same name. Unfortunately it loses the names of the dimensions but these can be reset. The code below adds the first day as a new dimension using the `along=0` argument to abind.

```library(abind)
aRecord <- abind::abind(aGrid, along=0)
# replace lost dimension names
names(dimnames(aRecord)) <- c('day','y','x','sex','age')```

Records for later days (after aGrid has changed) can be added to the first dimension using the `along=1` argument :

```aRecord <- abind::abind(aRecord, aGrid, along=1)
# replace lost dimension names
names(dimnames(aRecord)) <- c('day','y','x','sex','age')```

I have now written the code to get data from the [day,y,x,sex,age] array into a helper function that I may describe in a later post.

Also in later posts I’ll show how I represent other population processes such as movement using these arrays.