**R tutorial for Spatial Statistics**, and kindly contributed to R-bloggers)

## Preface

**gstat**.

## Introduction

### Time as the third dimension

*s*and

_{i}*s*, depends only on their separation, which we indicate with the vector

_{j}*h*in Eq.1. If we imagine a 2D example (i.e. purely spatial), the vector

*h*is simply the one that connects two points,

*i*and

*j*, with a line, and its value can be calculated with the Euclidean distance:

### Spatio-Temporal Variogram

*s*there will be a time

_{i}*t*associated with it, and to calculate the variance between this point and another we would need to calculate their spatial separation

_{i}*h*and their temporal separation

*u*. Thus, the spatio-temporal variogram can be computed as follows, from Sherman (2011):

*h*and time

*u*.

## Spatio-Temporal Kriging in R

**gstat**with a set of functions very similar to what we are used to in standard 2D kriging. The package

**spacetime**provides ways of creating objects where the time component is taken into account, and

**gstat**uses these formats for its space-time analysis. Here I will present an example of spatio-temporal kriging using sensors’ data.

### Data

### Packages

**sp**, for handling spatial objects, and

**gstat**, which has all the function to actually perform spatio-temporal kriging. Then

**spacetime**, which we need to create the spatio-temporal object. These are the three crucial packages. However, I also loaded some others that I used to complete smaller tasks. I loaded the

**raster**package, because I use the functions

`coordinates`

and `projection`

to create spatial data. There is no need of loading it, since the same functions are available under different names in **sp**. However, I prefer these two because they are easier to remember. The last packages are

**rgdal**and

**rgeos**, for performing various operations on geodata.

### Data Preparation

`POSIXlt`

or `POSIXct`

, which are standard ways of representing time in R. This very first thing we have to do is of course setting the working directory and loading the csv file:setwd("...")

data <- read.table("ozon_tram1_14102011_14012012.csv", sep=",", header=T)

*generation_time*” of our dataset:

`POSIXlt`

, but we first need to extract only the first 10 digits, and then convert it. This can be done in one line of code but with multiple functions:data$TIME <- as.POSIXlt(as.numeric(substr(paste(data$generation_time), 1, 10)), origin="1970-01-01")

`paste(data$generation_time)`

. This creates the character string shown above, which we can then subset using the function `substr`

. This function is used to subtract characters from a string and takes three arguments: a string, a starting character and a stopping character. In this case we want to basically delete the last 3 numbers from our string, so we set the start on the first number (`start=1`

), and the stop at 10 (`stop=10`

). Then we need to change the numerical string back to a numerical format, using the function `as.numeric`

. Now we just need one last function to tell R that this particular number is a Date/Time object. We can do this using the function `as.POSIXlt`

, which takes the actual number we just created plus an origin. Since we are using UNIX time, we need to set the starting point at “*1970-01-01*“. We can test this function of the first element of the vector

*data$generation_time*to test its output:

> as.POSIXlt(as.numeric(substr(paste(data$generation_time[1]), start=1, stop=10)), origin="1970-01-01")

[1] "2011-10-14 11:14:46 CEST"

`data.frame`

data has a new column named *TIME*where the Date/Time information are stored.

data$LAT <- as.numeric(substr(paste(data$latitude),1,2))+(as.numeric(substr(paste(data$latitude),3,10))/60)

data$LON <- as.numeric(substr(paste(data$longitude),1,1))+(as.numeric(substr(paste(data$longitude),2,10))/60)

`paste`

and `substr`

to extract only the numbers we need. For converting this format into degrees, we need to sum the degrees with the minutes divided by 60. So in the first part of the equation we just need to extract the first two digits of the numerical string and transform them back to numerical format. In the second part we need to extract the remaining of the strings, transform them into numbers and then divided them by 60.This operation creates some `NA`

s in the dataset, for which you will get a warning message. We do not have to worry about it as we can just exclude them with the following line: ### Subset

`data.frame`

objects using Date/Time:> sub <- data[data$TIME>=as.POSIXct('2011-12-12 00:00 CET')&data$TIME<=as.POSIXct('2011-12-14 23:00 CET'),]

> nrow(sub)

[1] 6734

*sub*into a spatial object, and then I changed its projection into UTM so that the variogram will be calculated on metres and not degrees. These are the steps required to achieve all this:

*ozone.UTM*, which is a

`SpatialPointsDataFrame`

with coordinates in metres. ### Spacetime Package

**Gstat**is able to perform spatio-temporal kriging exploiting the functionalities of the package

**spacetime**, which was developed by the same team as

**gstat**. In

**spacetime**we have two ways to represent spatio-temporal data:

`STFDF`

and `STIDF`

formats. The first represents objects with a complete space time grid. In other words in this category are included objects such as the grid of weather stations presented in Fig.1. The spatio-temporal object is created using the *n*locations of the weather stations and the

*m*time intervals of their observations. The spatio-temporal grid is of size ?∙?.

` `

`STIDF`

objects are the one we are going to use for this example. These are unstructured spatio-temporal objects, where both space and time change dynamically. For example, in this case we have data collected on top of trams moving around the city of Zurich. This means that the location of the sensors is not consistent throughout the sampling window. `STIDF`

objects is fairly simple, we just need to disassemble the `data.frame`

we have into a spatial, temporal and data components, and then merge them together to create the `STIDF`

object.`SpatialPoints`

object, with the locations of the sensors at any given time: ozoneSP <- SpatialPoints([email protected],CRS("+init=epsg:3395"))

`SpatialPoints`

in the package **sp**. This function takes two arguments, the first is a

`matrix`

or a `data.frame`

with the coordinates of each point. In this case I used the coordinates of the `SpatialPointsDataFrame`

we created before, which are provided in a `matrix`

format. Then I set the projection in UTM.At this point we need to perform a very important operation for kriging, which is check whether we have some duplicated points. It may happen sometime that there are points with identical coordinates. Kriging cannot handle this and returns an error, generally in the form of a “singular matrix”. Most of the time in which this happens the problem is related to duplicated locations. So we now have to check if we have duplicates here, using the function

`zerodist`

:dupl <- zerodist(ozoneSP)

`STIDF`

object:ozoneDF <- data.frame(PPB=ozone.UTM$ozone_ppb[-dupl[,2]])

`data.frame`

with only one column, named *PPB*, with the ozone observations in part per billion. As you can see I removed the duplicated points by excluding the rows from the object

*ozone.UTM*with the indexes included in one of the columns of the object

*dupl*. We can use the same trick while creating the temporal part:

ozoneTM <- as.POSIXct(ozone.UTM$TIME[-dupl[,2]],tz="CET")

*ozoneSP*,

*ozoneDF*and

*ozoneTM*into a

`STIDF`

:timeDF <- STIDF(ozoneSP,ozoneTM,data=ozoneDF)

`STIDF`

object by using the spatio-temporal version of the function `spplot`

, which is `stplot`

:stplot(timeDF)

### Variogram

`variogramST`

. Its use is similar to the standard function for spatial kriging, even though there are some settings for the temporal component that need to be included.`variogramST`

is identical to a normal call to the function `variogram`

; we first have the formula and then the data source. However, then we have to specify the time units (`tunits`

) or the time lags (`tlags`

). I found the documentation around this point a bit confusing to be honest. I tested various combinations of parameters and the line of code I presented is the only one that gives me what appear to be good results. I presume that what I am telling to the function is to aggregate the data to the hours, but I am not completely sure. I hope some of the readers can shed some light on this!!I must warn you that this operation takes quite a long time, so please be aware of that. I personally ran it overnight.

### Plotting the Variogram

### Variogram Modelling

`vgmST`

and `fit.StVariogram`

, which are the spatio-temporal matches for `vgm`

and `fit.variogram`

.Below I present the code I used to fit all the models. For the automatic fitting I used most of the settings suggested in the following demo:

demo(stkrige)

**gstat**we have 5 options: separable, product sum, metric, sum metric, and simple sum metric. You can find more information to fit these model, including all the equations presented below, in (Gräler et al., 2015), which is available in pdf (I put the link in the “More Information” section).

#### Separable

The first thing to set are the upper and lower limits for all the variogram parameters, which are used during the automatic fitting:

# lower and upper bounds

pars.l <- c(sill.s = 0, range.s = 10, nugget.s = 0,sill.t = 0, range.t = 1, nugget.t = 0,sill.st = 0, range.st = 10, nugget.st = 0, anis = 0)

pars.u <- c(sill.s = 200, range.s = 1000, nugget.s = 100,sill.t = 200, range.t = 60, nugget.t = 100,sill.st = 200, range.st = 1000, nugget.st = 100,anis = 700)

`vgmST`

do not make much of a difference, so probably you do not have to worry too much about the parameters you select in `vgmST`

.We can check how this model fits our data by using the function

`fit.StVariogram`

with the option `fit.method=0`

, which keeps this model but calculates its Mean Absolute Error (MSE), compared to the actual data: `extractPar`

:> extractPar(separable_Vgm)

range.s nugget.s range.t nugget.t sill

199.999323 10.000000 99.999714 1.119817 17.236256

#### Product Sum

*k*> 0.

`vgmST`

we need to provide both the spatial and temporal component, plus the value of the parameter `k`

(which needs to be positive): `k = 5`

, but R returned an error message saying that it needed to be positive, which I did not understand. However, with 50 it worked and as I mentioned the automatic fit does not care much about these initial values, probably the most important things are the upper and lower bounds we set before.We can then proceed with the fitting process and we can check the MSE with the following two lines:

#### Metric

*k*) that allows some flexibility.

*k*. In R

*k*is named

`stAni`

and creating a metric model in `vgmST`

can be done as follows:metric <- vgmST("metric", joint = vgm(50,"Mat", 500, 0), stAni=200)

The automatic fit produces the following MSE:

We can plot this model to visually check its accuracy:

#### Sum Metric

The automatic fit can be done like so:

Which creates the following model:

#### Simple Sum Metric

This returns a model similar to the sum metric:

#### Choosing the Best Model

### Prediction Grid

roads <- shapefile("VEC25_str_l_Clip/VEC25_str_l.shp")

*roads*object only the lines belonging to Klass1 streets:

Klass1 <- roads[roads$objectval=="1_Klass",]

Klass1.UTM <- spTransform(Klass1,CRS("+init=epsg:3395"))

`crop`

in **rgeos**, with the object

*ozone.UTM*that I created before:

Klass1.cropped <- crop(Klass1.UTM,ozone.UTM)

`spsample`

to create a random grid of points along the road lines:sp.grid.UTM <- spsample(Klass1.cropped,n=1500,type="random")

`RData`

format (gridST.RData): **spacetime**. We first need to create a vector of Date/Times using the function

`seq`

:tm.grid <- seq(as.POSIXct('2011-12-12 06:00 CET'),as.POSIXct('2011-12-14 09:00 CET'),length.out=5)

`length.out=5`

), with `POSIXct`

values between the two Date/Times provided. In this case we are interested in creating a spatio-temporal data frame, since we do not yet have any data for it. Therefore we can use the function `STF`

to merge spatial and temporal data into a spatio-temporal grid: grid.ST <- STF(sp.grid.UTM,tm.grid)

This can be used as new data in the kriging function.

### Kriging

`krigeST`

to perform the interpolation:pred <- krigeST(PPB~1, data=timeDF, modelList=sumMetric_Vgm, newdata=grid.ST)

We can plot the results again using the function `stplot`

:

stplot(pred)

## More information

vignette("st", package = "gstat")

demo(stkrige)

## References

Gräler, B., 2012. Different concepts of spatio-temporal kriging [WWW Document]. URL geostat-course.org/system/files/part01.pdf (accessed 8.18.15).

Gräler, B., Pebesma, Edzer, Heuvelink, G., 2015. Spatio-Temporal Interpolation using gstat.

Gräler, B., Rehr, M., Gerharz, L., Pebesma, E., 2013. Spatio-temporal analysis and interpolation of PM10 measurements in Europe for 2009.

Oliver, M., Webster, R., Gerrard, J., 1989. Geostatistics in Physical Geography. Part I: Theory. Trans. Inst. Br. Geogr., New Series 14, 259–269. doi:10.2307/622687

Sherman, M., 2011. Spatial statistics and spatio-temporal data: covariance functions and directional properties. John Wiley & Sons.

All the code snippets were created by Pretty R at inside-R.org

**leave a comment**for the author, please follow the link and comment on their blog:

**R tutorial for Spatial Statistics**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...