Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

At the last Cologne R user meeting Holger Zien gave a great introduction to dynamic linear models (dlm). One special case of a dlm is the Kalman filter, which I will discuss in this post in more detail. I kind of used it earlier when I measured the temperature in my room.

Over the last week I came across the wonderful quantitative economic modelling site quant-econ.net, designed and written by Thomas J. Sargent and John Stachurski. The site not only provides access to their lecture notes, including the Kalman filer, but also code in Python and Julia.

I will take their example of the Kalman filter and go through it with R. I particularly liked their visuals of the various steps of the Kalman filter.

### Motivation

Suppose I have a little robot that moves autonomously over my desk. How would the robot know where it is? Well, suppose the robot can calculate from its wheel spin how far it went and an additional sensor (e.g. GPS-like) provides also information about its location. Of course both pieces of information will be imprecise. How can the various signals be combined?

This is a classic scenario for the Kalman filter. Its key assumptions are that the errors/noise are Gaussian and that the state space evolution xt from one time step to the next is linear, so is the mapping to the sensor signals yt. For the example I will use below it reads:

xt+1ytx1=Axt+w,wN(0,Q)=Gxt+ν,νN(0,R)N(x0,Σ0)

With A,Q,G,R,Σ matrices of appropriate dimensions. The Kalman filter provides recursive estimators for xt:

Ktx^t+1Σt+1=AΣtG(GΣtG+R)1=Axt^+Kt(yGx^)=AΣtAKtGΣtA+Q

Let’s say at time t0 the robot has the expected position x^=(0.2,0.2). That means, the robot doesn’t know its location on the table with certainty. Further I shall assume that the probability density function of the position follows a Normal distribution with covariance matrix Σ=[0.40.30.30.45]
The prior distribution of the robot’s position can be visualised in R with a contour plot.

library(mnormt)
xhat <- c(0.2, -0.2)
Sigma <- matrix(c(0.4, 0.3,
0.3, 0.45), ncol=2)
x1 <- seq(-2, 4,length=151)
x2 <- seq(-4, 2,length=151)
f <- function(x1,x2, mean=xhat, varcov=Sigma)
dmnorm(cbind(x1,x2), mean,varcov)
z <- outer(x1,x2, f)
mycols <- topo.colors(100,0.5)
image(x1,x2,z, col=mycols, main="Prior density",
xlab=expression('x'), ylab=expression('x'))
points(0.2, -0.2, pch=19)
text(0.1, -0.2, labels = expression(hat(x)), adj = 1)

### Sensor information

The ‘GPS’ sensor provides also data about the location of the robot, again the sensor signal can be regarded as the expected position with some noise. Suppose the reading of the sensor is y=(2.3,1.9). I will assume that the sensor signal has an error following a Normal distribution, with a covariance matrix R=0.5Σ.

Conceptually I think about it like this, given that the robot is at position x, what would be the likelihood that the sensor reading is y|x? The Kalman filter assumes a linear mapping from x to y

y=Gx+ν with νN(0,R)

In my case G will be the identity matrix. Let’s add the sensor information to the plot.

R <- 0.5 * Sigma
z2 <- outer(x1,x2, f, mean=c(2.3, -1.9), varcov=R)
image(x1, x2, z2, col=mycols, main="Sensor density")
points(2.3, -1.9, pch=19)
text(2.2, -1.9, labels = "y", adj = 1)
points(0.2, -0.2, pch=19)
text(0.1, -0.2, labels = expression(hat(x)), adj = 1)

### Filtering

I have two pieces of information about the location of the robot, both with a Normal distribution, my prior is xN(x^,Σ) and for the distribution of my sensor I have y|xN(Gx,R)

What I would like to know is the location of the robot given the sensor reading:

p(x|y)=p(y|x)p(x)p(y)

Now, because x,y are Normal and G is the identity matrix, I know that the posterior distribution of x|y is Normal again, with parameters (see conjugate prior):

x^f=(Σ1+R1)1(Σ1x^+R1y)Σf=(Σ1+R1)1

Using the matrix inversion identity

(A1+B1)1=AA(A+B)1A=A(A+B)1B

I can write the above as:

x^fΣf=(ΣΣ(Σ+R)1Σ)(Σ1x^+R1y)=x^Σ(Σ+R)1x^+ΣR1yΣ(Σ+R)1ΣR1y=x^+Σ(Σ+R)1(yx^)=(1.667,1.333)=ΣΣ(Σ+R)1Σ=[0.1330.1000.100.15]

In the more general case when G is not the identity matrix I have

x^fΣf=x^+ΣG(GΣG+R)1(yx^)=ΣΣG(GΣG+R)1ΣG

The updated posterior probability density function p(x|y)=N(x^f,Σf) is visualised in the chart below.

I can see how the prior and likelihood distributions have influenced the posterior distribution of the robot’s location. The sensor information is regarded more credible than the prior information and hence the posterior is closer to the sensor data.

G = diag(2)
y <- c(2.4, -1.9)
xhatf <- xhat + Sigma %*% t(G) %*% solve(G %*% Sigma %*% t(G) + R) %*% (y - G %*% xhat)
Sigmaf <- Sigma - Sigma %*% t(G) %*% solve(G %*% Sigma %*% t(G) + R) %*% G %*% Sigma
z3 <- outer(x1, x2, f, mean=c(xhatf), varcov=Sigmaf)
image(x1, x2, z3, col=mycols,
xlab=expression('x'), ylab=expression('x'),
main="Filtered density")
points(xhatf, xhatf, pch=19)
text(xhatf-0.1, xhatf,
labels = expression(hat(x)[f]), adj = 1)
points(0.2, -0.2, pch=19, col=lb)
text(0.1, -0.2, labels = expression(hat(x)), adj = 1, col=lb)
points(2.3, -1.9, pch=19, col=lb)
text(2.2, -1.9,labels = "y", adj = 1, col=lb)

### Forecasting

Ok, I have a better understanding of the robot’s position at time t0, but the robot is moving and what I’d like to know is where the it will be after one time step.

Suppose I have a linear model that explains how the state evolves over time, e.g. based on wheel spin. Again, I will assume a Gaussian process. In this toy example I follow the assumptions of Sargent and Stachurski, although this doesn’t make much sense for the robot:

xt+1=Axt+wt+1, where wtN(0,Q)

with

A=(1.20.00.00.2),Q=0.3Σ

As all the processes are linear and Normal the calculations are straightforward:

E[Axf+w]=AExf+Ew=Ax^f=Ax^+AΣG(GΣG+R)1(yGx^)
Var[Axf+w]=AVar[xf]A+Q=AΣfA+Q=AΣAAΣG(GΣG+R)1GΣA+Q

The matrix AΣG(GΣG+R)1 is often written as KΣ and called the Kalman gain. Using this notation, I can summarise the results as follows:

x^newΣnew:=Ax^+KΣ(yGx^):=AΣAKΣGΣA+Q

Adding the prediction to my chart gives me:

A <- matrix(c(1.2, 0,
0, -0.2), ncol=2)
Q <- 0.3 * Sigma
K <- A %*% Sigma %*% t(G) %*% solve(G%*% Sigma %*% t(G) + R)
xhatnew <- A %*% xhat + K %*% (y - G %*% xhat)
Sigmanew <- A %*% Sigma %*% t(A) - K %*% G %*% Sigma %*% t(A) + Q
z4 <- outer(x1,x2, f, mean=c(xhatnew), varcov=Sigmanew)
image(x1, x2, z4, col=mycols,
xlab=expression('x'), ylab=expression('x'),
main="Predictive density")
points(xhatnew, xhatnew, pch=19)
text(xhatnew-0.1, xhatnew,
labels = expression(hat(x)[new]), adj = 1)
points(xhatf, xhatf, pch=19, col=lb)
text(xhatf-0.1, xhatf, col=lb,
labels = expression(hat(x)[f]), adj = 1)
points(0.2, -0.2, pch=19, col=lb)
text(0.1, -0.2, labels = expression(hat(x)), adj = 1, col=lb)
points(2.3, -1.9, pch=19, col=lb)
text(2.2, -1.9,labels = "y", adj = 1, col=lb)

That’s it, having gone through the updating process for one time step gives me the recursive Kalman filter iteration mentioned above.

Another way to visualise the four steps can be achieved with the lattice package:

library(lattice)
grid <- expand.grid(x=x1,y=x2)
grid$Prior <- as.vector(z) grid$Likelihood <- as.vector(z2)
grid$Posterior <- as.vector(z3) grid$Predictive <- as.vector(z4)
contourplot(Prior + Likelihood + Posterior + Predictive ~ x*y,
data=grid, col.regions=mycols, region=TRUE,
as.table=TRUE,
xlab=expression(x),
ylab=expression(x),
main="Kalman Filter",
panel=function(x,y,...){
panel.grid(h=-1, v=-1)
panel.contourplot(x,y,...)
})

You can find the code of this post on Github.

### Session Info

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
 en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
 stats graphics grDevices utils datasets methods base

other attached packages:
 lattice_0.20-29 mnormt_1.5-1

loaded via a namespace (and not attached):
 grid_3.1.2  tools_3.1.2