Generating a Markov chain vs. computing the transition matrix

Posted on May 23, 2013 by arthur charpentier in Uncategorized | 0 Comments

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A couple of days ago, we had a quick chat on Karl Broman‘s blog, about snakes and ladders (see http://kbroman.wordpress.com/…) with Karl and Corey (see http://bayesianbiologist.com/….), and the use of Markov Chain. I do believe that this application is truly awesome: the example is understandable by anyone, and computations (almost any kind, from what we’ve tried) are easy to perform. At the same time, some French students asked me specific details regarding some old lectures notes on Markov chains, and on some introductory example I used as a possible motivation: the stepping stone algorithm. In the notes, I just mentioned the idea of this popular generic algorithm (introduced in Sawyer (1976)) and I use simulations to show – visually – how it works. Again, it was just to motivate the course which actually did focus on the theory of Markov Chains. But those student wanted more, like how did I get the transition matrix, for instance. And that is actually not a simple question, from a computational perspective. I mean, I can easily generate this Markov Chain, but writing explicitly the transition, that was another story. Which took me a bit longer. In a very specific case…

But let us get back to the roots, and to the stepping stone algorithm. At least, one of them (the one I used in my notes) because it looks like there are several algorithm. We do consider a grid, say $h\times h$ , with some colors inside, say $k$ possible colors. Each cell of the grid has a given color. Then, at some stage, we select randomly one cell in the grid, and it will take the color of one of its neighbor (some kind of absorption, or mutation). This is, more or less, what is also detailed in some lecture notes by James Propp (see also e Sato (1983) or Zähle et al. (2005) for more theoretical details about that Markov chain). This is extremely simple to generate (that’s what I did in my notes, with very big grids, and a lot of colors). But what if we want to write the transition matrix ?

First of all, we need to define the state space. Basically, we do have $h^2$ cells, each of them has one color, chosen among $k$ . Which gives us $k^{h^2}$ possible states…. And that can be large. I mean, if we consider the smallest possible grid (that might be interesting), say $h=3$ , and only $2$ colors, then we talk about $2^9=512$ possible states. That is large, not huge. But we should keep in mind that we have to compute a transition matrix, that would be a matrix with $512\times 512=262,144$ elements. More generally, we talk about writing down matrices with $k^{2\cdot h^2}$ elements. If we want black and white $4\times 4$ grids, that would mean a matrix with $2^{2\cdot 4^2}=2^{32}=4,294,967,296$ which mean 4 billion elements ! And if we consider an red-green-blue $3\times 3$ grid, we have to explicit a matrix with $387,420,489$ i.e almost 400 million elements. So, let’s face it: we can only work with $3\times 3$ bi-color grids.

So let’s try… The good thing is that it can be related to work I’ve been doing recently on binomial recombining trees (binomial being related to bi-color). First of all, our grid will be describes as follows

> h=3
> M=matrix(1:(h^2),h,h)
> M
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

with two colors

> color=c("red","blue")

Then, we should look for neighbors, or derive an neighborhood matrix,

> d=function(i,j) dist(rbind(c((i-1)%/%h,(i-1)%%h),
+                            c((j-1)%/%h,(j-1)%%h)))
> Neighb=matrix(Vectorize(d)(rep(1:(h^2),each=h^2),
+                            rep(1:(h^2),h^2)),h^2,h^2)
> trunc(Neighb*100)/100
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
 [1,] 0.00 1.00 2.00 1.00 1.41 2.23 2.00 2.23 2.82
 [2,] 1.00 0.00 1.00 1.41 1.00 1.41 2.23 2.00 2.23
 [3,] 2.00 1.00 0.00 2.23 1.41 1.00 2.82 2.23 2.00
 [4,] 1.00 1.41 2.23 0.00 1.00 2.00 1.00 1.41 2.23
 [5,] 1.41 1.00 1.41 1.00 0.00 1.00 1.41 1.00 1.41
 [6,] 2.23 1.41 1.00 2.00 1.00 0.00 2.23 1.41 1.00
 [7,] 2.00 2.23 2.82 1.00 1.41 2.23 0.00 1.00 2.00
 [8,] 2.23 2.00 2.23 1.41 1.00 1.41 1.00 0.00 1.00
 [9,] 2.82 2.23 2.00 2.23 1.41 1.00 2.00 1.00 0.00
> Neighb=(Neighb<2)&(Neighb>0)
> Neighb
       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9]
 [1,] FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE
 [2,]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE
 [3,] FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE
 [4,]  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE
 [5,]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
 [6,] FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE
 [7,] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE
 [8,] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
 [9,] FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE

Now, let us explicit our 512 possible states.

> n=h^2
> states=function(x){
+   Base.b=rep(0,n)
+   ndigits=(floor(logb(x,base=length(color)))+1)
+   for(i in 1:ndigits){
+     Base.b[n-i+1]=(x%%length(color))
+     x=(x %/% length(color))}
+   return(Base.b)}
> M=Vectorize(states)(1:(length(color)^n-1))
> liststates=data.frame(rbind(rep(0,h^2),t(M)))
> head(liststates)
  X1 X2 X3 X4 X5 X6 X7 X8 X9
1  0  0  0  0  0  0  0  0  0
2  0  0  0  0  0  0  0  0  1
3  0  0  0  0  0  0  0  1  0
4  0  0  0  0  0  0  0  1  1
5  0  0  0  0  0  0  1  0  0
6  0  0  0  0  0  0  1  0  1

(for the first six, with 0/1 digits instead of colors). For instance, if we look at a specific one, it is possible to plot the grid, using

> plotsteps=function(u){
+   plot(0:h,0:h,col="white",xlab="",ylab="",axes=FALSE)
+   for(i in 0:(h^2-1)){
+   x=i%/%h
+   y=i%%h
+   polygon(x+c(1,.1,.1,1),y+c(1,1,.1,.1),
+   col=color[as.numeric(u)[i+1] + 1])
+   text(x+.45,y+.45,i)
+   }}

Here,

> plotsteps(liststates[100,])

Then, given one state, let us see what could happen next,

let us compute all connected states: all states where we can end up in if we change one cell
we have to check, for each connect state which cell did change
we should compute probabilities to reach those 9 states, based on the fact that each of the cell is chosen with the same probability, and the fact that probability to change the color is based on the colors around.
if some states cannot be reached (if a cell is surrounded by elements of the same color, so it cannot change its color), then, we should remove then from the list of reachable (possible) states.

The code will be something like the following

> listneighbour=function(i){
+   start=liststates[i,]
+   difference2only=function(j) {
+     w=which(liststates[j,]!=liststates[i,])
+     return((length(w)==1))}
+   possible=which( Vectorize(difference2only)(1:nrow(liststates))==TRUE )
+   P=function(j){   
+     L=liststates[i,which(Neighb[which(liststates[j,]!=liststates[i,]),]==TRUE)]
+     T=table(as.numeric(L))
+     T=T[as.character(0:(length(color)-1))]
+     T[is.na(T)]=0
+     return(as.numeric(T)/sum(T))
+   }
+   probability=Vectorize(P)(possible)
+   W=NULL
+   for(j in possible) W=c(W,which(liststates[j,]!=liststates[i,]))
+   I=1-liststates[i,W]+1
+   vp=diag(probability[as.numeric(I),])
+   vproba=0*vp
+   if(sum(vp)!=0) vproba=vp/sum(vp)
+   return(list(
+     color=liststates[i,W],
+     absorb=W,
+     possible=possible,
+     probability=probability,
+     prob=vproba))
+ }

For instance, if we start from state 100 (here, on the right)

> listneighbour(100)
$color
    X3 X4 X8 X9 X7 X6 X5 X2 X1
100  1  1  1  1  0  0  0  0  0

$absorb
[1] 3 4 8 9 7 6 5 2 1

$possible
[1]  36  68  98  99 104 108 116 228 356

$probability
     [,1] [,2] [,3]   [,4]   [,5] [,6] [,7] [,8]   [,9]
[1,]    1  0.8  0.6 0.6667 0.3333  0.4  0.5  0.6 0.6667
[2,]    0  0.2  0.4 0.3333 0.6667  0.6  0.5  0.4 0.3333

$prob
[1] 0.17964072 0.14371257 0.10778443 0.11976048 0.11976048
[6] 0.10778443 0.08982036 0.07185629 0.05988024

Let us look more specificaly at the 99th state (which appears above as a state that could be reached from the 100th),

> liststates[99,]
   X1 X2 X3 X4 X5 X6 X7 X8 X9
99  0  0  1  1  0  0  0  1  0

If we plot it (here on the right, again), we get

> plotsteps(liststates[99,])

Clearly, here, the cell in the upper corner (number 9) changed from blue to red. Now, about the probability… The probability to select cell 9 is 1/9, and given that cell 9 is chosen, the probability to go from blue to red is 2/3 (the cell is surrounded by 2 red cells, and 1 blue cell). The probability to remain blue is then 1/3. Those are the probabilities computed by our function (the table with two rows, one per color). In order to get a better understanding on the meaning of the last line, with some sort of probabilities), let us look at the following (simpler) example.

> liststates[2,]
  X1 X2 X3 X4 X5 X6 X7 X8 X9
2  0  0  0  0  0  0  0  0  1

that can be visualized on the right (on the right). Here,

> listneighbour(2)
$color
  X9 X8 X7 X6 X5 X4 X3 X2 X1
2  1  0  0  0  0  0  0  0  0

$absorb
[1] 9 8 7 6 5 4 3 2 1

$possible
[1]   1   4   6  10  18  34  66 130 258

$probability
     [,1] [,2] [,3] [,4]  [,5] [,6] [,7] [,8] [,9]
[1,]    1  0.8    1  0.8 0.875    1    1    1    1
[2,]    0  0.2    0  0.2 0.125    0    0    0    0

$prob
[1] 0.65573770 0.13114754 0.00000000 0.13114754 0.08196721 
[6] 0.00000000 0.00000000 0.00000000 0.00000000

Things are pretty simple here

if we chose cells $http://latex.codecogs.com/gif.latex%20?\{1,2,3,4,7\}$ , then nothing change, since all the neighbors have the same color. So if we want to focus on changes (or say run the algorithm until the first color change, then choosing those cells is a waste of time)
if we chose cells $http://latex.codecogs.com/gif.latex%20?\{5,6,8\}$ , then it could be possible to change the color. And actually, $http://latex.codecogs.com/gif.latex%20?\{5\}$ is different from $http://latex.codecogs.com/gif.latex%20?\{6,8\}$ (since it does have much more neighbors)
if we chose cell $http://latex.codecogs.com/gif.latex%20?\{9\}$ , then definitively, the color will change, since all neighbors have the other color here,

Now, the probability to select cell $k$ given that there was a color change would be, if $k$ is in $http://latex.codecogs.com/gif.latex%20?\{9\}$

$http://latex.codecogs.com/gif.latex%20?\mathbb{P}(k)\propto%20\frac{3}{3}=1$

while if $k$ is in $http://latex.codecogs.com/gif.latex%20?\{6,8\}$ , then there are 4 out 5 neighbors that are red, so

$http://latex.codecogs.com/gif.latex%20?\mathbb{P}(k)\propto%20\frac{1}{5}$ and if $k$ is $http://latex.codecogs.com/gif.latex%20?\{5\}$ , then, only one neighbor has a different color, out of 8, so

$http://latex.codecogs.com/gif.latex%20?\mathbb{P}(k)\propto%20\frac{1}{8}$

And for the other, $http://latex.codecogs.com/gif.latex%20?\mathbb{P}(k)\propto%200$ . So, it comes – since we assume that cells are drawn independently, and with the same probability, if $k$ is in $http://latex.codecogs.com/gif.latex%20?\{9\}$

$http://latex.codecogs.com/gif.latex%20?\mathbb{P}(k)=%20\frac{1%20\cdot%20\frac{1}{9}}{\left(1+2\times%20\frac{1}{5}+%20\frac{1}{8}+5\times%200\right)\cdot%20\frac{1}{9}}=\frac{40}{61}$

while if $k$ is in $http://latex.codecogs.com/gif.latex%20?\{6,8\}$ , then there are 4 out 5 neighbors that are red, so

$http://latex.codecogs.com/gif.latex%20?\mathbb{P}(k)=%20\frac{\frac{1}{5}%20\cdot%20\frac{1}{9}}{\left(1+2\times%20\frac{1}{5}+%20\frac{1}{8}+5\times%200\right)\cdot%20\frac{1}{9}}=\frac{8}{61}$

and if $k$ is $http://latex.codecogs.com/gif.latex%20?\{5\}$ , then, only one neighbor has a different color, out of 8, so

$http://latex.codecogs.com/gif.latex%20?\mathbb{P}(k)=%20\frac{\frac{1}{8}%20\cdot%20\frac{1}{9}}{\left(1+2\times%20\frac{1}{5}+%20\frac{1}{8}+5\times%200\right)\cdot%20\frac{1}{9}}=\frac{5}{61}$

Which are exactly the probability computed above. The point is that we compute probabilities given that a color change will actually occur. The good point is that it should faster convergence to some limiting distribution. If any.

What about our transition matrix ? Well, using a simply loop, we should get it easily

> M=matrix(0,nrow(liststates),nrow(liststates))
+ for(i in 1:nrow(liststates)){
+ L=listneighbour(i)
+ if(sum(L$prob)!=0){
+ j=L$possible
+ M[i,j]=L$prob
+ }
+ if(sum(L$prob)==0){
+ j=i
+ M[i,j]=1
+ }
+ }

One can check that this matrix satisfies some properties of transition matrices. For instance, the sum per row is one,

> sum(apply(M,1,sum)!=1)
[1]  0

Remember that this matrix is big, so I will not print if here. But trust me, it works (it might take a while on an old laptop, but anyone can do it). Now, if we want to visualize some paths of that chain, we can use the following algorithm. First, we need a starting point, that can be chosen randomly,

> j=sample(1:nrow(liststates),size=1)

or using a given colored grid, say

> j=100

Then we plot it,

> plotsteps(liststates[j,])

Now, the code within the loop is here

> d=rep(0,nrow(liststates))
> d[j]=1
> d=d%*%M
> j=sample(1:nrow(M),size=1,prob=d)
> plotsteps(liststates[j,])

Here are some examples. And indeed, we end up either with all cells in blue, or all cells in red.

Now, do we have to compute that transition matrix to produce those graph (and to generate that Markov chain) ? No. Of course not… At each step, I use a Dirac measure, and use the transition matrix just to get the probability to generate then the next state. Actually, one can write a faster and more intuitive code to generate the same chain… But I should probably keep that for another post…

Arthur Charpentier

Arthur Charpentier, professor in Montréal, in Actuarial Science. Former professor-assistant at ENSAE Paristech, associate professor at Ecole Polytechnique and assistant professor in Economics at Université de Rennes 1. Graduated from ENSAE, Master in Mathematical Economics (Paris Dauphine), PhD in Mathematics (KU Leuven), and Fellow of the French Institute of Actuaries.