A Simple Intro to Bayesian Change Point Analysis

The purpose of this post is to demonstrate change point analysis by stepping through an example of change point analysis in R presented in Rizzo’s excellent, comprehensive, and very mathy book, Statistical Computing with R, and then showing alternative ways to process this data using the changepoint and bcp packages. Much of the commentary is simplified, and that’s on purpose: I want to make this introduction accessible if you’re just learning the method. (Most of the code is straight from Rizzo who provides a much more in-depth treatment of the technique. I’ve added comments in the code to make it easier for me to follow, and that’s about it.)

The idea itself is simple: you have a sample of observations from a Poisson (counting) process (where events occur randomly over a period of time). You probably have a chart that shows time on the horizontal axis, and how many events occurred on the vertical axis. You suspect that the rate at which events occur has changed somewhere over that range of time… either the event is increasing in frequency, or it’s slowing down — but you want to know with a little more certainty. (Alternatively, you could check to see if the variance has changed, which would be useful for process improvement work in Six Sigma projects.)
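
To make that concrete, here's a toy simulation (my own, not from Rizzo's book) of the kind of data we're talking about: a count series whose underlying rate drops partway through.

# toy example: 100 years of counts whose rate drops from 4 to 1 after year 60
set.seed(42)
sim <- c(rpois(60, lambda = 4), rpois(40, lambda = 1))
barplot(sim, xlab = "year", ylab = "number of events")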

You want to estimate the rate at which events occur BEFORE the shift (mu), the rate at which events occur AFTER the shift (lambda), and the time when the shift happens (k). To do it, you can apply a Markov Chain Monte Carlo (MCMC) sampling approach to estimate the population parameters at each possible k, from the beginning of your data set to the end of it. The values you get at each time step will be dependent only on the values you computed at the previous timestep (that’s where the Markov Chain part of this problem comes in). There are lots of different ways to hop around the parameter space, and each hopping strategy has a fancy name (e.g. Metropolis-Hastings, Gibbs, “reversible jump”).

In one example, Rizzo (p. 271-277) uses a Markov Chain Monte Carlo (MCMC) method that applies a Gibbs sampler to do the hopping – with the goal of figuring out the change point in number of coal mine disasters from 1851 to 1962. (Looking at a plot of the frequency over time, it appears that the rate of coal mining disasters decreased… but did it really? And if so, when? That’s the point of her example.) She gets the coal mining data from the boot package. Here’s how to get it, and what it looks like:

library(boot)
data(coal)
y <- tabulate(floor(coal[[1]]))
y <- y[1851:length(y)]
barplot(y,xlab="years", ylab="frequency of disasters")

[Figure: coalmine-freq, a barplot of coal mining disaster frequency by year, 1851 to 1962]

First, we initialize all of the data structures we’ll need to use:

# initialization
n <- length(y) # number of data elements (years) to process
m <- 1000 # target length of the chain
# set up blank 1000-element arrays for mu, lambda, and k first,
# so their first elements can be seeded below
mu <- lambda <- k <- numeric(m)
L <- numeric(n) # likelihood fxn has one slot per year
k[1] <- sample(1:n, 1) # pick 1 random year to start at
mu[1] <- 1
lambda[1] <- 1
b1 <- 1
b2 <- 1

Here are the distributions she draws from at each step of the Gibbs sampler (the full conditionals, which follow from Gamma priors placed on mu, lambda, b1, and b2), using the same (shape, rate) parameterization that rgamma uses:

  • mu is drawn from a Gamma distribution with shape (0.5 + the sum of all your frequencies UP TO the point in time, k, you're currently at) and rate (k + b1)
  • lambda is drawn from a Gamma distribution with shape (0.5 + the sum of all your frequencies AFTER the point in time, k, you're currently at) and rate (n – k + b2), where n is the total number of years in the series
  • b1 is drawn from a Gamma distribution with shape 0.5 and rate (mu + 1)
  • b2 is drawn from a Gamma distribution with shape 0.5 and rate (lambda + 1)
  • a likelihood function L is also provided; for each candidate year it is a function of k, mu, lambda, and the sum of all the frequencies up until that point in time, k, and it is used to pick the next value of k (the same thing is written out in symbols just below)
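
If you prefer symbols to words, here is the same thing written out compactly (my own transcription of the bullets and of the code below, in shape/rate notation):

\mu \mid \cdot \sim \mathrm{Gamma}\Big(0.5 + \sum_{i=1}^{k} y_i,\; k + b_1\Big)
\lambda \mid \cdot \sim \mathrm{Gamma}\Big(0.5 + \sum_{i=k+1}^{n} y_i,\; n - k + b_2\Big)
b_1 \mid \cdot \sim \mathrm{Gamma}(0.5,\; \mu + 1), \qquad b_2 \mid \cdot \sim \mathrm{Gamma}(0.5,\; \lambda + 1)
P(k = j \mid \cdot) \;\propto\; e^{(\lambda - \mu)\, j} \, (\mu / \lambda)^{\sum_{i=1}^{j} y_i}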

At each iteration, you pick a value of k to represent a point in time where a change might have occurred. You slice your data into two chunks: the chunk that happened BEFORE this point in time, and the chunk that happened AFTER this point in time. Each chunk is modeled as a Poisson process whose rate has a Gamma prior, which is a pretty common model for this particular type of problem. Here is Rizzo's (commented) code:

# start at 2, so you can use initialization values as seeds
# and go through this process once for each of your m iterations
for (i in 2:m) {
 kt <- k[i-1] # change point from the previous iteration (the random year on the first pass)
 # set your shape parameter to pick mu from, based on the characteristics
 # of the early ("before") chunk of your data
 r <- .5 + sum(y[1:kt]) 
 # now use it to pick mu
 mu[i] <- rgamma(1,shape=r,rate=kt+b1) 
 # if you're at the end of the time periods, set your shape parameter
 # to 0.5 + the sum of all the frequencies, otherwise, just set the shape
 # parameter that you will use to pick lambda based on the later ("after")
 # chunk of your data
 if (kt+1 > n) r <- 0.5 + sum(y) else r <- 0.5 + sum(y[(kt+1):n])
 lambda[i] <- rgamma(1,shape=r,rate=n-kt+b2)
 # now use the mu and lambda values that you got to set b1 and b2 for next iteration
 b1 <- rgamma(1,shape=.5,rate=mu[i]+1)
 b2 <- rgamma(1,shape=.5,rate=lambda[i]+1)
 # for each year, find value of LIKELIHOOD function which you will 
 # then use to determine what year to hop to next
 for (j in 1:n) {
   L[j] <- exp((lambda[i]-mu[i])*j) * (mu[i]/lambda[i])^sum(y[1:j])
 }
 L <- L/sum(L)
 # determine which year to hop to next
 k[i] <- sample(1:n,prob=L,size=1)
}

At the end of the simulation we have an array of 1000 values of k, 1000 values of mu, and 1000 values of lambda, and we use these samples to estimate the true population parameters. Typically, algorithms like this throw out a whole bunch of the early draws (the "burn-in" period), even though some statisticians (e.g. Geyer) argue that a burn-in is unnecessary; Rizzo tosses out the first 200:

> b <- 201 # treats time until the 200th iteration as "burn-in"
> mean(k[b:m])
[1] 39.765
> mean(lambda[b:m])
[1] 0.9326437
> mean(mu[b:m])
[1] 3.146413

The change point happened between the 39th and 40th observations (i.e. around 1890, since observation 1 corresponds to 1851), the rate before the change point was about 3.15 disasters per year, and the rate after the change point was about 0.93 disasters per year. (Cool!)
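
If you want more than point estimates, it's also worth looking at the whole posterior sample and the trace of each chain. This is my own addition rather than Rizzo's code, but it only uses the arrays we already have:

hist(k[b:m], breaks=50, main="posterior of the change point k", xlab="k")
plot(mu[b:m], type="l", ylab="mu")         # trace plot to eyeball convergence
plot(lambda[b:m], type="l", ylab="lambda")
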
After I went through this example, I discovered the changepoint package, which let me run through a similar process in just a few lines of code. Fortunately, the results were very similar! I chose the “AMOC” method which stands for “at most one change”. Other methods are available which can help identify more than one change point (PELT, BinSeg, and SegNeigh – although I got an error message every time I attempted that last method).

> results <- cpt.mean(y,method="AMOC")
> cpts(results)
cpt 
 36 
> param.est(results)
$mean
[1] 3.2500000 0.9736842
> plot(results,cpt.col="blue",xlab="Index",cpt.width=4)

[Figure: coalmine-changepoint, the cpt.mean plot of the series with the detected change point marked in blue]
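
As a quick sanity check (my own addition, not part of the original post), you can translate that change point index back into a calendar year, since the first element of y is 1851:

1851 + cpts(results) - 1 # the 36th year of the series is 1886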

I decided to explore a little further and found even MORE change point analysis packages! So I tried this example using bcp (which I presume stands for "Bayesian Change Point") and voila… the output looks very similar to each of the previous two methods:
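
The post doesn't show the exact call, but a minimal sketch along these lines (using bcp's defaults) should produce output like the figure below:

library(bcp)
fit <- bcp(y) # Bayesian change point analysis of the yearly disaster counts
plot(fit)     # posterior means and posterior probability of a change at each year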

[Figure: coalmine-bcp, posterior means and posterior probability of a change point from bcp]

It’s at this point that the HARD part of the data science project would begin… WHY? Why does it look like the rate of coal mining accidents decreased suddenly? Was there a change in policy or regulatory requirements in Britain, where this data was collected? Was there some sort of mass exodus away from working in the mines, so that there’s a covariate in the number of opportunities for a mining disaster to occur? Don’t know… the original paper from 1979 doesn’t reveal the true story behind the data.

There are also additional resources on R-bloggers that discuss change point analysis.

(Note: If I’ve missed anything, or haven’t explained anything right, please provide corrections and further insights in the comments! Thank you.)

