I’m currently working my way through Rasmussen and Williams’s book on Gaussian processes. It’s another one of those topics that seems to crop up a lot these days, particularly around control strategies for energy systems, and thought I should be able to at least perform basic analyses with this method.
While the book is sensibly laid-out and pretty comprehensive in its choice of topics, it is also a very hard read. My linear algebra may be rusty but I’ve heard some mathematicians describe the conventions used in the book as “an affront to notation”. So just be aware that if you try to work through the book, you will need to be patient. It’s not a cookbook that clearly spells out how to do everything step-by-step.
That said, I have now worked through the basics of Gaussian process regression as described in Chapter 2 and I want to share my code with you here. As always, I’m doing this in R and if you search CRAN, you will find a specific package for Gaussian process regression: gptk. The implementation shown below is much slower than the gptk functions, but by doing things manually I hope you will find it easier to understand what’s actually going on. The full code is given below and is available Github. (PS anyone know how to embed only a few lines from a gist?)
Step 1: Generating functions
With a standard univariate statistical distribution, we draw single values. By contrast, a Gaussian process can be thought of as a distribution of functions. So the first thing we need to do is set up some code that enables us to generate these functions. The code at the bottom shows how to do this and hopefully it is pretty self-explanatory. The result is basically the same as Figure 2.2(a) in Rasmussen and Williams, although with a different random seed and plotting settings.
Step 2: Fitting the process to noise-free data
Now let’s assume that we have a number of fixed data points. In other words, our Gaussian process is again generating lots of different functions but we know that each draw must pass through some given points. For now, we will assume that these points are perfectly known. In the code, I’ve tried to use variable names that match the notation in the book. In the resulting plot, which corresponds to Figure 2.2(b) in Rasmussen and Williams, we can see the explicit samples from the process, along with the mean function in red, and the constraining data points.
Step 3: Fitting the process to noisy data
The next extension is to assume that the constraining data points are not perfectly known. Instead we assume that they have some amount of normally-distributed noise associated with them. This case is discussed on page 16 of the book, although an explicit plot isn’t shown. The code and resulting plot is shown below; again, we include the individual sampled functions, the mean function, and the data points (this time with error bars to signify 95% confidence intervals).
Hopefully that will give you a starting point for implementating Gaussian process regression in R. There are several further steps that could be taken now including:
- Changing the squared exponential covariance function to include the signal and noise variance parameters, in addition to the length scale shown here.
- Speed up the code by using the Cholesky decomposition, as described in Algorithm 2.1 on page 19.
- Try to implement the same regression using the gptk package. Sadly the documentation is also quite sparse here, but if you look in the source files at the various demo* files, you should be able to figure out what’s going on.